Identifying differentially expressed genes in microarray experiments with model-based variance estimation

Xiaodong Cai, Georgios B. Giannakis

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Statistical tests are used to identify genes that are differentially expressed under different conditions from microarray data. Many of these tests require the variance of gene expression levels; however, because the number of replicates is small, the sample variance is an inaccurate estimate, which leads to large false positive and false negative errors. More accurate and robust variance estimation is thus highly desirable to improve the performance of statistical tests. In this paper, cluster analysis is performed on the microarray data using a model-based clustering method, and the variance for each gene is then estimated from the cluster variances. Because cluster variances are estimated from multiple genes whose microarray data have similar variance, the proposed method pools the relevant genes together; this effectively increases the number of samples used in variance estimation and thereby improves its accuracy. Using simulated data, it is shown that the novel variance estimation can improve the performance of the t-test, the regularized t-test, and a variant of the SAM test, which is called the t-test here. Using the colon microarray data of Alon, it is demonstrated that the proposed method offers better or comparable performance compared with other gene pooling methods. Using the IHF microarray data of Arfin, it is shown that the proposed variance estimation decreases the significance of genes that have a small fold change but were assigned a high significance score by the t-test using the sample variance, which potentially reduces the false positive probability.
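
The pooling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the paper uses a model-based (mixture model) clustering method, whereas the sketch below simply ranks genes by sample variance and cuts the ranking into equal-sized groups as a stand-in for clusters. The function names `pooled_variance_estimates` and `t_statistic`, and the parameter `n_groups`, are hypothetical and chosen only for illustration. With the same number of replicates per gene, the pooled variance of a group is the mean of its members' sample variances.

```python
from statistics import mean, variance


def pooled_variance_estimates(expr, n_groups):
    """Replace each gene's sample variance with the mean variance of its group.

    expr: list of per-gene replicate lists (same number of replicates per gene).
    n_groups: number of variance groups (hypothetical stand-in for the
              clusters that a model-based clustering method would produce).
    """
    sample_var = [variance(reps) for reps in expr]
    # Rank genes by sample variance, then cut the ranking into contiguous groups.
    order = sorted(range(len(expr)), key=lambda g: sample_var[g])
    size = -(-len(order) // n_groups)  # ceiling division
    estimates = [0.0] * len(expr)
    for start in range(0, len(order), size):
        members = order[start:start + size]
        # Pooling: every gene in the group shares the group's mean variance.
        group_var = mean(sample_var[g] for g in members)
        for g in members:
            estimates[g] = group_var
    return estimates


def t_statistic(x, y, var_x, var_y):
    """Two-sample t-type statistic computed from supplied variance estimates."""
    return (mean(x) - mean(y)) / ((var_x / len(x) + var_y / len(y)) ** 0.5)


# Four genes with two replicates each (made-up numbers for illustration).
expr = [[0.0, 2.0], [0.0, 4.0], [0.0, 6.0], [0.0, 8.0]]
estimates = pooled_variance_estimates(expr, n_groups=2)
```

Each gene's pooled estimate can then be passed to `t_statistic` in place of its own sample variance, so that a gene with an accidentally tiny sample variance no longer receives an inflated score.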

Original language: English
Pages (from-to): 2418-2426
Number of pages: 9
Journal: IEEE Transactions on Signal Processing
Volume: 54
Issue number: 6 II
DOI: 10.1109/TSP.2006.873733
State: Published - Jun 1 2006

Fingerprint

Microarrays
Genes
Statistical tests
Experiments
Cluster analysis
Gene expression

Keywords

  • Clustering
  • Microarray
  • Mixture model
  • Statistical test
  • Variance estimation

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing

Cite this

Identifying differentially expressed genes in microarray experiments with model-based variance estimation. / Cai, Xiaodong; Giannakis, Georgios B.

In: IEEE Transactions on Signal Processing, Vol. 54, No. 6 II, 01.06.2006, p. 2418-2426.

@article{ea045b830aae41429fad4155ce15638f,
title = "Identifying differentially expressed genes in microarray experiments with model-based variance estimation",
abstract = "Statistical tests have been employed to identify genes differentially expressed under different conditions using data from microarray experiments. The variance of gene expression levels is often required in various statistical tests; however, due to the small number of replicates, the variance estimated from the sample variance is not accurate, which causes large false positive and negative errors. More accurate and robust variance estimation is thus highly desirable to improve the performance of statistical tests. In this paper, cluster analysis was performed on the microarray data using a model-based clustering method. The variance for each gene was then estimated from cluster variances. Since cluster variances are estimated from multiple genes whose microarray data have similar variance, the proposed estimation method pools the relevant genes together; this effectively increases the number of samples in variance estimation, thereby improving variance estimation. Using simulated data, it is shown that with the novel variance estimation, the performance of the t-test, regularized t-test, and a variant of SAM test, which is called the t-test here, can be improved. Using colon microarray data of Alon, it is demonstrated that the proposed method offers better or comparable performance compared with other gene pooling methods. Using the IHF microarray data of Arfin, it is shown that the proposed novel variance estimation decreases the significance of those genes having a small fold change but a high significant score assigned by the t-test using the sample variance, which potentially reduces false positive probability.",
keywords = "Clustering, Microarray, Mixture model, Statistical test, Variance estimation",
author = "Cai, Xiaodong and Giannakis, {Georgios B.}",
year = "2006",
month = "6",
day = "1",
doi = "10.1109/TSP.2006.873733",
language = "English",
volume = "54",
pages = "2418--2426",
journal = "IEEE Transactions on Signal Processing",
issn = "1053-587X",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "6 II",

}

TY - JOUR

T1 - Identifying differentially expressed genes in microarray experiments with model-based variance estimation

AU - Cai, Xiaodong

AU - Giannakis, Georgios B.

PY - 2006/6/1

Y1 - 2006/6/1

N2 - Statistical tests have been employed to identify genes differentially expressed under different conditions using data from microarray experiments. The variance of gene expression levels is often required in various statistical tests; however, due to the small number of replicates, the variance estimated from the sample variance is not accurate, which causes large false positive and negative errors. More accurate and robust variance estimation is thus highly desirable to improve the performance of statistical tests. In this paper, cluster analysis was performed on the microarray data using a model-based clustering method. The variance for each gene was then estimated from cluster variances. Since cluster variances are estimated from multiple genes whose microarray data have similar variance, the proposed estimation method pools the relevant genes together; this effectively increases the number of samples in variance estimation, thereby improving variance estimation. Using simulated data, it is shown that with the novel variance estimation, the performance of the t-test, regularized t-test, and a variant of SAM test, which is called the t-test here, can be improved. Using colon microarray data of Alon, it is demonstrated that the proposed method offers better or comparable performance compared with other gene pooling methods. Using the IHF microarray data of Arfin, it is shown that the proposed novel variance estimation decreases the significance of those genes having a small fold change but a high significant score assigned by the t-test using the sample variance, which potentially reduces false positive probability.

AB - Statistical tests have been employed to identify genes differentially expressed under different conditions using data from microarray experiments. The variance of gene expression levels is often required in various statistical tests; however, due to the small number of replicates, the variance estimated from the sample variance is not accurate, which causes large false positive and negative errors. More accurate and robust variance estimation is thus highly desirable to improve the performance of statistical tests. In this paper, cluster analysis was performed on the microarray data using a model-based clustering method. The variance for each gene was then estimated from cluster variances. Since cluster variances are estimated from multiple genes whose microarray data have similar variance, the proposed estimation method pools the relevant genes together; this effectively increases the number of samples in variance estimation, thereby improving variance estimation. Using simulated data, it is shown that with the novel variance estimation, the performance of the t-test, regularized t-test, and a variant of SAM test, which is called the t-test here, can be improved. Using colon microarray data of Alon, it is demonstrated that the proposed method offers better or comparable performance compared with other gene pooling methods. Using the IHF microarray data of Arfin, it is shown that the proposed novel variance estimation decreases the significance of those genes having a small fold change but a high significant score assigned by the t-test using the sample variance, which potentially reduces false positive probability.

KW - Clustering

KW - Microarray

KW - Mixture model

KW - Statistical test

KW - Variance estimation

UR - http://www.scopus.com/inward/record.url?scp=33744478402&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33744478402&partnerID=8YFLogxK

U2 - 10.1109/TSP.2006.873733

DO - 10.1109/TSP.2006.873733

M3 - Article

VL - 54

SP - 2418

EP - 2426

JO - IEEE Transactions on Signal Processing

JF - IEEE Transactions on Signal Processing

SN - 1053-587X

IS - 6 II

ER -