CART variance stabilization and regularization for high-throughput genomic data

Ariadni Papana, Hemant Ishwaran

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Motivation: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances, and a complex relationship between mean expression value and variance is often seen. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve the overall accuracy of test statistics.

Results: A Classification and Regression Tree (CART) procedure is introduced for variance stabilization as well as regularization. The CART procedure adaptively clusters genes by variances. Using both local and cluster-wide information leads to improved estimation of population variances, which improves the accuracy of test statistics, whereas making use of cluster-wide information allows for variance stabilization of the data.
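The abstract does not spell out the tree's inputs or the variance estimator, so the following is only a minimal sketch under stated assumptions, not the authors' procedure. It grows a one-dimensional regression tree that splits genes on mean expression with log sample variance as the response, treats each leaf as a variance cluster, and then (i) regularizes each gene's variance as a fixed-weight blend of its own sample variance (local information) and its leaf's pooled variance (cluster-wide information), and (ii) stabilizes each gene by dividing its values by the leaf's pooled standard deviation. The names `tree_clusters`, `regularize_and_stabilize`, `weight`, `min_leaf`, `max_depth`, and `min_gain` are hypothetical, as is the assumption that every gene has the same number of replicates.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def tree_clusters(means: Sequence[float], log_vars: Sequence[float],
                  min_leaf: int = 3, max_depth: int = 3,
                  min_gain: float = 1e-8) -> List[List[int]]:
    """Cluster genes with a 1-D regression tree (hypothetical helper).

    Genes are ordered by mean expression; each binary split is the cut point
    that most reduces the sum of squared errors (SSE) of log sample variance.
    Growth stops at max_depth, when a node is too small to give two leaves of
    at least min_leaf genes, or when the best SSE reduction is <= min_gain.
    """
    order = sorted(range(len(means)), key=lambda g: means[g])

    def grow(idx: List[int], depth: int) -> List[List[int]]:
        n = len(idx)
        if depth >= max_depth or n < 2 * min_leaf:
            return [idx]
        # Prefix sums of the response give each candidate split's SSE in O(1).
        s, s2 = [0.0], [0.0]
        for g in idx:
            y = log_vars[g]
            s.append(s[-1] + y)
            s2.append(s2[-1] + y * y)

        def sse(i: int, j: int) -> float:
            tot = s[j] - s[i]
            return (s2[j] - s2[i]) - tot * tot / (j - i)

        parent = sse(0, n)
        best_k, best_cost = None, parent
        for k in range(min_leaf, n - min_leaf + 1):
            cost = sse(0, k) + sse(k, n)
            if cost < best_cost:
                best_k, best_cost = k, cost
        if best_k is None or parent - best_cost <= min_gain:
            return [idx]
        return grow(idx[:best_k], depth + 1) + grow(idx[best_k:], depth + 1)

    return grow(order, 0)


def regularize_and_stabilize(
        data: List[List[float]], weight: float = 0.5, **tree_args
) -> Tuple[List[List[int]], List[float], List[List[float]]]:
    """Return (clusters, regularized variances, stabilized rows).

    Rows are genes and columns are replicates; every row is assumed to have
    the same number (>= 2) of replicates. `weight` is a hypothetical fixed
    shrinkage weight on each gene's own sample variance.
    """
    means = [statistics.fmean(row) for row in data]
    variances = [statistics.variance(row) for row in data]
    log_vars = [math.log(max(v, 1e-12)) for v in variances]  # floor avoids log(0)
    clusters = tree_clusters(means, log_vars, **tree_args)

    pooled = [0.0] * len(data)
    for leaf in clusters:
        # With equal replicate counts, the pooled variance is the plain average.
        leaf_var = statistics.fmean(variances[g] for g in leaf)
        for g in leaf:
            pooled[g] = leaf_var

    # Regularization: blend local (gene) and cluster-wide (leaf) variances.
    reg_var = [weight * v + (1 - weight) * p for v, p in zip(variances, pooled)]
    # Stabilization: scale each gene by its cluster's pooled standard deviation.
    stabilized = [[x / math.sqrt(max(p, 1e-12)) for x in row]
                  for row, p in zip(data, pooled)]
    return clusters, reg_var, stabilized
```

For example, with ten genes near mean 1 with sample variance 0.01 and ten genes near mean 10 with sample variance 9, `regularize_and_stabilize(data, min_leaf=3, max_depth=2)` separates the two groups into two clusters, and each stabilized row has sample variance close to 1. The single fixed `weight` keeps the sketch short; a data-driven amount of pooling, or a library CART such as scikit-learn's `DecisionTreeRegressor` in place of the hand-rolled splitter, would be natural substitutes.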

Original language: English
Pages (from-to): 2254-2261
Number of pages: 8
Journal: Bioinformatics
Volume: 22
Issue number: 18
DOIs: 10.1093/bioinformatics/btl384
State: Published - Sep 15 2006
Externally published: Yes

Fingerprint

Classification and Regression Trees
High Throughput
Genomics
Regularization
Stabilization
Throughput
Multigene Family
Oligonucleotide Array Sequence Analysis
Statistics
Microarrays
Messenger RNA
DNA
Test Statistic
Genes
Population
Homogeneity of Variances
DNA Microarray
Pooling
Gene

ASJC Scopus subject areas

  • Clinical Biochemistry
  • Computer Science Applications
  • Computational Theory and Mathematics

Cite this

CART variance stabilization and regularization for high-throughput genomic data. / Papana, Ariadni; Ishwaran, Hemant.

In: Bioinformatics, Vol. 22, No. 18, 15.09.2006, p. 2254-2261.


@article{35870f14615040d6800f7ea46cd6d399,
title = "CART variance stabilization and regularization for high-throughput genomic data",
abstract = "Motivation: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances, and a complex relationship between mean expression value and variance is often seen. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve the overall accuracy of test statistics. Results: A Classification and Regression Tree (CART) procedure is introduced for variance stabilization as well as regularization. The CART procedure adaptively clusters genes by variances. Using both local and cluster-wide information leads to improved estimation of population variances, which improves the accuracy of test statistics, whereas making use of cluster-wide information allows for variance stabilization of the data.",
author = "Ariadni Papana and Hemant Ishwaran",
year = "2006",
month = "9",
day = "15",
doi = "10.1093/bioinformatics/btl384",
language = "English",
volume = "22",
pages = "2254--2261",
journal = "Bioinformatics (Oxford, England)",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "18",

}

TY - JOUR

T1 - CART variance stabilization and regularization for high-throughput genomic data

AU - Papana, Ariadni

AU - Ishwaran, Hemant

PY - 2006/9/15

Y1 - 2006/9/15

N2 - Motivation: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances, and a complex relationship between mean expression value and variance is often seen. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve the overall accuracy of test statistics. Results: A Classification and Regression Tree (CART) procedure is introduced for variance stabilization as well as regularization. The CART procedure adaptively clusters genes by variances. Using both local and cluster-wide information leads to improved estimation of population variances, which improves the accuracy of test statistics, whereas making use of cluster-wide information allows for variance stabilization of the data.

AB - Motivation: mRNA expression data obtained from high-throughput DNA microarrays exhibit strong departures from homogeneity of variances, and a complex relationship between mean expression value and variance is often seen. Variance stabilization of such data is crucial for many types of statistical analyses, while regularization of variances (pooling of information) can greatly improve the overall accuracy of test statistics. Results: A Classification and Regression Tree (CART) procedure is introduced for variance stabilization as well as regularization. The CART procedure adaptively clusters genes by variances. Using both local and cluster-wide information leads to improved estimation of population variances, which improves the accuracy of test statistics, whereas making use of cluster-wide information allows for variance stabilization of the data.

UR - http://www.scopus.com/inward/record.url?scp=33748706159&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33748706159&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btl384

DO - 10.1093/bioinformatics/btl384

M3 - Article

VL - 22

SP - 2254

EP - 2261

JO - Bioinformatics (Oxford, England)

JF - Bioinformatics (Oxford, England)

SN - 1367-4803

IS - 18

ER -