Adaptive elastic-net sparse principal component analysis for pathway association testing

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Pathway or gene set analysis has become an increasingly popular approach for analyzing high-throughput biological experiments such as microarray gene expression studies. Its purpose is to identify differentially expressed pathways that are associated with outcomes. Two important challenges are selecting the subset of genes that contributes most to the association with clinical phenotypes, and testing each pathway for association efficiently. We propose a two-stage analysis strategy: (1) extract latent variables representing activities within each pathway using a dimension reduction approach based on adaptive elastic-net sparse principal component analysis; (2) integrate the latent variables into a regression modeling framework to analyze studies with different types of outcomes, such as binary, continuous, or survival outcomes. The approach is computationally efficient: because the latent variables for each pathway are estimated in an unsupervised fashion, without using disease outcome information, they need to be calculated only once in the sample-label permutation testing procedure rather than once for each permutation resample. Using both simulated and real datasets, we show that our approach performs favorably compared with five other currently available pathway testing methods.
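The two-stage strategy and the compute-once permutation shortcut can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: stage 1 replaces adaptive elastic-net SPCA with a simpler soft-thresholded power iteration with deflation, stage 2 uses ordinary least squares and R² for a continuous outcome only (binary or survival outcomes would need a logistic or Cox model instead), and the names `sparse_pc_scores`, `pathway_permutation_test`, and the relative threshold `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: sign(v) * max(|v| - t, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sparse_pc_scores(X, n_components=2, lam=0.3, n_iter=200):
    """Stage 1 (unsupervised): sparse latent variables for one pathway.

    Stand-in for adaptive elastic-net SPCA (an assumption, not the paper's
    estimator). X is samples x genes; lam in [0, 1) is a hypothetical
    threshold, as a fraction of the largest absolute entry, below which
    gene loadings are set to zero.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each gene
    R = Z.copy()
    loadings = []
    for _ in range(n_components):
        # Initialize at the leading right singular vector of the residual.
        v = np.linalg.svd(R, full_matrices=False)[2][0]
        for _ in range(n_iter):
            u = R @ v
            u /= np.linalg.norm(u)
            w = R.T @ u
            v_new = soft_threshold(w, lam * np.abs(w).max())
            if not v_new.any():          # everything thresholded away
                break
            v_new /= np.linalg.norm(v_new)
            converged = np.allclose(v_new, v)
            v = v_new
            if converged:
                break
        loadings.append(v)
        R = R - np.outer(R @ v, v)       # deflate the fitted component
    V = np.column_stack(loadings)        # genes x components, unit-norm columns
    return Z @ V, V                      # latent scores (samples x components)


def r_squared(S, y):
    """Stage 2: R^2 of an OLS fit of a continuous outcome y on scores S."""
    A = np.column_stack([np.ones(len(y)), S])  # add intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def pathway_permutation_test(X, y, n_perm=999, seed=0, **spca_kwargs):
    """Permutation p-value for the association of one pathway with y."""
    rng = np.random.default_rng(seed)
    # Scores ignore y, so they are computed once, outside the permutation loop.
    S, _ = sparse_pc_scores(X, **spca_kwargs)
    observed = r_squared(S, y)
    null = np.array([r_squared(S, rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    f = rng.normal(size=60)                 # hidden pathway activity (simulated)
    X = rng.normal(size=(60, 20))           # 60 samples, 20 genes
    X[:, :5] += 2.0 * f[:, None]            # first 5 genes track the activity
    y = f + rng.normal(scale=0.5, size=60)  # continuous outcome
    print(pathway_permutation_test(X, y, n_perm=199))
```

The design point the sketch is meant to show is where `sparse_pc_scores` is called: because it never sees `y`, permuting the sample labels only requires refitting the cheap stage-2 regression, not re-estimating the latent variables.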

Original language: English (US)
Article number: 48
Journal: Statistical Applications in Genetics and Molecular Biology
Volume: 10
Issue number: 1
DOIs: 10.2202/1544-6115.1697
State: Published - Nov 9 2011
Externally published: Yes

Fingerprint

Elastic Net
Principal Component Analysis
Pathway
Genes
Association reactions
Testing
Statistical tests
Latent Variables
Microarrays
Gene expression
Labels
Throughput
Phenotype
Permutation
Experiments
Dimension Reduction

Keywords

  • gene expression
  • microarray
  • pathway analysis
  • sparse principal component analysis

ASJC Scopus subject areas

  • Statistics and Probability
  • Molecular Biology
  • Genetics
  • Computational Mathematics

Cite this

@article{6e9cd0741c5f4238a08f566adb0c57e2,
title = "Adaptive elastic-net sparse principal component analysis for pathway association testing",
keywords = "gene expression, microarray, pathway analysis, sparse principal component analysis",
author = "Xi Chen",
year = "2011",
month = "11",
day = "9",
doi = "10.2202/1544-6115.1697",
language = "English (US)",
volume = "10",
journal = "Statistical Applications in Genetics and Molecular Biology",
issn = "1544-6115",
publisher = "Berkeley Electronic Press",
number = "1",

}

TY - JOUR

T1 - Adaptive elastic-net sparse principal component analysis for pathway association testing

AU - Chen, Xi

PY - 2011/11/9

Y1 - 2011/11/9

AB - Pathway or gene set analysis has become an increasingly popular approach for analyzing high-throughput biological experiments such as microarray gene expression studies. The purpose of pathway analysis is to identify differentially expressed pathways associated with outcomes. Important challenges in pathway analysis are selecting a subset of genes contributing most to association with clinical phenotypes and conducting statistical tests of association for the pathways efficiently. We propose a two-stage analysis strategy: (1) extract latent variables representing activities within each pathway using a dimension reduction approach based on adaptive elastic-net sparse principal component analysis; (2) integrate the latent variables with the regression modeling framework to analyze studies with different types of outcomes such as binary, continuous or survival outcomes. Our proposed approach is computationally efficient. For each pathway, because the latent variables are estimated in an unsupervised fashion without using disease outcome information, in the sample label permutation testing procedure, the latent variables only need to be calculated once rather than for each permutation resample. Using both simulated and real datasets, we show our approach performed favorably when compared with five other currently available pathway testing methods.

KW - gene expression

KW - microarray

KW - pathway analysis

KW - sparse principal component analysis

UR - http://www.scopus.com/inward/record.url?scp=80455132482&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80455132482&partnerID=8YFLogxK

U2 - 10.2202/1544-6115.1697

DO - 10.2202/1544-6115.1697

M3 - Article

C2 - 23089825

AN - SCOPUS:80455132482

VL - 10

JO - Statistical Applications in Genetics and Molecular Biology

JF - Statistical Applications in Genetics and Molecular Biology

SN - 1544-6115

IS - 1

M1 - 48

ER -