Pathway-based analysis for genome-wide association studies using supervised principal components

Xi Chen, Lily Wang, Bo Hu, Mingsheng Guo, John Barnard, Xiaofeng Zhu

Research output: Contribution to journal › Article

36 Citations (Scopus)

Abstract

Many complex diseases are influenced by genetic variations in multiple genes, each with only a small marginal effect on disease susceptibility. Pathway analysis, which identifies biological pathways associated with disease outcome, has become increasingly popular for genome-wide association studies (GWAS). In addition to combining weak signals from a number of SNPs in the same pathway, results from pathway analysis also shed light on the biological processes underlying disease. We propose a new pathway-based analysis method for GWAS, the supervised principal component analysis (SPCA) model. In the proposed SPCA model, a selected subset of SNPs most associated with disease outcome is used to estimate the latent variable for a pathway. The estimated latent variable for each pathway is an optimal linear combination of a selected subset of SNPs; therefore, the proposed SPCA model can borrow strength across the SNPs in a pathway. In addition to identifying pathways associated with disease outcome, SPCA also performs within-category selection to identify the most important SNPs within each gene set. The proposed model operates in a well-established statistical framework and can handle design information such as covariate adjustment and matching information in GWAS. We compare the proposed method with currently available methods using data with realistic linkage disequilibrium structures, and we illustrate the SPCA method using the Wellcome Trust Case-Control Consortium Crohn Disease (CD) data set.
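The two steps the abstract describes (select the SNPs most associated with outcome, then summarize them as one latent variable) can be illustrated with a minimal sketch. This is a generic supervised-principal-components illustration, not the authors' implementation: the function name `supervised_pc_score`, the use of absolute Pearson correlation as the per-SNP association measure, and the fixed `threshold` are all assumptions made for the example. The sketch also stops before any pathway-level significance test; a naive test on the resulting score would be biased by the selection step.

```python
import numpy as np


def supervised_pc_score(genotypes, outcome, threshold):
    """Hypothetical helper: supervised PC score for the SNPs of one pathway.

    genotypes: (n_subjects, n_snps) array of allele counts (0/1/2).
    outcome:   (n_subjects,) array, e.g. 0/1 case-control status.
    threshold: assumed cutoff on |correlation| used to select SNPs.
    Returns (selected SNP indices, latent score per subject, loadings).
    """
    X = np.asarray(genotypes, dtype=float)
    y = np.asarray(outcome, dtype=float)
    X = X - X.mean(axis=0)  # center each SNP
    y = y - y.mean()        # center the outcome

    # Step 1: per-SNP association with outcome (Pearson correlation).
    # Monomorphic SNPs have zero variance and get correlation 0.
    denom = np.sqrt((X ** 2).sum(axis=0) * (y ** 2).sum())
    corr = np.divide(X.T @ y, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    selected = np.flatnonzero(np.abs(corr) >= threshold)
    if selected.size == 0:
        raise ValueError("no SNP passes the association threshold")

    # Step 2: first principal component of the selected SNPs only.
    # The score is a linear combination of the selected SNPs.
    _, _, vt = np.linalg.svd(X[:, selected], full_matrices=False)
    loadings = vt[0]
    score = X[:, selected] @ loadings
    return selected, score, loadings


# Toy data (made up for illustration): 8 subjects, 4 SNPs.
toy_outcome = np.array([0, 0, 0, 0, 1, 1, 1, 1])
toy_genotypes = np.array([
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 2, 0, 1],
    [1, 2, 1, 1],
    [1, 2, 0, 1],
    [1, 2, 1, 1],
])
toy_selected, toy_score, toy_loadings = supervised_pc_score(
    toy_genotypes, toy_outcome, threshold=0.5
)
print("selected SNP indices:", toy_selected)
print("loadings:", toy_loadings)
```

In this sketch the loadings show how much each selected SNP contributes to the pathway score, which is one simple way to read the "within-category selection" idea. The sign of a principal component is arbitrary, so only the magnitude of the score's association with outcome is meaningful.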

Original language: English (US)
Pages (from-to): 716-724
Number of pages: 9
Journal: Genetic Epidemiology
Volume: 34
Issue number: 7
DOI: 10.1002/gepi.20532
State: Published - Nov 1 2010
Externally published: Yes

Fingerprint

Genome-Wide Association Study
Principal Component Analysis
Single Nucleotide Polymorphism
Biological Phenomena
Disease Susceptibility
Linkage Disequilibrium
Crohn Disease
Genes

Keywords

  • Genome-wide association
  • Pathway analysis
  • Principal component analysis
  • SNPs

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

Pathway-based analysis for genome-wide association studies using supervised principal components. / Chen, Xi; Wang, Lily; Hu, Bo; Guo, Mingsheng; Barnard, John; Zhu, Xiaofeng.

In: Genetic Epidemiology, Vol. 34, No. 7, 01.11.2010, p. 716-724.

Chen, Xi ; Wang, Lily ; Hu, Bo ; Guo, Mingsheng ; Barnard, John ; Zhu, Xiaofeng. / Pathway-based analysis for genome-wide association studies using supervised principal components. In: Genetic Epidemiology. 2010 ; Vol. 34, No. 7. pp. 716-724.
@article{8d2a19476a2744e7b315322c34b7be33,
title = "Pathway-based analysis for genome-wide association studies using supervised principal components",
keywords = "Genome-wide association, Pathway analysis, Principal component analysis, SNPs",
author = "Xi Chen and Lily Wang and Bo Hu and Mingsheng Guo and John Barnard and Xiaofeng Zhu",
year = "2010",
month = "11",
day = "1",
doi = "10.1002/gepi.20532",
language = "English (US)",
volume = "34",
pages = "716--724",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "7",

}

TY - JOUR

T1 - Pathway-based analysis for genome-wide association studies using supervised principal components

AU - Chen, Xi

AU - Wang, Lily

AU - Hu, Bo

AU - Guo, Mingsheng

AU - Barnard, John

AU - Zhu, Xiaofeng

PY - 2010/11/1

Y1 - 2010/11/1

KW - Genome-wide association

KW - Pathway analysis

KW - Principal component analysis

KW - SNPs

UR - http://www.scopus.com/inward/record.url?scp=77958611364&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77958611364&partnerID=8YFLogxK

U2 - 10.1002/gepi.20532

DO - 10.1002/gepi.20532

M3 - Article

VL - 34

SP - 716

EP - 724

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 7

ER -