Impact of genotyping errors on statistical power of association tests in genomic analyses

A case study

Lin Hou, Ning Sun, Shrikant Mane, Fred Sayward, Nallakkandi Rajeevan, Kei Hoi Cheung, Kelly Cho, Saiju Pyarajan, Mihaela Aslan, Perry Miller, Philip D Harvey, J. Michael Gaziano, John Concato, Hongyu Zhao

Research output: Contribution to journal, Article

4 Citations (Scopus)

Abstract

A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant's DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/).
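The contrast between specificity and sensitivity reported above can be illustrated with a small calculation. The following is a minimal sketch, not the authors' method and not the GWAS.PC interface: it assumes a simple collapsing burden test (carrier versus non-carrier of any rare variant), nondifferential errors applied at the carrier level, and a normal-approximation two-proportion test. All function names, carrier rates, and sample sizes are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def observed_rate(true_rate, sensitivity, specificity):
    """Carrier rate seen after calling errors (illustrative error model).

    True carriers are detected with probability `sensitivity`;
    non-carriers are falsely called carriers with probability 1 - specificity.
    """
    return sensitivity * true_rate + (1.0 - specificity) * (1.0 - true_rate)


def two_proportion_power(p_case, p_ctrl, n_case, n_ctrl, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    # Pooled standard error under the null, unpooled under the alternative.
    p_bar = (n_case * p_case + n_ctrl * p_ctrl) / (n_case + n_ctrl)
    se_null = (p_bar * (1.0 - p_bar) * (1.0 / n_case + 1.0 / n_ctrl)) ** 0.5
    se_alt = (p_case * (1.0 - p_case) / n_case
              + p_ctrl * (1.0 - p_ctrl) / n_ctrl) ** 0.5
    diff = abs(p_case - p_ctrl)
    upper = norm.cdf((diff - z_crit * se_null) / se_alt)
    lower = norm.cdf((-diff - z_crit * se_null) / se_alt)
    return upper + lower


def burden_power(p_case, p_ctrl, n_per_group, sensitivity, specificity):
    """Power of the collapsing test when the same errors hit cases and controls."""
    q_case = observed_rate(p_case, sensitivity, specificity)
    q_ctrl = observed_rate(p_ctrl, sensitivity, specificity)
    return two_proportion_power(q_case, q_ctrl, n_per_group, n_per_group)


if __name__ == "__main__":
    # Hypothetical scenario: 2% of cases and 1% of controls carry a rare variant.
    scenarios = {
        "no errors": (1.00, 1.00),
        "sensitivity 0.90": (0.90, 1.00),
        "specificity 0.99": (1.00, 0.99),
    }
    for label, (sens, spec) in scenarios.items():
        power = burden_power(0.02, 0.01, 2000, sens, spec)
        print(f"{label:>18}: power = {power:.3f}")
```

Under these assumptions, a 1% false-positive rate costs more power than a 10% false-negative rate. When true carriers are rare, false positives among the many non-carriers inflate the observed carrier rate in both groups and add noise, whereas missed carriers only shrink the signal proportionally.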

Original language: English (US)
Pages (from-to): 152-162
Number of pages: 11
Journal: Genetic Epidemiology
Volume: 41
Issue number: 2
DOI: https://doi.org/10.1002/gepi.22027
State: Published - Feb 1 2017

Fingerprint

Genome-Wide Association Study
Genetic Association Studies
Veterans
Rare Diseases
Software
Nucleotides
Genotype
DNA

Keywords

  • genome wide association test
  • genotyping
  • genotyping error
  • sequencing
  • statistical power

ASJC Scopus subject areas

  • Epidemiology
  • Genetics (clinical)

Cite this

Hou, L., Sun, N., Mane, S., Sayward, F., Rajeevan, N., Cheung, K. H., ... Zhao, H. (2017). Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study. Genetic Epidemiology, 41(2), 152-162. https://doi.org/10.1002/gepi.22027

@article{478aa9bb444f4377966d6c948e630a04,
title = "Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study",
keywords = "genome wide association test, genotyping, genotyping error, sequencing, statistical power",
author = "Lin Hou and Ning Sun and Shrikant Mane and Fred Sayward and Nallakkandi Rajeevan and Cheung, {Kei Hoi} and Kelly Cho and Saiju Pyarajan and Mihaela Aslan and Perry Miller and Harvey, {Philip D} and Gaziano, {J. Michael} and John Concato and Hongyu Zhao",
year = "2017",
month = "2",
day = "1",
doi = "10.1002/gepi.22027",
language = "English (US)",
volume = "41",
pages = "152--162",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "2",

}
