Methodological considerations in estimation of phenotype heritability using genome-wide SNP data, illustrated by an analysis of the heritability of height in a large sample of African ancestry adults

Fang Chen, Jing He, Jianqi Zhang, Gary K. Chen, Venetta Thomas, Christine B. Ambrosone, Elisa V. Bandera, Sonja I. Berndt, Leslie Bernstein, William J. Blot, Qiuyin Cai, John Carpten, Graham Casey, Stephen J. Chanock, Iona Cheng, Lisa Chu, Sandra L. Deming, W. Ryan Driver, Phyllis Goodman, Richard B. Hayes, Anselm J M Hennis, Ann W. Hsing, Jennifer Hu, Sue A. Ingles, Esther M. John, Rick A. Kittles, Suzanne Kolb, M. Cristina Leske, Robert C. Millikan, Kristine R. Monroe, Adam Murphy, Barbara Nemesure, Christine Neslund-Dudas, Sarah Nyante, Elaine A. Ostrander, Michael F. Press, Jorge L. Rodriguez-Gil, Ben A. Rybicki, Fredrick Schumacher, Janet L. Stanford, Lisa B. Signorello, Sara S. Strom, Victoria Stevens, David Van Den Berg, Zhaoming Wang, John S. Witte, Suh Yuh Wu, Yuko Yamamura, Wei Zheng, Regina G. Ziegler, Alexander H. Stram, Laurence N. Kolonel, Loïc Le Marchand, Brian E. Henderson, Christopher A. Haiman, Daniel O. Stram

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Height has an extremely polygenic pattern of inheritance. Genome-wide association studies (GWAS) have revealed hundreds of common variants that are associated with human height at genome-wide levels of significance. However, only a small fraction of phenotypic variation can be explained by the aggregate of these common variants. In a large study of African-American men and women (n = 14,419), we genotyped and analyzed 966,578 autosomal SNPs across the entire genome using a linear mixed model variance components approach implemented in the program GCTA (Yang et al., Nat Genet 2010), and estimated an additive heritability of 44.7% (SE: 3.7%) for this phenotype in a sample of evidently unrelated individuals. While this estimated value is similar to that given by Yang et al. in their analyses, we remain concerned about two related issues: (1) whether, in the complete absence of hidden relatedness, variance components methods have adequate power to estimate heritability when a very large number of SNPs are used in the analysis; and (2) whether estimation of heritability may be biased, in real studies, by low levels of residual hidden relatedness. We addressed the first question in a semi-analytic fashion by directly simulating the distribution of the score statistic for a test of zero heritability with and without low levels of relatedness. The second question was addressed by a very careful comparison of the behavior of estimated heritability for both observed (self-reported) height and simulated phenotypes compared to imputation R² as a function of the number of SNPs used in the analysis.
These simulations help to address the important question of whether today's GWAS SNPs will remain useful for imputing causal variants that are discovered using very large sample sizes in future studies of height, or whether the causal variants themselves will need to be genotyped de novo in order to build a prediction model that ultimately captures a large fraction of the variability of height and, by implication, other complex phenotypes. Our overall conclusions are that when study sizes are quite large (5,000 or so) the additive heritability estimate for height is not apparently biased upwards using the linear mixed model; however, there is evidence in our simulation that a very large number of causal variants (many thousands), each with a very small effect on phenotypic variance, will need to be discovered to fill the gap between the heritability explained by known versus unknown causal variants. We conclude that today's GWAS data will remain useful in the future for causal variant prediction, but that finding the causal variants that need to be predicted may be extremely laborious.
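As an illustration of the two ideas in the abstract (estimating additive heritability from genome-wide SNPs, and simulating the null distribution of a score statistic for zero heritability), the following is a minimal sketch on simulated data. It is not the authors' pipeline: GCTA fits the variance components by REML, whereas this sketch swaps in Haseman-Elston regression, a simpler moment-based estimator, to stay short. All function names, sample sizes, and allele-frequency ranges are assumptions made for the example, and every SNP is assumed causal with independent genotypes.

```python
import numpy as np

def simulate_genotypes(n, m, rng):
    """Allele counts (0/1/2) for n unrelated individuals at m independent SNPs."""
    freqs = rng.uniform(0.05, 0.5, size=m)  # assumed allele-frequency range
    return rng.binomial(2, freqs, size=(n, m)).astype(float)

def standardize(G):
    """Center and scale each SNP column to sample mean 0 and variance 1."""
    G = G - G.mean(axis=0)
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic SNPs
    return G / sd

def grm(Z):
    """Genetic relationship matrix K = Z Z^T / m from standardized genotypes."""
    return Z @ Z.T / Z.shape[1]

def simulate_phenotype(Z, h2, rng):
    """Phenotype with variance ~1 in which all SNPs jointly explain a share h2."""
    n, m = Z.shape
    beta = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    noise = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return Z @ beta + noise

def he_regression(y, K):
    """Haseman-Elston estimate: slope of y_i * y_j on K_ij over pairs i < j."""
    y = (y - y.mean()) / y.std()
    upper = np.triu_indices_from(K, k=1)
    x = K[upper]
    prod = np.outer(y, y)[upper]
    x_c = x - x.mean()
    return float(x_c @ (prod - prod.mean()) / (x_c @ x_c))

def score_statistic(y, K):
    """Score for H0: h2 = 0 in y ~ N(0, s2 * (h2*K + (1-h2)*I)), s2 estimated."""
    y = y - y.mean()
    sigma2 = y @ y / len(y)
    return float(0.5 * (y @ K @ y / sigma2 - np.trace(K)))

def null_score_pvalue(y, K, n_sim, rng):
    """Monte Carlo p-value: compare the observed score with scores of pure noise."""
    observed = score_statistic(y, K)
    null = np.array([score_statistic(rng.normal(size=len(y)), K)
                     for _ in range(n_sim)])
    return float((1 + np.sum(null >= observed)) / (1 + n_sim))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    Z = standardize(simulate_genotypes(1000, 500, rng))
    K = grm(Z)
    y = simulate_phenotype(Z, h2=0.45, rng=rng)
    print("HE heritability estimate:", round(he_regression(y, K), 3))
    print("score-test p-value:", null_score_pvalue(y, K, 200, rng))
```

In this toy setting the standard error of the estimate is roughly sqrt(2m)/n, where n is the number of individuals and m the number of SNPs. That makes the power concern in question (1) concrete: as the number of SNPs grows relative to the sample size, the off-diagonal entries of K shrink toward zero and the estimate becomes noisier.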

Original language: English (US)
Article number: e0131106
Journal: PLoS One
Volume: 10
Issue number: 6
DOI: https://doi.org/10.1371/journal.pone.0131106
State: Published - Jun 30 2015

Fingerprint

Single Nucleotide Polymorphism
Genome-Wide Association Study
ancestry
heritability
Genes
Genome
Phenotype
Linear Models
Multifactorial Inheritance
Viverridae
sampling
phenotypic variation
African Americans
Sample Size
prediction
Statistics
inheritance (genetics)

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

Methodological considerations in estimation of phenotype heritability using genome-wide SNP data, illustrated by an analysis of the heritability of height in a large sample of African ancestry adults. / Chen, Fang; He, Jing; Zhang, Jianqi; Chen, Gary K.; Thomas, Venetta; Ambrosone, Christine B.; Bandera, Elisa V.; Berndt, Sonja I.; Bernstein, Leslie; Blot, William J.; Cai, Qiuyin; Carpten, John; Casey, Graham; Chanock, Stephen J.; Cheng, Iona; Chu, Lisa; Deming, Sandra L.; Driver, W. Ryan; Goodman, Phyllis; Hayes, Richard B.; Hennis, Anselm J M; Hsing, Ann W.; Hu, Jennifer; Ingles, Sue A.; John, Esther M.; Kittles, Rick A.; Kolb, Suzanne; Leske, M. Cristina; Millikan, Robert C.; Monroe, Kristine R.; Murphy, Adam; Nemesure, Barbara; Neslund-Dudas, Christine; Nyante, Sarah; Ostrander, Elaine A.; Press, Michael F.; Rodriguez-Gil, Jorge L.; Rybicki, Ben A.; Schumacher, Fredrick; Stanford, Janet L.; Signorello, Lisa B.; Strom, Sara S.; Stevens, Victoria; Van Den Berg, David; Wang, Zhaoming; Witte, John S.; Wu, Suh Yuh; Yamamura, Yuko; Zheng, Wei; Ziegler, Regina G.; Stram, Alexander H.; Kolonel, Laurence N.; Le Marchand, Loïc; Henderson, Brian E.; Haiman, Christopher A.; Stram, Daniel O.

In: PLoS One, Vol. 10, No. 6, e0131106, 30.06.2015.

Chen, F, He, J, Zhang, J, Chen, GK, Thomas, V, Ambrosone, CB, Bandera, EV, Berndt, SI, Bernstein, L, Blot, WJ, Cai, Q, Carpten, J, Casey, G, Chanock, SJ, Cheng, I, Chu, L, Deming, SL, Driver, WR, Goodman, P, Hayes, RB, Hennis, AJM, Hsing, AW, Hu, J, Ingles, SA, John, EM, Kittles, RA, Kolb, S, Leske, MC, Millikan, RC, Monroe, KR, Murphy, A, Nemesure, B, Neslund-Dudas, C, Nyante, S, Ostrander, EA, Press, MF, Rodriguez-Gil, JL, Rybicki, BA, Schumacher, F, Stanford, JL, Signorello, LB, Strom, SS, Stevens, V, Van Den Berg, D, Wang, Z, Witte, JS, Wu, SY, Yamamura, Y, Zheng, W, Ziegler, RG, Stram, AH, Kolonel, LN, Le Marchand, L, Henderson, BE, Haiman, CA & Stram, DO 2015, 'Methodological considerations in estimation of phenotype heritability using genome-wide SNP data, illustrated by an analysis of the heritability of height in a large sample of African ancestry adults', PLoS One, vol. 10, no. 6, e0131106. https://doi.org/10.1371/journal.pone.0131106
@article{17599bb1a5bc49fda907848f3e43811b,
title = "Methodological considerations in estimation of phenotype heritability using genome-wide SNP data, illustrated by an analysis of the heritability of height in a large sample of African ancestry adults",
author = "Fang Chen and Jing He and Jianqi Zhang and Chen, {Gary K.} and Venetta Thomas and Ambrosone, {Christine B.} and Bandera, {Elisa V.} and Berndt, {Sonja I.} and Leslie Bernstein and Blot, {William J.} and Qiuyin Cai and John Carpten and Graham Casey and Chanock, {Stephen J.} and Iona Cheng and Lisa Chu and Deming, {Sandra L.} and Driver, {W. Ryan} and Phyllis Goodman and Hayes, {Richard B.} and Hennis, {Anselm J M} and Hsing, {Ann W.} and Jennifer Hu and Ingles, {Sue A.} and John, {Esther M.} and Kittles, {Rick A.} and Suzanne Kolb and Leske, {M. Cristina} and Millikan, {Robert C.} and Monroe, {Kristine R.} and Adam Murphy and Barbara Nemesure and Christine Neslund-Dudas and Sarah Nyante and Ostrander, {Elaine A.} and Press, {Michael F.} and Rodriguez-Gil, {Jorge L.} and Rybicki, {Ben A.} and Fredrick Schumacher and Stanford, {Janet L.} and Signorello, {Lisa B.} and Strom, {Sara S.} and Victoria Stevens and {Van Den Berg}, David and Zhaoming Wang and Witte, {John S.} and Wu, {Suh Yuh} and Yuko Yamamura and Wei Zheng and Ziegler, {Regina G.} and Stram, {Alexander H.} and Kolonel, {Laurence N.} and {Le Marchand}, Lo{\"i}c and Henderson, {Brian E.} and Haiman, {Christopher A.} and Stram, {Daniel O.}",
year = "2015",
month = "6",
day = "30",
doi = "10.1371/journal.pone.0131106",
language = "English (US)",
volume = "10",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "6",

}

TY - JOUR

T1 - Methodological considerations in estimation of phenotype heritability using genome-wide SNP data, illustrated by an analysis of the heritability of height in a large sample of African ancestry adults

AU - Chen, Fang

AU - He, Jing

AU - Zhang, Jianqi

AU - Chen, Gary K.

AU - Thomas, Venetta

AU - Ambrosone, Christine B.

AU - Bandera, Elisa V.

AU - Berndt, Sonja I.

AU - Bernstein, Leslie

AU - Blot, William J.

AU - Cai, Qiuyin

AU - Carpten, John

AU - Casey, Graham

AU - Chanock, Stephen J.

AU - Cheng, Iona

AU - Chu, Lisa

AU - Deming, Sandra L.

AU - Driver, W. Ryan

AU - Goodman, Phyllis

AU - Hayes, Richard B.

AU - Hennis, Anselm J M

AU - Hsing, Ann W.

AU - Hu, Jennifer

AU - Ingles, Sue A.

AU - John, Esther M.

AU - Kittles, Rick A.

AU - Kolb, Suzanne

AU - Leske, M. Cristina

AU - Millikan, Robert C.

AU - Monroe, Kristine R.

AU - Murphy, Adam

AU - Nemesure, Barbara

AU - Neslund-Dudas, Christine

AU - Nyante, Sarah

AU - Ostrander, Elaine A.

AU - Press, Michael F.

AU - Rodriguez-Gil, Jorge L.

AU - Rybicki, Ben A.

AU - Schumacher, Fredrick

AU - Stanford, Janet L.

AU - Signorello, Lisa B.

AU - Strom, Sara S.

AU - Stevens, Victoria

AU - Van Den Berg, David

AU - Wang, Zhaoming

AU - Witte, John S.

AU - Wu, Suh Yuh

AU - Yamamura, Yuko

AU - Zheng, Wei

AU - Ziegler, Regina G.

AU - Stram, Alexander H.

AU - Kolonel, Laurence N.

AU - Le Marchand, Loïc

AU - Henderson, Brian E.

AU - Haiman, Christopher A.

AU - Stram, Daniel O.

PY - 2015/6/30

Y1 - 2015/6/30

UR - http://www.scopus.com/inward/record.url?scp=84938631897&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84938631897&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0131106

DO - 10.1371/journal.pone.0131106

M3 - Article

C2 - 26125186

AN - SCOPUS:84938631897

VL - 10

JO - PLoS One

JF - PLoS One

SN - 1932-6203

IS - 6

M1 - e0131106

ER -