Imputation of missing ages in pedigree data

Raymond Balise, Yu Chen, Gillian Dite, Anna Felberg, Limei Sun, Argyrios Ziogas, Alice S. Whittemore

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

Background: In human pedigree data, age at disease occurrence is frequently missing and is imputed using various methods. However, little is known about how these methods perform when applied to families. In particular, there is little information about the level of agreement between imputed and actual values of temporal data, or about the effect of imputation on inferences.

Methods: We performed two evaluations of five imputation methods used to generate complete data for repositories to be shared by many investigators. Two are mean substitution methods, two are regression methods, and one is a multiple imputation method based on one of the regression methods. To evaluate the methods, we randomly deleted the years of disease diagnosis of some men in a sample of pedigrees ascertained as part of a prostate cancer study. In the first evaluation, we used the five methods to impute the missing diagnosis years and evaluated agreement between imputed and actual values. In the second evaluation, we assessed agreement between regression coefficients estimated using imputed diagnosis years and those estimated using the actual years.

Results/Conclusions: For both evaluations, we found optimal or near-optimal performance from a regression method that imputes a man's diagnosis year based on the year of birth and year of last observation of all affected men with complete data. The multiple imputation analogue of this method also performed well.
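The abstract names three families of methods (mean substitution, regression, and multiple imputation) without giving model details. The sketch below is only an illustration of the general idea, not the authors' implementation: it assumes an ordinary least-squares model with an intercept, fitted on men with a known diagnosis year, and all function and variable names (`fit_regression`, `impute_regression`, `impute_mean`, `birth`, `last_obs`, `diag`) are hypothetical. The noise-added variant is a simplified stand-in for one draw of a multiple imputation; a fuller version would also account for uncertainty in the fitted coefficients.

```python
import numpy as np

def fit_regression(birth, last_obs, diag):
    """Least-squares fit of diagnosis year on birth year and year of last
    observation, using only men whose diagnosis year is known (not NaN)."""
    known = ~np.isnan(diag)
    X = np.column_stack([np.ones(known.sum()), birth[known], last_obs[known]])
    coef, *_ = np.linalg.lstsq(X, diag[known], rcond=None)
    resid = diag[known] - X @ coef
    dof = max(int(known.sum()) - X.shape[1], 1)
    sigma = float(np.sqrt(resid @ resid / dof))  # residual standard deviation
    return coef, sigma

def impute_regression(birth, last_obs, diag, rng=None):
    """Fill missing diagnosis years with regression predictions.

    If `rng` is given, normal noise with the residual standard deviation is
    added, which gives one draw of a (simplified, assumed) multiple imputation.
    """
    coef, sigma = fit_regression(birth, last_obs, diag)
    missing = np.isnan(diag)
    X_miss = np.column_stack(
        [np.ones(missing.sum()), birth[missing], last_obs[missing]]
    )
    pred = X_miss @ coef
    if rng is not None:
        pred = pred + rng.normal(0.0, sigma, size=pred.shape)
    out = diag.copy()
    out[missing] = pred
    return out

def impute_mean(diag):
    """Mean substitution: replace each missing year with the observed mean."""
    out = diag.copy()
    out[np.isnan(diag)] = np.nanmean(diag)
    return out

# Toy, made-up data for illustration only (not from the study).
birth = np.array([1920.0, 1925.0, 1930.0, 1935.0, 1940.0, 1928.0])
last_obs = np.array([1995.0, 1990.0, 2000.0, 1998.0, 2001.0, 1996.0])
diag = np.array([1957.5, 1957.5, 1965.0, 1966.5, 1970.5, np.nan])

single = impute_regression(birth, last_obs, diag)
draws = [
    impute_regression(birth, last_obs, diag, rng=np.random.default_rng(seed))
    for seed in range(5)
]
print(single[-1], impute_mean(diag)[-1])
```

Comparing imputed values with deliberately deleted true values, as in the first evaluation described above, would amount to calling these functions on data where some known diagnosis years have been set to `NaN` and measuring the difference.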

Original language: English (US)
Pages (from-to): 168-174
Number of pages: 7
Journal: Human Heredity
Volume: 63
Issue number: 3-4
DOI: https://doi.org/10.1159/000099829
State: Published - Mar 2007
Externally published: Yes

Fingerprint

Pedigree
Prostatic Neoplasms
Research Personnel
Observation
Parturition

Keywords

  • Cancer
  • Disease onset
  • Imputation methods
  • Missing data

ASJC Scopus subject areas

  • Genetics (clinical)

Cite this

Balise, R., Chen, Y., Dite, G., Felberg, A., Sun, L., Ziogas, A., & Whittemore, A. S. (2007). Imputation of missing ages in pedigree data. Human Heredity, 63(3-4), 168-174. https://doi.org/10.1159/000099829

@article{9a1cfea3991a4c16bbedc6e95eb1e301,
title = "Imputation of missing ages in pedigree data",
keywords = "Cancer, Disease onset, Imputation methods, Missing data",
author = "Raymond Balise and Yu Chen and Gillian Dite and Anna Felberg and Limei Sun and Argyrios Ziogas and Whittemore, {Alice S.}",
year = "2007",
month = "3",
doi = "10.1159/000099829",
language = "English (US)",
volume = "63",
pages = "168--174",
journal = "Human Heredity",
issn = "0001-5652",
publisher = "S. Karger AG",
number = "3-4",

}

TY - JOUR

T1 - Imputation of missing ages in pedigree data

AU - Balise, Raymond

AU - Chen, Yu

AU - Dite, Gillian

AU - Felberg, Anna

AU - Sun, Limei

AU - Ziogas, Argyrios

AU - Whittemore, Alice S.

PY - 2007/3

Y1 - 2007/3

KW - Cancer

KW - Disease onset

KW - Imputation methods

KW - Missing data

UR - http://www.scopus.com/inward/record.url?scp=33947386704&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33947386704&partnerID=8YFLogxK

U2 - 10.1159/000099829

DO - 10.1159/000099829

M3 - Article

C2 - 17310126

AN - SCOPUS:33947386704

VL - 63

SP - 168

EP - 174

JO - Human Heredity

JF - Human Heredity

SN - 0001-5652

IS - 3-4

ER -