Imputation of Area-Level Covariates by Registry Linking

Research output: Contribution to journal › Article › peer-review


Social epidemiological research has long studied the impact of social determinants of place on health outcomes. Geocoding is a well-known technique for extracting such information by mapping geographical location to census tract and then extracting relevant information from tract-level databases. However, sometimes location information is unknown. This is often the case when using many of today's public databases (e.g., genomic data repositories). For some diseases such as cancer, statewide registries exist which provide a strategy for building a linking model between analysis observations and a reference sample drawn from the registry using variables in common to both. We detail this methodology and then show how to use this linking model together with classified mixed model prediction to impute area-level covariates for analysis observations. We study empirical performance via a series of simulations, and then perform predictive geocoding on colon cancer patients drawn (both analysis and reference samples) from the Florida Cancer Data Systems registry.

Original language: English (US)
Journal: Handbook of Statistics
State: Accepted/In press - Jan 1 2017


Keywords

  • Area-level covariates
  • Cancer registries
  • Census tracts
  • Geocoding
  • Imputation
  • Mixed models
  • Prediction
  • Spatial data

ASJC Scopus subject areas

  • Statistics and Probability
  • Modeling and Simulation
  • Applied Mathematics
