Valid Monte Carlo permutation tests for genetic case-control studies with missing genotypes

Daniel D. Kinnamon, Eden R. Martin

Research output: Contribution to journal, Article

2 Citations (Scopus)

Abstract

Monte Carlo permutation tests can be formally constructed by choosing a set of permutations of individual indices and a real-valued test statistic measuring the association between genotypes and affection status. In this paper, we develop a rigorous theoretical framework for verifying the validity of these tests when there are missing genotypes. We begin by specifying a nonparametric probability model for the observed genotype data in a genetic case-control study with unrelated subjects. Under this model and some minimal assumptions about the test statistic, we establish that the resulting Monte Carlo permutation test is exact level α if (1) the chosen set of permutations of individual indices is a group under composition and (2) the distribution of the observed genotype score matrix under the null hypothesis does not change if the assignment of individuals to rows is shuffled according to an arbitrary permutation in this set. We apply these conditions to show that frequently used Monte Carlo permutation tests based on the set of all permutations of individual indices are guaranteed to be exact level α only for missing data processes satisfying a rather restrictive additional assumption. However, if the missing data process depends on covariates that are all identified and recorded, we also show that Monte Carlo permutation tests based on the set of permutations within strata of individuals with identical covariate values are exact level α. Our theoretical results are verified and supplemented by simulations for a variety of missing data processes and test statistics.
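The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the mean-difference statistic, the coding of missing genotypes as `None`, and the `(hits + 1) / (n_perm + 1)` p-value convention are all hypothetical choices made here for the example. Genotype scores are assumed to be integers (0, 1, 2) for a single marker, and the strata labels stand in for the recorded covariates on which the missing data process may depend.

```python
import random
from collections import defaultdict


def mean_score_difference(genotypes, status):
    """Hypothetical test statistic: absolute difference in mean genotype
    score between cases (status 1) and controls (status 0), computed on
    non-missing genotypes only (missing values are coded as None)."""
    case = [g for g, s in zip(genotypes, status) if s == 1 and g is not None]
    ctrl = [g for g, s in zip(genotypes, status) if s == 0 and g is not None]
    if not case or not ctrl:
        return 0.0
    return abs(sum(case) / len(case) - sum(ctrl) / len(ctrl))


def permute_within_strata(genotypes, strata, rng):
    """Reassign genotype rows (including missing entries) to individuals,
    shuffling only among individuals that share a stratum label."""
    groups = defaultdict(list)
    for i, label in enumerate(strata):
        groups[label].append(i)
    permuted = list(genotypes)
    for idx in groups.values():
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for src, dst in zip(idx, shuffled):
            permuted[dst] = genotypes[src]
    return permuted


def mc_permutation_pvalue(genotypes, status, strata=None, n_perm=999, seed=0):
    """Monte Carlo permutation p-value. With strata=None every individual is
    placed in one stratum, which corresponds to sampling from the set of all
    permutations of individual indices."""
    rng = random.Random(seed)
    if strata is None:
        strata = [0] * len(genotypes)
    observed = mean_score_difference(genotypes, status)
    hits = 0
    for _ in range(n_perm):
        permuted = permute_within_strata(genotypes, strata, rng)
        if mean_score_difference(permuted, status) >= observed:
            hits += 1
    # Counting the observed data as one of the permutations keeps the
    # p-value strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Calling `mc_permutation_pvalue(genotypes, status)` samples from all permutations, while passing `strata=` with one label per individual restricts sampling to permutations within strata of individuals with identical covariate values, the case the abstract identifies as exact level α when the missing data process depends only on recorded covariates.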

Original language: English
Pages (from-to): 325-344
Number of pages: 20
Journal: Genetic Epidemiology
Volume: 38
Issue number: 4
DOI: 10.1002/gepi.21805
State: Published - Jan 1 2014

Fingerprint

Case-Control Studies
Genotype

Keywords

  • Case-control
  • Missing genotypes
  • Monte Carlo permutation tests
  • Type I error

ASJC Scopus subject areas

  • Genetics (clinical)
  • Epidemiology

Cite this

Valid Monte Carlo permutation tests for genetic case-control studies with missing genotypes. / Kinnamon, Daniel D.; Martin, Eden R.

In: Genetic Epidemiology, Vol. 38, No. 4, 01.01.2014, p. 325-344.

@article{e5ec202c9bb24de9b7a8c72e62bb1a62,
title = "Valid Monte Carlo permutation tests for genetic case-control studies with missing genotypes",
abstract = "Monte Carlo permutation tests can be formally constructed by choosing a set of permutations of individual indices and a real-valued test statistic measuring the association between genotypes and affection status. In this paper, we develop a rigorous theoretical framework for verifying the validity of these tests when there are missing genotypes. We begin by specifying a nonparametric probability model for the observed genotype data in a genetic case-control study with unrelated subjects. Under this model and some minimal assumptions about the test statistic, we establish that the resulting Monte Carlo permutation test is exact level α if (1) the chosen set of permutations of individual indices is a group under composition and (2) the distribution of the observed genotype score matrix under the null hypothesis does not change if the assignment of individuals to rows is shuffled according to an arbitrary permutation in this set. We apply these conditions to show that frequently used Monte Carlo permutation tests based on the set of all permutations of individual indices are guaranteed to be exact level α only for missing data processes satisfying a rather restrictive additional assumption. However, if the missing data process depends on covariates that are all identified and recorded, we also show that Monte Carlo permutation tests based on the set of permutations within strata of individuals with identical covariate values are exact level α. Our theoretical results are verified and supplemented by simulations for a variety of missing data processes and test statistics.",
keywords = "Case-control, Missing genotypes, Monte Carlo permutation tests, Type I error",
author = "Kinnamon, {Daniel D.} and Martin, {Eden R.}",
year = "2014",
month = "1",
day = "1",
doi = "10.1002/gepi.21805",
language = "English",
volume = "38",
pages = "325--344",
journal = "Genetic Epidemiology",
issn = "0741-0395",
publisher = "Wiley-Liss Inc.",
number = "4",

}

TY - JOUR

T1 - Valid Monte Carlo permutation tests for genetic case-control studies with missing genotypes

AU - Kinnamon, Daniel D.

AU - Martin, Eden R.

PY - 2014/1/1

Y1 - 2014/1/1

AB - Monte Carlo permutation tests can be formally constructed by choosing a set of permutations of individual indices and a real-valued test statistic measuring the association between genotypes and affection status. In this paper, we develop a rigorous theoretical framework for verifying the validity of these tests when there are missing genotypes. We begin by specifying a nonparametric probability model for the observed genotype data in a genetic case-control study with unrelated subjects. Under this model and some minimal assumptions about the test statistic, we establish that the resulting Monte Carlo permutation test is exact level α if (1) the chosen set of permutations of individual indices is a group under composition and (2) the distribution of the observed genotype score matrix under the null hypothesis does not change if the assignment of individuals to rows is shuffled according to an arbitrary permutation in this set. We apply these conditions to show that frequently used Monte Carlo permutation tests based on the set of all permutations of individual indices are guaranteed to be exact level α only for missing data processes satisfying a rather restrictive additional assumption. However, if the missing data process depends on covariates that are all identified and recorded, we also show that Monte Carlo permutation tests based on the set of permutations within strata of individuals with identical covariate values are exact level α. Our theoretical results are verified and supplemented by simulations for a variety of missing data processes and test statistics.

KW - Case-control

KW - Missing genotypes

KW - Monte Carlo permutation tests

KW - Type I error

UR - http://www.scopus.com/inward/record.url?scp=84898684415&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898684415&partnerID=8YFLogxK

U2 - 10.1002/gepi.21805

DO - 10.1002/gepi.21805

M3 - Article

VL - 38

SP - 325

EP - 344

JO - Genetic Epidemiology

JF - Genetic Epidemiology

SN - 0741-0395

IS - 4

ER -