Monte Carlo permutation tests can be formally constructed by choosing a set of permutations of individual indices and a real-valued test statistic measuring the association between genotypes and affection status. In this paper, we develop a rigorous theoretical framework for verifying the validity of these tests when there are missing genotypes. We begin by specifying a nonparametric probability model for the observed genotype data in a genetic case-control study with unrelated subjects. Under this model and some minimal assumptions about the test statistic, we establish that the resulting Monte Carlo permutation test is exact level α if (1) the chosen set of permutations of individual indices is a group under composition and (2) the distribution of the observed genotype score matrix under the null hypothesis does not change if the assignment of individuals to rows is shuffled according to an arbitrary permutation in this set. We apply these conditions to show that frequently used Monte Carlo permutation tests based on the set of all permutations of individual indices are guaranteed to be exact level α only for missing data processes satisfying a rather restrictive additional assumption. However, if the missing data process depends on covariates that are all identified and recorded, we also show that Monte Carlo permutation tests based on the set of permutations within strata of individuals with identical covariate values are exact level α. Our theoretical results are verified and supplemented by simulations for a variety of missing data processes and test statistics.
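The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration:

- Each individual has a single genotype score, with `None` marking a missing genotype.
- The hypothetical statistic `mean_diff_stat` is the absolute difference in mean observed score between cases and controls.
- The p-value uses the common `(1 + count) / (B + 1)` Monte Carlo form.

Random permutations are drawn either from the set of all permutations of individual indices or from the set of permutations within strata of identical covariate values. Both sets are groups under composition, which is condition (1). Condition (2) concerns the missing data process and cannot be checked by code.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List, Optional, Sequence

Score = Optional[float]  # None marks a missing genotype (illustrative encoding)


def mean_diff_stat(scores: Sequence[Score], status: Sequence[int]) -> float:
    """Hypothetical statistic: |mean case score - mean control score|,
    computed only over individuals whose genotype is observed."""
    cases = [s for s, y in zip(scores, status) if s is not None and y == 1]
    ctrls = [s for s, y in zip(scores, status) if s is not None and y == 0]
    if not cases or not ctrls:
        return 0.0
    return abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls))


def stratum_groups(n: int, strata: Optional[Sequence[Hashable]]) -> List[List[int]]:
    """Index blocks that may be permuted among themselves.
    strata=None gives one block (all permutations of individual indices);
    otherwise individuals with identical covariate values share a block."""
    if strata is None:
        return [list(range(n))]
    groups = defaultdict(list)
    for i, key in enumerate(strata):
        groups[key].append(i)
    return list(groups.values())


def random_perm(n: int, groups: List[List[int]], rng: random.Random) -> List[int]:
    """Uniform draw from the group of permutations that map each block onto itself."""
    perm = list(range(n))
    for idx in groups:
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for src, dst in zip(idx, shuffled):
            perm[src] = dst
    return perm


def mc_permutation_pvalue(
    scores: Sequence[Score],
    status: Sequence[int],
    stat: Callable[[Sequence[Score], Sequence[int]], float] = mean_diff_stat,
    n_perm: int = 999,
    strata: Optional[Sequence[Hashable]] = None,
    seed: int = 0,
) -> float:
    """Monte Carlo permutation p-value (assumed form: (1 + count) / (n_perm + 1))."""
    rng = random.Random(seed)
    n = len(scores)
    groups = stratum_groups(n, strata)
    t_obs = stat(scores, status)
    exceed = 0
    for _ in range(n_perm):
        perm = random_perm(n, groups, rng)
        # Reassign individuals to rows: row i now carries individual perm[i]'s
        # genotype (missing entries move with their individual).
        permuted = [scores[perm[i]] for i in range(n)]
        if stat(permuted, status) >= t_obs:
            exceed += 1
    return (1 + exceed) / (n_perm + 1)
```

Under the unstratified call (`strata=None`), the genotype rows are shuffled over all individuals. Passing a sequence of covariate values as `strata` restricts the shuffle to individuals who share a covariate value. This mirrors the stratified scheme that the abstract reports as exact when the missing data process depends only on recorded covariates.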
- Missing genotypes
- Monte Carlo permutation tests
- Type I error