A data-driven approach to conditional screening of high-dimensional variables

Hyokyoung G. Hong, Lan Wang, Xuming He

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


Marginal screening is a widely applied technique to handily reduce the dimensionality of the data when the number of potential features overwhelms the sample size. Because of the nature of the marginal screening procedures, they are also known for their difficulty in identifying the so-called hidden variables that are jointly important but have weak marginal associations with the response variable. Failing to include a hidden variable in the screening stage has two undesirable consequences: (1) important features are missed out in model selection, and (2) biased inference is likely to occur in the subsequent analysis. Motivated by some recent work in conditional screening, we propose a data-driven conditional screening algorithm, which is computationally efficient, enjoys the sure screening property under weaker assumptions on the model and works robustly in a variety of settings to reduce false negatives of hidden variables. Numerical comparison with alternative screening procedures is also made to shed light on the relative merit of the proposed method. We illustrate the proposed methodology using a leukaemia microarray data example.
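The hidden-variable problem described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm: it is a minimal illustration that assumes plain correlation ranking as the marginal screener and partial correlation given a hand-picked conditioning set as the conditional screener. The function names, the simulated design, and the choice of column 0 as the conditioning variable are all hypothetical, and the coefficient of the hidden variable is deliberately tuned so that its sample marginal covariance with the response is exactly zero (an extreme case).

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute sample correlation between each column of X and y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))


def conditional_scores(X, y, cond):
    """Absolute partial correlation of each column with y, given the columns in cond.

    Columns in the conditioning set receive a score of 0.
    """
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X[:, cond]])
    # Residualise every predictor and the response on the conditioning set.
    coef_X, *_ = np.linalg.lstsq(Z, X, rcond=None)
    coef_y, *_ = np.linalg.lstsq(Z, y, rcond=None)
    RX = X - Z @ coef_X
    ry = y - Z @ coef_y
    rest = np.setdiff1d(np.arange(p), cond)
    scores = np.zeros(p)
    scores[rest] = np.abs(RX[:, rest].T @ ry) / (
        np.linalg.norm(RX[:, rest], axis=0) * np.linalg.norm(ry)
    )
    return scores


def top_d(scores, d):
    """Indices of the d largest scores."""
    return np.argsort(-scores)[:d]


# Simulated design (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p, rho, d = 200, 500, 0.8, 20
X = rng.standard_normal((n, p))
# Column 1 is strongly correlated with column 0.
X[:, 1] = rho * X[:, 0] + np.sqrt(1 - rho**2) * X[:, 1]
eps = 0.3 * rng.standard_normal(n)

# Choose the coefficient of column 1 so that its sample covariance with y
# is exactly zero: jointly important, marginally invisible.
x1c = X[:, 1] - X[:, 1].mean()
beta_hidden = -(x1c @ (X[:, 0] + eps)) / (x1c @ x1c)
y = X[:, 0] + beta_hidden * X[:, 1] + eps

marginal_top = top_d(marginal_scores(X, y), d)
conditional_top = top_d(conditional_scores(X, y, cond=[0]), d)

print("hidden variable kept by marginal screening:", 1 in marginal_top)
print("hidden variable kept by conditional screening:", 1 in conditional_top)
```

In this sketch the conditioning set is supplied by hand. The abstract describes a data-driven conditional screening algorithm, and nothing here should be read as a description of how that algorithm operates.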

Original language: English (US)
Pages (from-to): 200-212
Number of pages: 13
Issue number: 1
State: Published - 2016
Externally published: Yes


Keywords

  • conditional screening
  • false negative
  • feature screening
  • high dimension
  • sparse principal component analysis
  • sure screening property

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

