High-dimensional variable selection for survival data

Hemant Ishwaran, Udaya B. Kogalur, Eiran Z. Gorodeski, Andy J. Minn, Michael S. Lauer

Research output: Contribution to journal › Article › peer-review

197 Scopus citations


The minimal depth of a maximal subtree is a dimensionless order statistic measuring the predictiveness of a variable in a survival tree. We derive the distribution of the minimal depth and use it for high-dimensional variable selection using random survival forests. In big p and small n problems (where p is the dimension and n is the sample size), the distribution of the minimal depth reveals a "ceiling effect" in which a tree simply cannot be grown deep enough to properly identify predictive variables. Motivated by this limitation, we develop a new regularized algorithm, termed RSF-Variable Hunting. This algorithm exploits maximal subtrees for effective variable selection under such scenarios. Several applications are presented demonstrating the methodology, including the problem of gene selection using microarray data. In this work we focus only on survival settings, although our methodology also applies to other random forests applications, including regression and classification settings. All examples presented here use the R-software package randomSurvivalForest.
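To make the definition concrete: the maximal v-subtree closest to the root is rooted at the shallowest node that splits on v, so the minimal depth of v is that node's depth, with the root at depth 0. The Python sketch below computes it by breadth-first search over a hypothetical `Node` class. This is an illustrative structure, not the randomSurvivalForest API. The sketch also includes a simplified null distribution. It assumes a balanced tree with 2^j nodes at depth j, and that each node independently splits on a noise variable with probability 1/p. This is a toy approximation in the spirit of the paper's derivation, not its exact theorem.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical binary tree node; `split_var` is None for terminal nodes."""
    split_var: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def minimal_depth(root: Node, var: str) -> Optional[int]:
    """Depth (root = 0) of the shallowest node that splits on `var`.

    That node is the root of the maximal `var`-subtree closest to the tree root.
    Returns None if `var` never splits the tree.
    """
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if node.split_var == var:
            # BFS visits nodes in nondecreasing depth, so the first hit is minimal
            return depth
        for child in (node.left, node.right):
            if child is not None:
                queue.append((child, depth + 1))
    return None


def null_minimal_depth_pmf(p: int, max_depth: int) -> List[float]:
    """Toy P(D = d), d = 0..max_depth, for a noise variable among p variables.

    Illustrative assumptions only: a balanced tree (2**j nodes at depth j) and
    each node splitting on the variable independently with probability 1/p.
    Mass not returned (1 - sum) is the chance the variable never splits.
    """
    pi = 1.0 / p
    pmf = []
    for d in range(max_depth + 1):
        # no split on the variable at any of the 2**d - 1 nodes above depth d ...
        none_above = (1.0 - pi) ** (2 ** d - 1)
        # ... and at least one split among the 2**d nodes at depth d
        hit_here = 1.0 - (1.0 - pi) ** (2 ** d)
        pmf.append(none_above * hit_here)
    return pmf
```

For example, `sum(null_minimal_depth_pmf(10_000, 6))` is only about 0.013. With p = 10,000 and a tree limited to depth 6, a noise variable almost never splits at all under this toy model. Observed minimal depths therefore pile up at the tree's depth limit, a simplified picture of the ceiling effect described above.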

Original language: English (US)
Pages (from-to): 205-217
Number of pages: 13
Journal: Journal of the American Statistical Association
Issue number: 489
State: Published - Mar 1 2010
Externally published: Yes

Keywords

  • Forest
  • Maximal subtree
  • Minimal depth
  • Random survival forest
  • Tree
  • VIMP

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

