Random forests for genomic data analysis

Research output: Contribution to journal › Article

223 Scopus citations


Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and can account for correlation as well as interactions among features. This makes RF particularly appealing for high-dimensional genomic data analysis. In this article, we systematically review the applications of, and recent progress in, RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning.

Original language: English
Pages (from-to): 323-329
Number of pages: 7
Issue number: 6
State: Published - Jun 1 2012



Keywords

  • Classification
  • Genomic data analysis
  • Prediction
  • Random forests
  • Random survival forests
  • Variable selection

ASJC Scopus subject areas

  • Genetics
