Variable importance in binary regression trees and forests

Research output: Contribution to journal (Article)

137 Scopus citations

Abstract

We characterize and study variable importance (VIMP) and pairwise variable associations in binary regression trees. A key component involves the node mean squared error for a quantity we refer to as a maximal subtree. The theory naturally extends from single trees to ensembles of trees and applies to methods like random forests. This is useful because, although importance values from random forests are used to screen variables (for example, to filter high-throughput genomic data in bioinformatics), very little theory exists about their properties.

Original language: English (US)
Pages (from-to): 519-537
Number of pages: 19
Journal: Electronic Journal of Statistics
Volume: 1
State: Published - Jan 1 2007
Externally published: Yes

Keywords

  • CART
  • Maximal subtree
  • Random forests

ASJC Scopus subject areas

  • Statistics and Probability
