A random forests quantile classifier for class imbalanced data

Robert O'Brien, Hemant Ishwaran

Research output: Contribution to journal › Article › peer-review

25 Scopus citations

Abstract

Extending previous work on quantile classifiers (q-classifiers), we propose the q*-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds a threshold q*, with 0 < q* < 1, where q* equals the unconditional probability of observing a minority class sample. The motivation for q*-classification stems from a density-based approach and leads to the useful property that the q*-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the q*-classifier can achieve near zero risk in imbalance problems, while simultaneously optimizing true positive and true negative rates. We use random forests to apply q*-classification. This new method, which we call RFQ, is shown to outperform or be competitive with existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.
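The thresholding rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' RFQ implementation: the function names (`q_star`, `q_classify`, `g_mean`) and the probability values are hypothetical, and the probabilities merely stand in for minority-class estimates that a random forest might produce. The sketch compares the q* threshold (the empirical minority fraction) with the usual 0.5 threshold, using the G-mean, the geometric mean of the true positive and true negative rates.

```python
from math import sqrt

def q_star(labels, minority=1):
    """Empirical q*: the fraction of samples that belong to the minority class."""
    return sum(1 for y in labels if y == minority) / len(labels)

def q_classify(probs, q):
    """Assign the minority class (1) when the estimated minority probability exceeds q."""
    return [1 if p > q else 0 for p in probs]

def g_mean(labels, preds):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return sqrt((tp / n_pos) * (tn / n_neg))

# Hypothetical data: 2 minority samples (label 1) out of 10.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
# Assumed minority-class probability estimates, e.g. from a random forest.
probs = [0.45, 0.30, 0.25, 0.10, 0.05, 0.15, 0.10, 0.05, 0.12, 0.08]

q = q_star(labels)                      # minority fraction, 0.2 here
q_preds = q_classify(probs, q)          # threshold at q*
bayes_preds = q_classify(probs, 0.5)    # conventional 0.5 threshold

print("q* =", q)
print("G-mean at q*: ", g_mean(labels, q_preds))
print("G-mean at 0.5:", g_mean(labels, bayes_preds))
```

On this toy data the 0.5 threshold never predicts the minority class, so its true positive rate and G-mean are zero, while thresholding at q* recovers both minority samples at the cost of one false positive.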

Original language: English (US)
Pages (from-to): 232-249
Number of pages: 18
Journal: Pattern Recognition
Volume: 90
State: Published - Jun 2019

Keywords

  • Class imbalance
  • Minority class
  • Random forests
  • Response-based sampling
  • Weighted Bayes classifier

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
