A random forests quantile classifier for class imbalanced data

Robert O'Brien, Hemant Ishwaran

Research output: Contribution to journal › Article

6 Citations (Scopus)

Abstract

Extending previous work on quantile classifiers (q-classifiers), we propose the q*-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds q*, where 0 < q* < 1 equals the unconditional probability of observing a minority class sample. The motivation for q*-classification stems from a density-based approach and leads to the useful property that the q*-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the q*-classifier can achieve near-zero risk in imbalance problems while simultaneously optimizing the true positive and true negative rates. We use random forests to apply q*-classification. This new method, which we call RFQ, is shown to outperform or be competitive with existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.
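
To make the decision rule concrete, here is a minimal sketch of q*-classification on a simulated imbalanced problem. It uses scikit-learn's RandomForestClassifier as a stand-in rather than the authors' RFQ implementation, so it only illustrates the thresholding rule and the G-mean metric described in the abstract; the simulated data, forest settings, and variable names are illustrative assumptions.

```python
# Sketch of q*-classification with a generic random forest (not the paper's RFQ code).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Simulated imbalanced binary problem: class 1 is the minority (~5% prevalence).
X, y = make_classification(n_samples=5000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# q* equals the unconditional (marginal) probability of the minority class,
# estimated here from the training labels.
q_star = y_tr.mean()

# q*-rule: assign a sample to the minority class when its predicted conditional
# probability exceeds q*, instead of the default 0.5 cut-off.
p_minority = rf.predict_proba(X_te)[:, 1]
y_hat = (p_minority > q_star).astype(int)

# G-mean = sqrt(TPR * TNR), the performance measure cited in the abstract.
tpr = np.mean(y_hat[y_te == 1] == 1)
tnr = np.mean(y_hat[y_te == 0] == 0)
print(f"q* = {q_star:.3f}, G-mean = {np.sqrt(tpr * tnr):.3f}")
```

Relative to the default 0.5 cut-off, the q* threshold is much lower on imbalanced data, so more samples are routed to the minority class; per the abstract, this choice maximizes the sum of the true positive and true negative rates.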

Original language: English (US)
Pages (from-to): 232-249
Number of pages: 18
Journal: Pattern Recognition
Volume: 90
DOIs: 10.1016/j.patcog.2019.01.036
State: Published - Jun 1 2019

Keywords

  • Class imbalance
  • Minority class
  • Random forests
  • Response-based sampling
  • Weighted Bayes classifier

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

A random forests quantile classifier for class imbalanced data. / O'Brien, Robert; Ishwaran, Hemant.

In: Pattern Recognition, Vol. 90, 01.06.2019, p. 232-249.

Research output: Contribution to journal › Article

@article{217fd0f0f20245e8a7a817b7dbdb0792,
title = "A random forests quantile classifier for class imbalanced data",
abstract = "Extending previous work on quantile classifiers (q-classifiers) we propose the q*-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds 0 < q* < 1, where q* equals the unconditional probability of observing a minority class sample. The motivation for q*-classification stems from a density-based approach and leads to the useful property that the q*-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the q*-classifier can achieve near zero risk in imbalance problems, while simultaneously optimizing true positive and true negative rates. We use random forests to apply q*-classification. This new method which we call RFQ is shown to outperform or is competitive with existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.",
keywords = "Class imbalance, Minority class, Random forests, Response-based sampling, Weighted Bayes classifier",
author = "Robert O'Brien and Hemant Ishwaran",
year = "2019",
month = "6",
day = "1",
doi = "10.1016/j.patcog.2019.01.036",
language = "English (US)",
volume = "90",
pages = "232--249",
journal = "Pattern Recognition",
issn = "0031-3203",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - A random forests quantile classifier for class imbalanced data

AU - O'Brien, Robert

AU - Ishwaran, Hemant

PY - 2019/6/1

Y1 - 2019/6/1

N2 - Extending previous work on quantile classifiers (q-classifiers) we propose the q*-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds 0 < q* < 1, where q* equals the unconditional probability of observing a minority class sample. The motivation for q*-classification stems from a density-based approach and leads to the useful property that the q*-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the q*-classifier can achieve near zero risk in imbalance problems, while simultaneously optimizing true positive and true negative rates. We use random forests to apply q*-classification. This new method which we call RFQ is shown to outperform or is competitive with existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.

AB - Extending previous work on quantile classifiers (q-classifiers) we propose the q*-classifier for the class imbalance problem. The classifier assigns a sample to the minority class if the minority class conditional probability exceeds 0 < q* < 1, where q* equals the unconditional probability of observing a minority class sample. The motivation for q*-classification stems from a density-based approach and leads to the useful property that the q*-classifier maximizes the sum of the true positive and true negative rates. Moreover, because the procedure can be equivalently expressed as a cost-weighted Bayes classifier, it also minimizes weighted risk. Because of this dual optimization, the q*-classifier can achieve near zero risk in imbalance problems, while simultaneously optimizing true positive and true negative rates. We use random forests to apply q*-classification. This new method which we call RFQ is shown to outperform or is competitive with existing techniques with respect to G-mean performance and variable selection. Extensions to the multiclass imbalanced setting are also considered.

KW - Class imbalance

KW - Minority class

KW - Random forests

KW - Response-based sampling

KW - Weighted Bayes classifier

UR - http://www.scopus.com/inward/record.url?scp=85060860214&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060860214&partnerID=8YFLogxK

U2 - 10.1016/j.patcog.2019.01.036

DO - 10.1016/j.patcog.2019.01.036

M3 - Article

AN - SCOPUS:85060860214

VL - 90

SP - 232

EP - 249

JO - Pattern Recognition

JF - Pattern Recognition

SN - 0031-3203

ER -