Clustering-based binary-class classification for imbalanced data sets

Chao Chen, Mei-Ling Shyu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Citations (Scopus)

Abstract

In this paper, we propose a new clustering-based binary-class classification framework that integrates clustering into a binary-class classification approach to handle imbalanced data sets. A binary-class classifier assigns each data instance to one of two classes, while a clustering technique partitions the data instances into groups according to their similarity to each other. After a clustering algorithm is applied, data instances within the same group usually have higher similarity to one another, and the differences between data instances in different groups should be larger. In our proposed framework, all negative data instances are first clustered into a set of negative groups. Next, the negative data instances in each negative group are combined with all positive data instances to construct a balanced binary-class data set. Finally, subspace models trained on these balanced binary-class data sets are integrated with the subspace model trained on the original imbalanced data set to form the proposed classification model. Experimental results demonstrate that our proposed classification framework performs better than the comparative classification approaches as well as the subspace modeling method trained on the original data set alone.
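The three steps in the abstract (cluster the negatives, pair each negative group with all positives, then combine the resulting models with one trained on the full set) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not describe the subspace model or the integration rule, so the sketch assumes a simple nearest-centroid scorer (`CentroidModel`, a hypothetical stand-in) and assumes plain score averaging as the integration step. The k-means routine, the function names, and the toy data are likewise illustrative.

```python
import random
from statistics import fmean

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [fmean(col) for col in zip(*points)]

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns the non-empty clusters as lists of points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        centers = [centroid(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return [c for c in clusters if c]

class CentroidModel:
    """Hypothetical stand-in for a subspace model (an assumption, not the paper's method)."""
    def __init__(self, positives, negatives):
        self.pos = centroid(positives)
        self.neg = centroid(negatives)

    def score(self, x):
        # Greater than zero when x is closer to the positive centroid.
        return dist2(x, self.neg) - dist2(x, self.pos)

def train_framework(positives, negatives, k):
    # Step 1: cluster all negative instances into negative groups.
    groups = kmeans(negatives, k)
    # Step 2: one model per (negative group + all positives) data set.
    models = [CentroidModel(positives, g) for g in groups]
    # Step 3: add the model trained on the original imbalanced data set.
    models.append(CentroidModel(positives, negatives))
    return models

def predict(models, x):
    # Assumed integration rule: average the scores of all models.
    return 1 if fmean(m.score(x) for m in models) > 0 else 0

# Toy imbalanced data: 5 positives, 80 negatives in two separate blobs.
rng = random.Random(1)
positives = [[5 + rng.gauss(0, 0.3), 5 + rng.gauss(0, 0.3)] for _ in range(5)]
negatives = (
    [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(40)]
    + [[10 + rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(40)]
)
models = train_framework(positives, negatives, k=2)
print(predict(models, [5.0, 5.0]), predict(models, [0.0, 0.0]))
```

Each per-group model sees all positives against only a slice of the negatives, which is what makes its training set closer to balanced. The number of negative groups `k` is a free parameter in this sketch.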

Original language: English
Title of host publication: Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011
Pages: 384-389
Number of pages: 6
DOIs: https://doi.org/10.1109/IRI.2011.6009578
State: Published - Sep 29 2011
Event: 12th IEEE International Conference on Information Reuse and Integration, IRI 2011 - Las Vegas, NV, United States
Duration: Aug 3 2011 → Aug 5 2011

Other

Other: 12th IEEE International Conference on Information Reuse and Integration, IRI 2011
Country: United States
City: Las Vegas, NV
Period: 8/3/11 → 8/5/11

Keywords

  • Binary classification
  • Clustering
  • Imbalanced data sets
  • Subspace Modeling

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management

Cite this

Chen, C., & Shyu, M-L. (2011). Clustering-based binary-class classification for imbalanced data sets. In Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011 (pp. 384-389). [6009578] https://doi.org/10.1109/IRI.2011.6009578

@inproceedings{1fe978db939d4cfc8ad9bc309e706fb1,
title = "Clustering-based binary-class classification for imbalanced data sets",
keywords = "Binary classification, Clustering, Imbalanced data sets, Subspace Modeling",
author = "Chao Chen and Mei-Ling Shyu",
year = "2011",
month = "9",
day = "29",
doi = "10.1109/IRI.2011.6009578",
language = "English",
isbn = "9781457709661",
pages = "384--389",
booktitle = "Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011",

}
