Utilizing concept correlations for effective imbalanced data classification

Yilin Yan, Yang Liu, Mei-Ling Shyu, Min Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

13 Citations (Scopus)

Abstract

Data imbalance is a challenging and common problem in data mining and machine learning areas, and has attracted significant research efforts. A data set is considered imbalanced when the data instances (samples) are not close to uniformly distributed across different classes/categories, which is very common in real-world data sets. It is likely to result in biased classification results. In this paper, a two-phase classification framework is proposed to make the classification of imbalanced data more accurate and stable. The proposed framework is based on the correlations generated between concepts. The general idea is to identify negative data instances which have certain positive correlations with data instances in the target concept to facilitate the classification task. The experimental results show that our framework is effective in imbalanced data classification and is robust to feature descriptors by comparing it with four existing approaches using four different kinds of feature representations.
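The abstract's notion of imbalance (instances far from uniformly distributed across classes) can be made concrete with a small illustration. The following is a minimal sketch, not taken from the paper: the function name `imbalance_ratio` and the toy labels are assumptions made only for this example, and the majority-to-minority ratio is one common way to quantify skew, not necessarily the measure the authors use.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return the majority-class count divided by the minority-class count.

    A value near 1.0 means the classes are close to uniformly distributed;
    larger values indicate a more skewed (imbalanced) data set.
    """
    counts = Counter(labels)
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    return max(counts.values()) / min(counts.values())


# Hypothetical toy labels: 95 negative instances and 5 positive instances.
toy_labels = ["negative"] * 95 + ["positive"] * 5
toy_ratio = imbalance_ratio(toy_labels)
print(f"imbalance ratio: {toy_ratio:.1f}")
```

A classifier trained on such data can reach high accuracy by always predicting the majority class, which is the bias the abstract refers to.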

Original language: English (US)
Title of host publication: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 561-568
Number of pages: 8
ISBN (Print): 9781479958801
DOIs: https://doi.org/10.1109/IRI.2014.7051939
State: Published - Feb 27 2014
Event: 15th IEEE International Conference on Information Reuse and Integration, IEEE IRI 2014 - San Francisco, United States
Duration: Aug 13 2014 to Aug 15 2014

Other

Other: 15th IEEE International Conference on Information Reuse and Integration, IEEE IRI 2014
Country: United States
City: San Francisco
Period: 8/13/14 to 8/15/14

Fingerprint

Data mining
Learning systems

Keywords

  • Classification
  • Correlation
  • Imbalanced data
  • Rare class mining
  • Skewed data

ASJC Scopus subject areas

  • Information Systems

Cite this

Yan, Y., Liu, Y., Shyu, M-L., & Chen, M. (2014). Utilizing concept correlations for effective imbalanced data classification. In Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014 (pp. 561-568). [7051939] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IRI.2014.7051939

@inproceedings{a9d49f4b28db4db981b730a05c1c5a9e,
title = "Utilizing concept correlations for effective imbalanced data classification",
abstract = "Data imbalance is a challenging and common problem in data mining and machine learning areas, and has attracted significant research efforts. A data set is considered imbalanced when the data instances (samples) are not close to uniformly distributed across different classes/categories, which is very common in real-world data sets. It is likely to result in biased classification results. In this paper, a two-phase classification framework is proposed to make the classification of imbalanced data more accurate and stable. The proposed framework is based on the correlations generated between concepts. The general idea is to identify negative data instances which have certain positive correlations with data instances in the target concept to facilitate the classification task. The experimental results show that our framework is effective in imbalanced data classification and is robust to feature descriptors by comparing it with four existing approaches using four different kinds of feature representations.",
keywords = "Classification, Correlation, Imbalanced data, Rare class mining, Skewed data",
author = "Yilin Yan and Yang Liu and Mei-Ling Shyu and Min Chen",
year = "2014",
month = "2",
day = "27",
doi = "10.1109/IRI.2014.7051939",
language = "English (US)",
isbn = "9781479958801",
pages = "561--568",
booktitle = "Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
}

TY - GEN
T1 - Utilizing concept correlations for effective imbalanced data classification
AU - Yan, Yilin
AU - Liu, Yang
AU - Shyu, Mei-Ling
AU - Chen, Min
PY - 2014/2/27
Y1 - 2014/2/27

N2 - Data imbalance is a challenging and common problem in data mining and machine learning areas, and has attracted significant research efforts. A data set is considered imbalanced when the data instances (samples) are not close to uniformly distributed across different classes/categories, which is very common in real-world data sets. It is likely to result in biased classification results. In this paper, a two-phase classification framework is proposed to make the classification of imbalanced data more accurate and stable. The proposed framework is based on the correlations generated between concepts. The general idea is to identify negative data instances which have certain positive correlations with data instances in the target concept to facilitate the classification task. The experimental results show that our framework is effective in imbalanced data classification and is robust to feature descriptors by comparing it with four existing approaches using four different kinds of feature representations.

KW - Classification
KW - Correlation
KW - Imbalanced data
KW - Rare class mining
KW - Skewed data
UR - http://www.scopus.com/inward/record.url?scp=84946692997&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84946692997&partnerID=8YFLogxK
U2 - 10.1109/IRI.2014.7051939
DO - 10.1109/IRI.2014.7051939
M3 - Conference contribution
SN - 9781479958801
SP - 561
EP - 568
BT - Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration, IEEE IRI 2014
PB - Institute of Electrical and Electronics Engineers Inc.
ER -