Effective supervised discretization for classification based on correlation maximization

Qiusha Zhu, Lin Lin, Mei-Ling Shyu, Shu Ching Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

26 Citations (Scopus)

Abstract

Many real-world datasets contain continuous (numerical) features, yet many classification models accept only nominal features as input. For such models, discretization is a necessary pre-processing step that transforms numerical data into nominal data. Well-discretized data should both summarize the original data concisely and improve classification performance. This paper proposes a novel supervised discretization algorithm based on correlation maximization (CM). It relies on multiple correspondence analysis (MCA), a technique for capturing the correlations between multiple variables. For each numeric feature, the correlation information generated by MCA drives a discretization that maximizes the correlations between feature intervals (items) and classes. The algorithm is compared empirically with four other commonly used discretization algorithms using six well-known classifiers. Results on five UCI datasets and five TRECVID datasets show that the proposed algorithm automatically generates a better set of feature intervals by maximizing their correlations with the classes, and thereby improves classification performance.
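The general idea of picking cut points by their association with the class labels can be illustrated with a small sketch. This is not the algorithm from the paper, whose details the abstract does not give. As a simplified stand-in, it scores each candidate cut point by the total inertia of a simple (two-way) correspondence analysis on the interval-by-class contingency table, and it finds only a single cut per feature. The function names `ca_inertia` and `best_cut` are hypothetical.

```python
import numpy as np


def ca_inertia(table):
    """Total inertia of a simple correspondence analysis on a contingency table.

    Assumes every row and column of `table` has a non-zero total.
    """
    P = table / table.sum()               # correspondence matrix
    r = P.sum(axis=1, keepdims=True)      # row masses (intervals)
    c = P.sum(axis=0, keepdims=True)      # column masses (classes)
    expected = r @ c                      # independence model
    S = (P - expected) / np.sqrt(expected)  # standardized residuals
    singular_values = np.linalg.svd(S, compute_uv=False)
    return float(np.sum(singular_values ** 2))


def best_cut(values, labels):
    """Return (cut, score) for the single cut point with the highest inertia.

    Candidate cuts are midpoints between consecutive distinct values, so both
    intervals are always non-empty. Returns (None, -1.0) if no cut exists.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    distinct = np.unique(values)
    candidates = (distinct[:-1] + distinct[1:]) / 2.0

    best = (None, -1.0)
    for cut in candidates:
        below = values <= cut
        # 2 x n_classes table: counts per (interval, class) pair
        table = np.array(
            [[np.sum(below & (labels == k)) for k in classes],
             [np.sum(~below & (labels == k)) for k in classes]],
            dtype=float,
        )
        score = ca_inertia(table)
        if score > best[1]:
            best = (float(cut), score)
    return best


if __name__ == "__main__":
    x = [1, 2, 3, 10, 11, 12]
    y = [0, 0, 0, 1, 1, 1]
    cut, score = best_cut(x, y)
    print(cut, score)  # the cut falls at 6.5, where the intervals separate the classes
```

Total inertia equals the chi-square statistic divided by the sample size, so this sketch behaves like a chi-square-driven split. A full MCA-based method would work on an indicator matrix over all intervals and classes and could produce more than two intervals per feature.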

Original language: English
Title of host publication: Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011
Pages: 390-395
Number of pages: 6
DOI: https://doi.org/10.1109/IRI.2011.6009579
State: Published - Sep 29 2011
Event: 12th IEEE International Conference on Information Reuse and Integration, IRI 2011 - Las Vegas, NV, United States
Duration: Aug 3 2011 to Aug 5 2011

Fingerprint

Classifiers
Discretization
Processing
Correspondence analysis
Summarization
Classifier

Keywords

  • Classification
  • Correlation
  • Multiple Correspondence Analysis (MCA)
  • Supervised Discretization

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management

Cite this

Zhu, Q., Lin, L., Shyu, M-L., & Chen, S. C. (2011). Effective supervised discretization for classification based on correlation maximization. In Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011 (pp. 390-395). [6009579] https://doi.org/10.1109/IRI.2011.6009579

@inproceedings{f314a6f9a8bc4c20bbb25398d362873b,
title = "Effective supervised discretization for classification based on correlation maximization",
abstract = "In many real-world applications, there are features (or attributes) that are continuous or numerical in the data. However, many classification models only take nominal features as the inputs. Therefore, it is necessary to apply discretization as a pre-processing step to transform numerical data into nominal data for such models. Well-discretized data should not only characterize the original data to produce a concise summarization, but also improve the classification performance. In this paper, a novel and effective supervised discretization algorithm based on correlation maximization (CM) is proposed by using multiple correspondence analysis (MCA) which is a technique to capture the correlations between multiple variables. For each numeric feature, the correlation information generated from MCA is used to build the discretization algorithm that maximizes the correlations between feature intervals/items and classes. Empirical comparisons with four other commonly used discretization algorithms are conducted using six well-known classifiers. Results on five UCI datasets and five TRECVID datasets demonstrate that our proposed discretization algorithm can automatically generate a better set of features (feature intervals) by maximizing their correlations with the classes and thus improve the classification performance.",
keywords = "Classification, Correlation, Multiple Correspondence Analysis (MCA), Supervised Discretization",
author = "Qiusha Zhu and Lin Lin and Mei-Ling Shyu and Chen, {Shu Ching}",
year = "2011",
month = "9",
day = "29",
doi = "10.1109/IRI.2011.6009579",
language = "English",
isbn = "9781457709661",
pages = "390--395",
booktitle = "Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011",

}

TY - GEN

T1 - Effective supervised discretization for classification based on correlation maximization

AU - Zhu, Qiusha

AU - Lin, Lin

AU - Shyu, Mei-Ling

AU - Chen, Shu Ching

PY - 2011/9/29

Y1 - 2011/9/29

AB - In many real-world applications, there are features (or attributes) that are continuous or numerical in the data. However, many classification models only take nominal features as the inputs. Therefore, it is necessary to apply discretization as a pre-processing step to transform numerical data into nominal data for such models. Well-discretized data should not only characterize the original data to produce a concise summarization, but also improve the classification performance. In this paper, a novel and effective supervised discretization algorithm based on correlation maximization (CM) is proposed by using multiple correspondence analysis (MCA) which is a technique to capture the correlations between multiple variables. For each numeric feature, the correlation information generated from MCA is used to build the discretization algorithm that maximizes the correlations between feature intervals/items and classes. Empirical comparisons with four other commonly used discretization algorithms are conducted using six well-known classifiers. Results on five UCI datasets and five TRECVID datasets demonstrate that our proposed discretization algorithm can automatically generate a better set of features (feature intervals) by maximizing their correlations with the classes and thus improve the classification performance.

KW - Classification

KW - Correlation

KW - Multiple Correspondence Analysis (MCA)

KW - Supervised Discretization

UR - http://www.scopus.com/inward/record.url?scp=80053163315&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=80053163315&partnerID=8YFLogxK

U2 - 10.1109/IRI.2011.6009579

DO - 10.1109/IRI.2011.6009579

M3 - Conference contribution

SN - 9781457709661

SP - 390

EP - 395

BT - Proceedings of the 2011 IEEE International Conference on Information Reuse and Integration, IRI 2011

ER -