Integration of semantics information and clustering in binary-class classification for handling imbalanced multimedia data

Chao Chen, Mei-Ling Shyu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Citations (Scopus)

Abstract

It is well acknowledged that data imbalance, i.e., a very small ratio of positive data instances to negative data instances, is one of the major challenges in classification, especially for multimedia data. One solution is to use clustering in binary-class classification to partition the majority class (also called the negative class) into several subsets, each of which is merged with the minority class (also called the positive class) to form a much more balanced subset of the original data set. However, one major drawback of clustering is the time it takes to construct each cluster. Because multimedia data (such as video and image data) are rich in semantics, using video semantics (i.e., semantic concepts as class labels) to form negative subsets can (i) effectively construct several groups whose data instances are semantically related, and (ii) significantly reduce the number of data instances participating in the clustering step. Therefore, this chapter proposes a novel binary-class classification framework that integrates video semantics information and clustering to address the data imbalance issue. Experiments compare the proposed framework with other techniques that are commonly used to learn from imbalanced data sets. The experimental results on some highly imbalanced video data sets demonstrate that the proposed framework outperforms these comparative classification approaches by about 3-16%.
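
The subset-construction idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (build_balanced_subsets, kmeans), the max_ratio threshold, and the rule that only oversized concept groups are clustered are all assumptions made for illustration, and a toy k-means stands in for whatever clustering method the chapter uses. The sketch assumes at least one positive instance and max_ratio >= 1.

```python
import random
from collections import defaultdict


def kmeans(points, k, iters=10, seed=0):
    """Toy k-means on lists of floats; returns the non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to the center with the smallest squared distance.
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep old center if empty).
        centers = [
            [sum(dim) / len(cl) for dim in zip(*cl)] if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def build_balanced_subsets(positives, negatives, neg_concepts, max_ratio=2):
    """Group negatives by semantic concept, cluster only the oversized groups,
    and pair every resulting negative subset with all positive instances."""
    groups = defaultdict(list)
    for x, concept in zip(negatives, neg_concepts):
        groups[concept].append(x)

    # Hypothetical size limit: a negative subset should hold at most
    # max_ratio times as many instances as the positive class.
    limit = max_ratio * len(positives)
    subsets = []
    for members in groups.values():
        if len(members) > limit:
            k = -(-len(members) // limit)  # ceiling division
            parts = kmeans(members, k)     # clusters may still exceed the limit
        else:
            parts = [members]              # small groups skip clustering entirely
        for part in parts:
            subsets.append([(x, 1) for x in positives] + [(x, 0) for x in part])
    return subsets
```

Each returned subset is a list of (features, label) pairs on which a separate binary classifier could be trained. Grouping by concept first means that clustering runs only within the larger groups, which is the source of the reduced clustering cost the abstract refers to.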

Original language: English (US)
Title of host publication: Information Reuse and Integration in Academia and Industry
Publisher: Springer-Verlag Wien
Pages: 281-298
Number of pages: 18
Volume: 9783709115381
ISBN (Print): 9783709115381, 370911537X, 9783709115374
DOIs: https://doi.org/10.1007/978-3-7091-1538-1_14
State: Published - Nov 1 2013

Fingerprint

Semantics
Labels
Experiments

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Chen, C., & Shyu, M-L. (2013). Integration of semantics information and clustering in binary-class classification for handling imbalanced multimedia data. In Information Reuse and Integration in Academia and Industry (Vol. 9783709115381, pp. 281-298). Springer-Verlag Wien. https://doi.org/10.1007/978-3-7091-1538-1_14

@inbook{a4a61945f3e74110905ea8c382ed0cde,
title = "Integration of semantics information and clustering in binary-class classification for handling imbalanced multimedia data",
abstract = "It is well acknowledged that data imbalance, i.e., a very small ratio of positive data instances to negative data instances, is one of the major challenges in classification, especially for multimedia data. One solution is to use clustering in binary-class classification to partition the majority class (also called the negative class) into several subsets, each of which is merged with the minority class (also called the positive class) to form a much more balanced subset of the original data set. However, one major drawback of clustering is the time it takes to construct each cluster. Because multimedia data (such as video and image data) are rich in semantics, using video semantics (i.e., semantic concepts as class labels) to form negative subsets can (i) effectively construct several groups whose data instances are semantically related, and (ii) significantly reduce the number of data instances participating in the clustering step. Therefore, this chapter proposes a novel binary-class classification framework that integrates video semantics information and clustering to address the data imbalance issue. Experiments compare the proposed framework with other techniques that are commonly used to learn from imbalanced data sets. The experimental results on some highly imbalanced video data sets demonstrate that the proposed framework outperforms these comparative classification approaches by about 3-16{\%}.",
author = "Chao Chen and Mei-Ling Shyu",
year = "2013",
month = "11",
day = "1",
doi = "10.1007/978-3-7091-1538-1_14",
language = "English (US)",
isbn = "9783709115381",
volume = "9783709115381",
pages = "281--298",
booktitle = "Information Reuse and Integration in Academia and Industry",
publisher = "Springer-Verlag Wien",

}
