Integration of semantics information and clustering in binary-class classification for handling imbalanced multimedia data

Chao Chen, Mei Ling Shyu

Research output: Chapter in Book/Report/Conference proceeding › Chapter

2 Scopus citations


It is well acknowledged that data imbalance, i.e., a very small ratio of positive to negative data instances, is one of the major challenges in classification, especially for multimedia data. One solution is to use clustering in binary-class classification to partition the majority class (also called the negative class) into several subsets, each of which is merged with the minority class (also called the positive class) to form a much more balanced subset of the original data set. However, one major drawback of clustering is the time needed to construct each cluster. Because multimedia data (such as video and image data) carry rich semantics, using video semantics (i.e., semantic concepts as class labels) to form negative subsets can (i) effectively construct several groups whose data instances are semantically related, and (ii) significantly reduce the number of data instances participating in the clustering step. Therefore, in this chapter, a novel binary-class classification framework that integrates video semantics information and clustering is proposed to address the data imbalance issue. Experiments compare the proposed framework with other techniques commonly used to learn from imbalanced data sets. The experimental results on some highly imbalanced video data sets demonstrate that the proposed classification framework outperforms these comparative classification approaches by about 3-16%.
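The subset construction described in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the abstract does not name the clustering algorithm, so a tiny k-means is assumed here, and every name (`kmeans`, `balanced_subsets`, `max_ratio`, the concept labels, the toy data) is hypothetical. Negatives are first grouped by semantic concept, and only groups that are still too large relative to the positive class are passed to clustering.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=10):
    """Tiny Lloyd-style k-means (assumed algorithm); returns non-empty clusters."""
    # Deterministic initialisation: take k evenly spaced points as centroids.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster became empty.
        centroids = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return [cl for cl in clusters if cl]


def balanced_subsets(positives, negatives, max_ratio=3):
    """Build (concept, labelled_data) subsets, each containing all positives.

    negatives is a list of (feature_tuple, concept_label) pairs.
    max_ratio is a hypothetical target for negatives per positive; k-means
    does not guarantee that every resulting cluster respects it.
    """
    # Step 1: group negatives by semantic concept (no clustering needed).
    groups = defaultdict(list)
    for features, concept in negatives:
        groups[concept].append(features)

    limit = max_ratio * len(positives)
    subsets = []
    for concept, group in groups.items():
        # Step 2: cluster only the groups that are still too large.
        k = math.ceil(len(group) / limit)
        parts = kmeans(group, k) if k > 1 else [group]
        # Step 3: merge each negative part with the full positive class.
        for part in parts:
            data = [(x, 1) for x in positives] + [(x, 0) for x in part]
            subsets.append((concept, data))
    return subsets


# Toy data: 2 positives, 11 negatives spread over two concepts.
positives = [(0.0, 0.0), (0.0, 1.0)]
negatives = [((5.0, 5.0), "sky"), ((5.0, 6.0), "sky"), ((6.0, 5.0), "sky")]
negatives += [((float(x), float(y)), "road")
              for x, y in [(10, 10), (10, 11), (11, 10), (11, 11),
                           (20, 20), (20, 21), (21, 20), (21, 21)]]

subsets = balanced_subsets(positives, negatives, max_ratio=3)
for concept, data in subsets:
    print(concept, len(data))
```

In a sketch like this, one classifier would typically be trained per subset and the outputs combined; how the chapter combines them is not stated in the abstract, so that step is left out.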

Original language: English (US)
Title of host publication: Information Reuse and Integration in Academia and Industry
Publisher: Springer-Verlag Wien
Number of pages: 18
ISBN (Electronic): 9783709115381
ISBN (Print): 370911537X, 9783709115374
State: Published - Nov 1 2013

ASJC Scopus subject areas

  • Computer Science (all)

