Deep Learning with MCA-based Instance Selection and Bootstrapping for Imbalanced Data Classification

Sheng Guan, Min Chen, Hsin Yu Ha, Shu Ching Chen, Mei-Ling Shyu, Chengde Zhang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

In this paper, we propose an extended deep learning approach that incorporates instance selection and bootstrapping techniques for imbalanced data classification. In supervised learning, classification performance often deteriorates when the training set is imbalanced, that is, when at least one class has substantially fewer instances than the others. We propose using the adaptive synthetic sampling approach (ADASYN) to generate synthetic instances for the minority class. A data pruning process based on multiple correspondence analysis (MCA) is then performed to identify the subset of synthetic instances that is most suitable to supplement the existing minority instances. The result is a more balanced training dataset, which is then bootstrapped and fed into convolutional neural networks (CNNs) for classification. Furthermore, to reduce computation time, we propose using low-level features pre-processed by principal component analysis (PCA), instead of the commonly used raw signal data, as the input to the CNNs. Experimental results on classifying 54 TRECVID concepts with different levels of imbalance show the effectiveness of our framework in comparison with other state-of-the-art methods.
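The oversample-then-bootstrap idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the interpolation routine is a simplified stand-in for ADASYN (which selects neighbours with k-NN and weights each seed by its majority-class neighbours), the MCA-based pruning, PCA pre-processing, and CNN stages are omitted, and all function names, class sizes, and the 1:1 target ratio are assumptions chosen for illustration.

```python
import random

def interpolate_minority(minority, n_new, rng):
    """Create n_new synthetic points on segments between random minority pairs.

    Simplified stand-in for ADASYN, which instead picks k-NN neighbours and
    generates more points near hard-to-learn minority instances.
    """
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # position along the segment from a to b
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])
    return synthetic

def bootstrap(instances, labels, n_draws, rng):
    """Draw n_draws (instance, label) pairs uniformly with replacement."""
    idx = [rng.randrange(len(instances)) for _ in range(n_draws)]
    return [instances[i] for i in idx], [labels[i] for i in idx]

rng = random.Random(0)

# Toy 2-D data: 90 majority and 10 minority instances (hypothetical sizes).
majority = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(90)]
minority = [[rng.gauss(3.0, 0.5), rng.gauss(3.0, 0.5)] for _ in range(10)]

# Top the minority class up to the majority size (assumed target ratio).
# In the paper's pipeline, an MCA-based pruning step would filter these
# synthetic instances before they are added; that step is not shown here.
synthetic = interpolate_minority(minority, len(majority) - len(minority), rng)

X = majority + minority + synthetic
y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

# Bootstrap the balanced set to form one resampled training set.
X_boot, y_boot = bootstrap(X, y, len(X), rng)
print(len(X), sum(y), len(X_boot))  # prints "180 90 180"
```

Because the bootstrap draws with replacement, the class balance of `y_boot` varies from draw to draw around the 1:1 ratio of the pooled set; repeating the draw would give several different training sets from the same balanced pool.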

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE Conference on Collaboration and Internet Computing, CIC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 288-295
Number of pages: 8
ISBN (Print): 9781509000890
DOI: https://doi.org/10.1109/CIC.2015.40
State: Published - Mar 1 2016
Event: 1st IEEE International Conference on Collaboration and Internet Computing, CIC 2015 - Hangzhou, China
Duration: Oct 28 2015 to Oct 30 2015

Other

Other: 1st IEEE International Conference on Collaboration and Internet Computing, CIC 2015
Country: China
City: Hangzhou
Period: 10/28/15 to 10/30/15

Fingerprint

Neural networks
Supervised learning
Principal component analysis
Sampling
Deep learning

Keywords

  • Bootstrapping
  • Classification
  • Convolutional neural network (CNN)
  • Imbalanced data
  • Multiple correspondence analysis (MCA)
  • Supervised learning

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications

Cite this

Guan, S., Chen, M., Ha, H. Y., Chen, S. C., Shyu, M-L., & Zhang, C. (2016). Deep Learning with MCA-based Instance Selection and Bootstrapping for Imbalanced Data Classification. In Proceedings - 2015 IEEE Conference on Collaboration and Internet Computing, CIC 2015 (pp. 288-295). [7423094] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/CIC.2015.40

@inproceedings{8e71170bc1624a99ac15fd1d3b7f1a30,
title = "Deep Learning with MCA-based Instance Selection and Bootstrapping for Imbalanced Data Classification",
keywords = "Bootstrapping, Classification, Convolutional neural network (CNN), Imbalanced data, Multiple correspondence analysis (MCA), Supervised learning",
author = "Sheng Guan and Min Chen and Ha, {Hsin Yu} and Chen, {Shu Ching} and Mei-Ling Shyu and Chengde Zhang",
year = "2016",
month = "3",
day = "1",
doi = "10.1109/CIC.2015.40",
language = "English (US)",
isbn = "9781509000890",
pages = "288--295",
booktitle = "Proceedings - 2015 IEEE Conference on Collaboration and Internet Computing, CIC 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN
T1 - Deep Learning with MCA-based Instance Selection and Bootstrapping for Imbalanced Data Classification
AU - Guan, Sheng
AU - Chen, Min
AU - Ha, Hsin Yu
AU - Chen, Shu Ching
AU - Shyu, Mei-Ling
AU - Zhang, Chengde
PY - 2016/3/1
Y1 - 2016/3/1
KW - Bootstrapping
KW - Classification
KW - Convolutional neural network (CNN)
KW - Imbalanced data
KW - Multiple correspondence analysis (MCA)
KW - Supervised learning
UR - http://www.scopus.com/inward/record.url?scp=84964895866&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84964895866&partnerID=8YFLogxK
U2 - 10.1109/CIC.2015.40
DO - 10.1109/CIC.2015.40
M3 - Conference contribution
SN - 9781509000890
SP - 288
EP - 295
BT - Proceedings - 2015 IEEE Conference on Collaboration and Internet Computing, CIC 2015
PB - Institute of Electrical and Electronics Engineers Inc.
ER -