Deep spatio-temporal representation learning for multi-class imbalanced data classification

Samira Pouyanfar, Shu Ching Chen, Mei-Ling Shyu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Deep learning, particularly Convolutional Neural Networks (CNNs), has significantly improved visual data processing. In recent years, video classification has attracted significant attention in the multimedia and deep learning community. It is one of the most challenging tasks since both visual and temporal information should be processed effectively. Existing techniques either disregard temporal information between video sequences or generate very complex and computationally expensive models to integrate the spatiotemporal data. In addition, most deep learning techniques do not automatically consider the data imbalance problem. This paper presents an effective deep learning framework for imbalanced video classification by utilizing both spatial and temporal information. This framework includes a spatiotemporal synthetic oversampling to handle data with a skewed distribution, a pre-trained CNN model for spatial sequence feature extraction, followed by a residual bidirectional Long Short Term Memory (LSTM) to capture temporal knowledge in video datasets. Experimental results on two imbalanced video datasets demonstrate the superiority of the proposed framework compared to the state-of-the-art approaches.
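The oversampling step described in the abstract is in the spirit of SMOTE-style interpolation between minority-class samples. As a hedged illustration only (the paper's spatiotemporal variant operates on video feature sequences and differs in detail), a minimal SMOTE-like oversampler over plain feature vectors might look like this; the function name and parameters are illustrative, not from the paper:

```python
import numpy as np

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Generate synthetic minority-class samples by interpolating between
    each sampled point and one of its k nearest neighbors (SMOTE-style).
    `minority` is an (n, d) array of feature vectors."""
    rng = np.random.default_rng(seed)
    n = len(minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(n)
        # Euclidean distances from sample i to all other samples
        dist = np.linalg.norm(minority - minority[i], axis=1)
        dist[i] = np.inf                    # exclude the sample itself
        neighbors = np.argsort(dist)[:k]    # indices of k nearest neighbors
        j = rng.choice(neighbors)
        gap = rng.random()                  # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)

# Toy minority class: four points at the corners of the unit square
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
new_samples = smote_like_oversample(minority, n_new=5)
```

Each synthetic point lies on the segment between an existing sample and one of its neighbors, so the new samples stay inside the minority class's local neighborhood rather than being arbitrary noise.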

Original language: English (US)
Title of host publication: Proceedings - 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 386-393
Number of pages: 8
ISBN (Print): 9781538626597
DOI: 10.1109/IRI.2018.00064
State: Published - Aug 2, 2018
Event: 19th IEEE International Conference on Information Reuse and Integration for Data Science, IRI 2018 - Salt Lake City, United States
Duration: Jul 7, 2018 - Jul 9, 2018



Keywords

  • CNN
  • Deep learning
  • LSTM
  • Multiclass imbalanced data
  • Spatio-temporal learning
  • Video classification

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Software
  • Artificial Intelligence
  • Information Systems and Management
  • Safety, Risk, Reliability and Quality
  • Public Administration

Cite this

Pouyanfar, S., Chen, S. C., & Shyu, M.-L. (2018). Deep spatio-temporal representation learning for multi-class imbalanced data classification. In Proceedings - 2018 IEEE 19th International Conference on Information Reuse and Integration for Data Science, IRI 2018 (pp. 386-393). [8424735] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IRI.2018.00064
