Enhancing concept detection by pruning data with MCA-based transaction weights

Lin Lin, Mei-Ling Shyu, Shu Ching Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

10 Citations (Scopus)

Abstract

With the rapid increase in the amount of multimedia data, research on semantic information retrieval faces a very challenging problem: the number of positive data instances containing the target concept/object/event is much smaller than the number of negative data instances without it, which is known as the data imbalance issue. Therefore, one of the popular topics in multimedia information processing and retrieval is data pruning, a technique that automatically identifies and prunes data instances from the training data set so that the pruned data set enhances the performance of model learning, classification, and concept detection. In this paper, a novel data pruning framework that gives each transaction a weight based on multiple correspondence analysis (MCA) is proposed. These transaction weights are used as the measure for pruning the training data set. Meanwhile, the testing data set can be weighted and pruned as well, so that the computational cost is reduced not only when building the model but also when applying the classifiers. In experiments with 18 high-level concepts and benchmark data sets (both balanced and imbalanced) from TRECVID, the proposed framework achieves promising results in enhancing the concept detection performance of three well-known classifiers commonly used for concept detection.

Original language: English
Title of host publication: ISM 2009 - 11th IEEE International Symposium on Multimedia
Pages: 304-311
Number of pages: 8
DOIs: https://doi.org/10.1109/ISM.2009.125
State: Published - Dec 1 2009
Event: 11th IEEE International Symposium on Multimedia, ISM 2009 - San Diego, CA, United States
Duration: Dec 14 2009 to Dec 16 2009

Other

Other: 11th IEEE International Symposium on Multimedia, ISM 2009
Country: United States
City: San Diego, CA
Period: 12/14/09 to 12/16/09

Fingerprint

Information retrieval
Classifiers
Semantics
Testing
Costs

Keywords

  • Concept detection
  • Data pruning
  • Multiple correspondence analysis
  • Transaction weight

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Hardware and Architecture
  • Software

Cite this

Lin, L., Shyu, M-L., & Chen, S. C. (2009). Enhancing concept detection by pruning data with MCA-based transaction weights. In ISM 2009 - 11th IEEE International Symposium on Multimedia (pp. 304-311). [5363259] https://doi.org/10.1109/ISM.2009.125

Enhancing concept detection by pruning data with MCA-based transaction weights. / Lin, Lin; Shyu, Mei-Ling; Chen, Shu Ching.

ISM 2009 - 11th IEEE International Symposium on Multimedia. 2009. p. 304-311 5363259.

Lin, L, Shyu, M-L & Chen, SC 2009, Enhancing concept detection by pruning data with MCA-based transaction weights. in ISM 2009 - 11th IEEE International Symposium on Multimedia., 5363259, pp. 304-311, 11th IEEE International Symposium on Multimedia, ISM 2009, San Diego, CA, United States, 12/14/09. https://doi.org/10.1109/ISM.2009.125
Lin L, Shyu M-L, Chen SC. Enhancing concept detection by pruning data with MCA-based transaction weights. In ISM 2009 - 11th IEEE International Symposium on Multimedia. 2009. p. 304-311. 5363259 https://doi.org/10.1109/ISM.2009.125
Lin, Lin ; Shyu, Mei-Ling ; Chen, Shu Ching. / Enhancing concept detection by pruning data with MCA-based transaction weights. ISM 2009 - 11th IEEE International Symposium on Multimedia. 2009. pp. 304-311
@inproceedings{3b01ad7239154c9f86d46b1110a0aeb9,
title = "Enhancing concept detection by pruning data with MCA-based transaction weights",
abstract = "With the rapid increase in the amount of multimedia data, research on semantic information retrieval faces a very challenging problem: the number of positive data instances containing the target concept/object/event is much smaller than the number of negative data instances without it, which is known as the data imbalance issue. Therefore, one of the popular topics in multimedia information processing and retrieval is data pruning, a technique that automatically identifies and prunes data instances from the training data set so that the pruned data set enhances the performance of model learning, classification, and concept detection. In this paper, a novel data pruning framework that gives each transaction a weight based on multiple correspondence analysis (MCA) is proposed. These transaction weights are used as the measure for pruning the training data set. Meanwhile, the testing data set can be weighted and pruned as well, so that the computational cost is reduced not only when building the model but also when applying the classifiers. In experiments with 18 high-level concepts and benchmark data sets (both balanced and imbalanced) from TRECVID, the proposed framework achieves promising results in enhancing the concept detection performance of three well-known classifiers commonly used for concept detection.",
keywords = "Concept detection, Data pruning, Multiple correspondence analysis, Transaction weight",
author = "Lin Lin and Mei-Ling Shyu and Chen, {Shu Ching}",
year = "2009",
month = "12",
day = "1",
doi = "10.1109/ISM.2009.125",
language = "English",
isbn = "9780769538907",
pages = "304--311",
booktitle = "ISM 2009 - 11th IEEE International Symposium on Multimedia",

}

TY - GEN
T1 - Enhancing concept detection by pruning data with MCA-based transaction weights
AU - Lin, Lin
AU - Shyu, Mei-Ling
AU - Chen, Shu Ching
PY - 2009/12/1
Y1 - 2009/12/1
AB - With the rapid increase in the amount of multimedia data, research on semantic information retrieval faces a very challenging problem: the number of positive data instances containing the target concept/object/event is much smaller than the number of negative data instances without it, which is known as the data imbalance issue. Therefore, one of the popular topics in multimedia information processing and retrieval is data pruning, a technique that automatically identifies and prunes data instances from the training data set so that the pruned data set enhances the performance of model learning, classification, and concept detection. In this paper, a novel data pruning framework that gives each transaction a weight based on multiple correspondence analysis (MCA) is proposed. These transaction weights are used as the measure for pruning the training data set. Meanwhile, the testing data set can be weighted and pruned as well, so that the computational cost is reduced not only when building the model but also when applying the classifiers. In experiments with 18 high-level concepts and benchmark data sets (both balanced and imbalanced) from TRECVID, the proposed framework achieves promising results in enhancing the concept detection performance of three well-known classifiers commonly used for concept detection.
KW - Concept detection
KW - Data pruning
KW - Multiple correspondence analysis
KW - Transaction weight
UR - http://www.scopus.com/inward/record.url?scp=77949504840&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77949504840&partnerID=8YFLogxK
U2 - 10.1109/ISM.2009.125
DO - 10.1109/ISM.2009.125
M3 - Conference contribution
SN - 9780769538907
SP - 304
EP - 311
BT - ISM 2009 - 11th IEEE International Symposium on Multimedia
ER -