Induction in multi-label text classification domains

Miroslav Kubat, Kanoksri Sarinnapakorn, Sareewan Dendamrongvit

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Automated classification of text documents has two distinctive aspects. First, each training or testing example can be labeled with more than two classes at the same time; this has serious consequences not only for the induction algorithms, but also for how we evaluate the performance of the induced classifier. Second, the examples are usually described by a great many attributes, which makes induction from hundreds of thousands of training examples prohibitively expensive. Both issues have been addressed in the recent machine-learning literature, but the behavior of existing solutions in real-world domains is still far from satisfactory. Here, we describe our own technique and report experiments with a concrete text database.

Original language: English (US)
Title of host publication: Advances in Machine Learning II
Subtitle of host publication: Dedicated to the Memory of Professor Ryszard S. Michalski
Editors: Jacek Koronacki, Slawomir Wierzchon, Zbigniew Ras, Janusz Kacprzyk
Pages: 225-244
Number of pages: 20
State: Published - Jan 19 2010

Publication series

Name: Studies in Computational Intelligence
Volume: 263
ISSN (Print): 1860-949X

Keywords

  • Classifier induction
  • Dempster-Shafer theory
  • Information fusion
  • Multi-label examples
  • Text classification

ASJC Scopus subject areas

  • Artificial Intelligence
