Combining subclassifiers in text categorization: A DST-based solution and a case study

Kanoksri Sarinnapakorn, Miroslav Kubat

Research output: Contribution to journal › Article › peer-review

51 Scopus citations


Text categorization systems often use machine learning techniques to induce document classifiers from preclassified examples. Because each example document belongs to many classes, induction often incurs very high computational costs that sometimes grow exponentially in the number of features. Seeking to reduce these costs, we explored the possibility of running a "baseline induction algorithm" separately on subsets of features, obtaining a set of classifiers to be combined. For the specific case of classifiers that return not only class labels but also confidences in these labels, we investigate a few alternative fusion techniques, including our own mechanism inspired by the Dempster-Shafer Theory. The paper describes the algorithm and, in our specific case study, compares its performance to that of more traditional mechanisms.
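To make the idea of fusing confidence-rated subclassifiers concrete, the sketch below applies the textbook Dempster rule of combination to one class label at a time. It is only an illustration under stated assumptions and is not the mechanism described in the paper: the mapping from a (label, confidence) vote to a mass assignment (`to_mass`), the three-way frame `pos` / `neg` / `either`, and the function names `combine` and `fuse` are all hypothetical choices made for this example.

```python
from functools import reduce

# Hypothetical frame for a single class label:
#   "pos"    = the document belongs to the class
#   "neg"    = the document does not belong to the class
#   "either" = uncommitted mass (the subclassifier's remaining doubt)


def to_mass(is_positive, confidence):
    """Assumed mapping: put `confidence` on the voted label, the rest on 'either'."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    mass = {"pos": 0.0, "neg": 0.0, "either": 1.0 - confidence}
    mass["pos" if is_positive else "neg"] = confidence
    return mass


def combine(m1, m2):
    """Dempster's rule of combination for the frame {pos, neg}."""
    # Conflict: mass that the two sources assign to contradictory outcomes.
    conflict = m1["pos"] * m2["neg"] + m1["neg"] * m2["pos"]
    norm = 1.0 - conflict
    if norm <= 1e-12:
        raise ValueError("sources are in total conflict; rule is undefined")
    pos = m1["pos"] * m2["pos"] + m1["pos"] * m2["either"] + m1["either"] * m2["pos"]
    neg = m1["neg"] * m2["neg"] + m1["neg"] * m2["either"] + m1["either"] * m2["neg"]
    either = m1["either"] * m2["either"]
    return {"pos": pos / norm, "neg": neg / norm, "either": either / norm}


def fuse(votes):
    """Fuse (is_positive, confidence) votes from several subclassifiers."""
    return reduce(combine, (to_mass(p, c) for p, c in votes))


if __name__ == "__main__":
    # Two subclassifiers (trained on different feature subsets) agree,
    # a third one disagrees with moderate confidence.
    print(fuse([(True, 0.6), (True, 0.5), (False, 0.3)]))
```

In a multilabel setting, one would presumably run such a fusion independently for each class and accept the label when the combined `pos` mass exceeds the `neg` mass, though that decision rule is likewise an assumption of this sketch.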

Original language: English (US)
Pages (from-to): 1638-1651
Number of pages: 14
Journal: IEEE Transactions on Knowledge and Data Engineering
Issue number: 12
State: Published - Dec 2007


Keywords

  • Dempster-Shafer Theory
  • Fusion
  • Machine learning
  • Multilabel examples
  • Text categorization

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence
  • Information Systems

