Induction from multi-label examples in information retrieval systems: A case study

Kanoksri Sarinnapakorn, Miroslav Kubat

Research output: Contribution to journal, Article

9 Scopus citations


Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques currently in use often suffer from prohibitive computational costs. In the case study reported in this article, we managed to reduce these costs by running a "baseline induction algorithm" on the training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a "master classifier" combines the outputs of the subclassifiers. This combination can be accomplished in several different ways, but we achieved the best results with our own mechanism inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance (experimentally) with that of more traditional voting approaches, and show that its substantial computational savings were achieved in exchange for acceptable loss in classification performance.
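The general scheme described above (one subclassifier per feature subset, combined by a master classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the baseline learner here is a hypothetical nearest-centroid rule, and the master classifier uses plain majority voting, the kind of traditional baseline the article compares against, rather than the DST-inspired mechanism. All function names (`train_centroids`, `predict_sub`, `master_predict`) and the toy data are invented for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Per label: (centroid of positive examples, centroid of negative examples).
Model = Dict[str, Tuple[List[float], List[float]]]


def centroid(rows: List[List[float]], width: int) -> List[float]:
    """Mean vector of the rows; a zero vector if there are no rows."""
    if not rows:
        return [0.0] * width
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def train_centroids(X, Y, labels, subset) -> Model:
    """Hypothetical baseline learner, trained only on the features in `subset`."""
    model: Model = {}
    for label in labels:
        pos = [[x[j] for j in subset] for x, y in zip(X, Y) if label in y]
        neg = [[x[j] for j in subset] for x, y in zip(X, Y) if label not in y]
        model[label] = (centroid(pos, len(subset)), centroid(neg, len(subset)))
    return model


def predict_sub(model: Model, subset, x) -> Set[str]:
    """A subclassifier returns every label whose positive centroid is strictly closer."""
    xs = [x[j] for j in subset]
    return {
        label
        for label, (pos, neg) in model.items()
        if sq_dist(xs, pos) < sq_dist(xs, neg)
    }


def master_predict(ensemble, x) -> Set[str]:
    """Majority vote per label (assumption: NOT the DST-inspired combination)."""
    votes: Dict[str, int] = {}
    for subset, model in ensemble:
        for label in predict_sub(model, subset, x):
            votes[label] = votes.get(label, 0) + 1
    return {label for label, n in votes.items() if n > len(ensemble) / 2}


# Toy multi-label data: four features, two categories, documents may carry both.
X = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
Y = [{"sports"}, {"politics"}, {"sports", "politics"}, set()]
labels = ["sports", "politics"]
subsets = [[0, 1], [2, 3], [0, 2], [1, 3]]

ensemble = [(s, train_centroids(X, Y, labels, s)) for s in subsets]
print(sorted(master_predict(ensemble, [1, 1, 1, 1])))
```

Each subclassifier sees only a slice of the feature space, which is where a cost reduction of the kind the abstract reports would come from. The combination step is the part the article varies, so a different rule could replace `master_predict` without touching the subclassifiers.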

Original language: English (US)
Pages (from-to): 407-432
Number of pages: 26
Journal: Applied Artificial Intelligence
Issue number: 5
State: Published - May 1 2008

ASJC Scopus subject areas

  • Artificial Intelligence
