Induction from multi-label examples in information retrieval systems: A case study

Kanoksri Sarinnapakorn, Miroslav Kubat

Research output: Contribution to journal › Article

9 Scopus citations

Abstract

Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques currently in use often suffer from prohibitive computational costs. In the case study reported in this article, we managed to reduce these costs by running a "baseline induction algorithm" on the training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a "master classifier" combines the outputs of the subclassifiers. This combination can be accomplished in several different ways, but we achieved the best results with our own mechanism inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance (experimentally) with that of more traditional voting approaches, and show that its substantial computational savings were achieved in exchange for acceptable loss in classification performance.
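
The combination step described in the abstract can be illustrated with a minimal sketch. It uses the textbook form of Dempster's rule of combination, not the authors' own DST-inspired mechanism, whose details the abstract does not give. The names `combine` and `master_classify`, the three-way mass assignment (`pos`, `neg`, `theta` for uncertainty), and the 0.5 decision threshold are all assumptions made for illustration.

```python
from functools import reduce

# Assumed setup: for each category, every subclassifier reports a mass function
# over {"pos", "neg", "theta"}, where "theta" is mass left on the whole frame
# (i.e. "don't know"). The three masses are expected to sum to 1.

def combine(m1, m2):
    """Textbook Dempster's rule for two mass functions on a two-element frame."""
    # Conflict: one source says "pos" while the other says "neg".
    conflict = m1["pos"] * m2["neg"] + m1["neg"] * m2["pos"]
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    norm = 1.0 - conflict
    pos = (m1["pos"] * m2["pos"] + m1["pos"] * m2["theta"]
           + m1["theta"] * m2["pos"]) / norm
    neg = (m1["neg"] * m2["neg"] + m1["neg"] * m2["theta"]
           + m1["theta"] * m2["neg"]) / norm
    theta = (m1["theta"] * m2["theta"]) / norm
    return {"pos": pos, "neg": neg, "theta": theta}

def master_classify(sub_outputs, threshold=0.5):
    """Fuse per-category masses from all subclassifiers; return the accepted labels.

    sub_outputs: list with one dict per subclassifier, mapping
    category name -> mass function. The threshold is a hypothetical choice.
    """
    labels = set()
    for category in sub_outputs[0]:
        fused = reduce(combine, (out[category] for out in sub_outputs))
        if fused["pos"] > threshold:
            labels.add(category)
    return labels

# Two hypothetical subclassifiers, each induced from a different feature subset.
sub_a = {"sports":   {"pos": 0.6, "neg": 0.1, "theta": 0.3},
         "politics": {"pos": 0.1, "neg": 0.7, "theta": 0.2}}
sub_b = {"sports":   {"pos": 0.5, "neg": 0.2, "theta": 0.3},
         "politics": {"pos": 0.2, "neg": 0.5, "theta": 0.3}}

predicted = master_classify([sub_a, sub_b])
```

Because each category is fused independently, a document can receive zero, one, or several labels, which is what the multi-label setting requires.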

Original language: English (US)
Pages (from-to): 407-432
Number of pages: 26
Journal: Applied Artificial Intelligence
Volume: 22
Issue number: 5
State: Published - May 1 2008

ASJC Scopus subject areas

  • Artificial Intelligence
