Induction from multi-label examples in information retrieval systems

A case study

Kanoksri Sarinnapakorn, Miroslav Kubat

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques currently in use often suffer from prohibitive computational costs. In the case study reported in this article, we managed to reduce these costs by running a "baseline induction algorithm" on the training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a "master classifier" combines the outputs of the subclassifiers. This combination can be accomplished in several different ways, but we achieved the best results with our own mechanism inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance (experimentally) with that of more traditional voting approaches, and show that its substantial computational savings were achieved in exchange for acceptable loss in classification performance.
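
To illustrate the kind of evidence combination the abstract refers to, the sketch below applies Dempster's rule of combination to the per-label outputs of several subclassifiers. It is a minimal, hypothetical example rather than the authors' implementation: the mass-function representation, the {in, out, uncertain} frame, and the 0.5 decision threshold are assumptions made for illustration, not details taken from the article.

# Minimal sketch (not the authors' implementation): fusing per-label outputs of
# several subclassifiers with Dempster's rule of combination. Each subclassifier
# is assumed to report, for every label, a mass function over the frame
# {in, out, theta}, where "theta" holds the mass assigned to total uncertainty.

from typing import Dict, List

Mass = Dict[str, float]  # keys: "in", "out", "theta"

def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule of combination on the two-element frame {in, out}."""
    # Conflict: one source says "in" while the other says "out".
    k = m1["in"] * m2["out"] + m1["out"] * m2["in"]
    if k >= 1.0:
        raise ValueError("total conflict; masses cannot be combined")
    norm = 1.0 - k
    return {
        "in":    (m1["in"] * m2["in"] + m1["in"] * m2["theta"] + m1["theta"] * m2["in"]) / norm,
        "out":   (m1["out"] * m2["out"] + m1["out"] * m2["theta"] + m1["theta"] * m2["out"]) / norm,
        "theta": (m1["theta"] * m2["theta"]) / norm,
    }

def master_decision(per_label_masses: List[Mass], threshold: float = 0.5) -> bool:
    """Fuse all subclassifiers' masses for one label and decide membership."""
    fused = per_label_masses[0]
    for m in per_label_masses[1:]:
        fused = combine(fused, m)
    # Decide by the fused belief in "in"; the 0.5 threshold is an assumption.
    return fused["in"] >= threshold

# Example: three subclassifiers, each trained on a different feature subset,
# report their evidence that a document belongs to one particular category.
if __name__ == "__main__":
    votes = [
        {"in": 0.6, "out": 0.1, "theta": 0.3},
        {"in": 0.5, "out": 0.2, "theta": 0.3},
        {"in": 0.3, "out": 0.3, "theta": 0.4},
    ]
    print(master_decision(votes))  # True: combined belief in membership exceeds 0.5

Because Dempster's rule is commutative and associative, the order in which the subclassifiers are combined does not affect the fused result; a multi-label decision would simply repeat this per-label fusion for every category.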

Original language: English
Pages (from-to): 407-432
Number of pages: 26
Journal: Applied Artificial Intelligence
Volume: 22
Issue number: 5
DOI: 10.1080/08839510801972827
State: Published - May 1 2008

Fingerprint

  • Information retrieval systems
  • Labels
  • Classifiers
  • Learning systems
  • Costs

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Artificial Intelligence

Cite this

Induction from multi-label examples in information retrieval systems: A case study. / Sarinnapakorn, Kanoksri; Kubat, Miroslav.

In: Applied Artificial Intelligence, Vol. 22, No. 5, 01.05.2008, p. 407-432.

Research output: Contribution to journal › Article

@article{ee231b5e87cf46dfad7c37367693f8c7,
title = "Induction from multi-label examples in information retrieval systems: A case study",
abstract = "Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques currently in use often suffer from prohibitive computational costs. In the case study reported in this article, we managed to reduce these costs by running a {"}baseline induction algorithm{"} on the training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a {"}master classifier{"} combines the outputs of the subclassifiers. This combination can be accomplished in several different ways, but we achieved the best results with our own mechanism inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance (experimentally) with that of more traditional voting approaches, and show that its substantial computational savings were achieved in exchange for acceptable loss in classification performance.",
author = "Kanoksri Sarinnapakorn and Miroslav Kubat",
year = "2008",
month = "5",
day = "1",
doi = "10.1080/08839510801972827",
language = "English",
volume = "22",
pages = "407--432",
journal = "Applied Artificial Intelligence",
issn = "0883-9514",
publisher = "Taylor and Francis Ltd.",
number = "5",

}

TY - JOUR
T1 - Induction from multi-label examples in information retrieval systems
T2 - A case study
AU - Sarinnapakorn, Kanoksri
AU - Kubat, Miroslav
PY - 2008/5/1
Y1 - 2008/5/1
N2 - Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques currently in use often suffer from prohibitive computational costs. In the case study reported in this article, we managed to reduce these costs by running a "baseline induction algorithm" on the training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a "master classifier" combines the outputs of the subclassifiers. This combination can be accomplished in several different ways, but we achieved the best results with our own mechanism inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance (experimentally) with that of more traditional voting approaches, and show that its substantial computational savings were achieved in exchange for acceptable loss in classification performance.
AB - Information retrieval systems often use machine-learning techniques to induce classifiers capable of categorizing documents. Unfortunately, the circumstance that the same document may simultaneously belong to two or more categories has so far received inadequate attention, and induction techniques currently in use often suffer from prohibitive computational costs. In the case study reported in this article, we managed to reduce these costs by running a "baseline induction algorithm" on the training examples described by diverse feature subsets, thus obtaining several subclassifiers. When asked about a document's classes, a "master classifier" combines the outputs of the subclassifiers. This combination can be accomplished in several different ways, but we achieved the best results with our own mechanism inspired by the Dempster-Shafer Theory (DST). We describe the technique, compare its performance (experimentally) with that of more traditional voting approaches, and show that its substantial computational savings were achieved in exchange for acceptable loss in classification performance.
UR - http://www.scopus.com/inward/record.url?scp=46149105706&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=46149105706&partnerID=8YFLogxK
U2 - 10.1080/08839510801972827
DO - 10.1080/08839510801972827
M3 - Article
VL - 22
SP - 407
EP - 432
JO - Applied Artificial Intelligence
JF - Applied Artificial Intelligence
SN - 0883-9514
IS - 5
ER -