Fast induction of multiple decision trees in text categorization from large scale, imbalanced, and multi-label data

Peerapon Vateekul, Miroslav Kubat

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

12 Citations (Scopus)

Abstract

The paper focuses on automated categorization of text documents, each labeled with one or more classes and described by tens of thousands of features. The computational costs of induction in such domains are so high as almost to disqualify the use of decision trees; the reduction of these costs is thus an important research issue. Our own solution, FDT ("fast decision-tree induction"), uses a two-pronged strategy: (1) feature-set pre-selection, and (2) induction of several trees, each from a different data subset, whose results are then combined by a data-fusion technique tailored to domains with imbalanced classes.
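
The two-pronged strategy in the abstract can be illustrated with a minimal Python sketch for a single binary class. This is not the authors' FDT implementation: the abstract does not say which pre-selection criterion or which data-fusion technique FDT uses, so information gain, a strided partition of the data, and a plain majority vote are assumptions chosen for brevity. The function names (`preselect`, `build_tree`, `train_ensemble`, `predict_ensemble`) are likewise hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(X, y, f):
    """Information gain obtained by splitting on binary feature f."""
    left = [yi for xi, yi in zip(X, y) if xi[f] == 0]
    right = [yi for xi, yi in zip(X, y) if xi[f] == 1]
    n = len(y)
    return entropy(y) - len(left) / n * entropy(left) - len(right) / n * entropy(right)

def preselect(X, y, k):
    """Prong 1 (assumed criterion): keep the k features with the highest gain."""
    feats = range(len(X[0]))
    return sorted(feats, key=lambda f: info_gain(X, y, f), reverse=True)[:k]

def build_tree(X, y, feats, depth=3):
    """Tiny ID3-style tree over binary features; a leaf is a majority label."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1 or not feats:
        return majority
    best = max(feats, key=lambda f: info_gain(X, y, f))
    if info_gain(X, y, best) <= 0:
        return majority
    rest = [f for f in feats if f != best]
    branches = {}
    for v in (0, 1):
        idx = [i for i, xi in enumerate(X) if xi[best] == v]
        if idx:
            branches[v] = build_tree([X[i] for i in idx], [y[i] for i in idx],
                                     rest, depth - 1)
        else:
            branches[v] = majority
    return (best, branches)

def predict_tree(tree, x):
    """Follow the branches until a leaf (a plain label) is reached."""
    while isinstance(tree, tuple):
        f, branches = tree
        tree = branches[x[f]]
    return tree

def train_ensemble(X, y, k_features, n_trees):
    """Prong 2: one tree per data subset (here: a simple strided partition)."""
    feats = preselect(X, y, k_features)
    trees = []
    for t in range(n_trees):
        part = range(t, len(X), n_trees)  # disjoint subset number t
        trees.append(build_tree([X[i] for i in part], [y[i] for i in part], feats))
    return trees

def predict_ensemble(trees, x):
    """Stand-in fusion step: majority vote (not the paper's fusion technique)."""
    votes = Counter(predict_tree(t, x) for t in trees)
    return votes.most_common(1)[0][0]

# Toy data: 32 "documents" with 5 binary term features; the label equals feature 0.
X = [[(i >> j) & 1 for j in range(5)] for i in range(32)]
y = [x[0] for x in X]
trees = train_ensemble(X, y, k_features=2, n_trees=3)
print(predict_ensemble(trees, [1, 0, 1, 0, 1]))
```

For multi-label data, one plausible arrangement (again an assumption, not something the abstract states) is to train one such ensemble per class and report every class whose ensemble votes positive. The majority vote treats all trees equally, so it does nothing to address class imbalance; that is the part the paper's tailored fusion technique is meant to handle.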

Original language: English
Title of host publication: ICDM Workshops 2009 - IEEE International Conference on Data Mining
Pages: 320-325
Number of pages: 6
DOI: https://doi.org/10.1109/ICDMW.2009.94
State: Published - Dec 1 2009
Event: 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009 - Miami, FL, United States
Duration: Dec 6 2009 to Dec 6 2009

Other

Other: 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009
Country: United States
City: Miami, FL
Period: 12/6/09 to 12/6/09

Fingerprint

Decision trees
Labels
Data fusion
Costs

Keywords

  • Decision tree
  • Imbalanced classes
  • Large-scale data
  • Multi-label examples
  • Text categorization

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Vateekul, P., & Kubat, M. (2009). Fast induction of multiple decision trees in text categorization from large scale, imbalanced, and multi-label data. In ICDM Workshops 2009 - IEEE International Conference on Data Mining (pp. 320-325). [5360425] https://doi.org/10.1109/ICDMW.2009.94

@inproceedings{c3128f52b04d4ed5af04b7796ad701c2,
title = "Fast induction of multiple decision trees in text categorization from large scale, imbalanced, and multi-label data",
abstract = "The paper focuses on automated categorization of text documents, each labeled with one or more classes and described by tens of thousands of features. The computational costs of induction in such domains are so high as almost to disqualify the use of decision trees; the reduction of these costs is thus an important research issue. Our own solution, FDT ({"}fast decision-tree induction{"}), uses a two-pronged strategy: (1) feature-set pre-selection, and (2) induction of several trees, each from a different data subset, with the combination of the results from multiple trees with a data-fusion technique tailored to domains with imbalanced classes.",
keywords = "Decision tree, Imbalanced classes, Large-scale data, Multi-label examples, Text categorization",
author = "Peerapon Vateekul and Miroslav Kubat",
year = "2009",
month = "12",
day = "1",
doi = "10.1109/ICDMW.2009.94",
language = "English",
isbn = "9780769539027",
pages = "320--325",
booktitle = "ICDM Workshops 2009 - IEEE International Conference on Data Mining",

}

TY - GEN
T1 - Fast induction of multiple decision trees in text categorization from large scale, imbalanced, and multi-label data
AU - Vateekul, Peerapon
AU - Kubat, Miroslav
PY - 2009/12/1
Y1 - 2009/12/1
N2 - The paper focuses on automated categorization of text documents, each labeled with one or more classes and described by tens of thousands of features. The computational costs of induction in such domains are so high as almost to disqualify the use of decision trees; the reduction of these costs is thus an important research issue. Our own solution, FDT ("fast decision-tree induction"), uses a two-pronged strategy: (1) feature-set pre-selection, and (2) induction of several trees, each from a different data subset, with the combination of the results from multiple trees with a data-fusion technique tailored to domains with imbalanced classes.
KW - Decision tree
KW - Imbalanced classes
KW - Large-scale data
KW - Multi-label examples
KW - Text categorization
UR - http://www.scopus.com/inward/record.url?scp=77951150128&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951150128&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2009.94
DO - 10.1109/ICDMW.2009.94
M3 - Conference contribution
AN - SCOPUS:77951150128
SN - 9780769539027
SP - 320
EP - 325
BT - ICDM Workshops 2009 - IEEE International Conference on Data Mining
ER -