Abstract
The paper focuses on automated categorization of text documents, each labeled with one or more classes and described by tens of thousands of features. The computational costs of induction in such domains are so high as almost to disqualify the use of decision trees; the reduction of these costs is thus an important research issue. Our own solution, FDT ("fast decision-tree induction"), uses a two-pronged strategy: (1) feature-set pre-selection, and (2) induction of several trees, each from a different data subset, whose outputs are then combined by a data-fusion technique tailored to domains with imbalanced classes.
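The two-pronged strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' FDT implementation: the abstract does not name the feature-selection criterion, the tree learner, or the fusion rule, so everything here is an assumption chosen for brevity. Document frequency stands in for feature pre-selection, a small Gini-based tree over binary term features stands in for the learner, interleaved disjoint subsets stand in for the data subsets, and a simple vote threshold stands in for the data-fusion step. All function names (`select_features`, `induce_tree`, `train_fdt_like`, `predict`) and the toy corpus are hypothetical.

```python
from collections import Counter

def select_features(docs, k):
    """Step 1 (assumed criterion): keep the k terms with highest document frequency."""
    df = Counter(term for doc in docs for term in doc)
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}

def gini(labels):
    """Gini impurity of a list of boolean labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def induce_tree(docs, labels, features, depth=3):
    """Return a bool leaf or a tuple (term, subtree_if_present, subtree_if_absent)."""
    majority = bool(labels) and 2 * sum(labels) >= len(labels)
    if depth == 0 or len(set(labels)) <= 1 or not features:
        return majority

    def cost(term):
        yes = [y for d, y in zip(docs, labels) if term in d]
        no = [y for d, y in zip(docs, labels) if term not in d]
        # Lowest weighted impurity first; ties go to the more frequent term.
        return (len(yes) * gini(yes) + len(no) * gini(no), -len(yes), term)

    best = min(features, key=cost)
    yes = [(d, y) for d, y in zip(docs, labels) if best in d]
    no = [(d, y) for d, y in zip(docs, labels) if best not in d]
    if not yes or not no:
        return majority
    rest = features - {best}
    return (best,
            induce_tree([d for d, _ in yes], [y for _, y in yes], rest, depth - 1),
            induce_tree([d for d, _ in no], [y for _, y in no], rest, depth - 1))

def classify(tree, doc):
    """Walk one tree down to a boolean leaf."""
    while isinstance(tree, tuple):
        term, present, absent = tree
        tree = present if term in doc else absent
    return tree

def train_fdt_like(docs, label_sets, k=1000, n_subsets=3, depth=3):
    """Step 2: for every class, induce one tree per disjoint data subset."""
    features = select_features(docs, k)
    reduced = [doc & features for doc in docs]
    forest = {}
    for c in sorted(set().union(*label_sets)):
        forest[c] = []
        for i in range(n_subsets):
            idx = range(i, len(docs), n_subsets)  # interleaved subset number i
            forest[c].append(induce_tree([reduced[j] for j in idx],
                                         [c in label_sets[j] for j in idx],
                                         features, depth))
    return forest

def predict(forest, doc, min_votes=1):
    """Fusion stand-in: assign a class if at least min_votes of its trees say yes."""
    return {c for c, trees in forest.items()
            if sum(classify(t, doc) for t in trees) >= min_votes}

# Toy multi-label corpus: each document is a set of terms.
docs = [{"goal", "team"}, {"team", "score"}, {"vote", "party"},
        {"vote", "election"}, {"team", "vote"}, {"team", "vote", "goal"}]
label_sets = [{"sports"}, {"sports"}, {"politics"},
              {"politics"}, {"sports", "politics"}, {"sports", "politics"}]

forest = train_fdt_like(docs, label_sets, k=10, n_subsets=2)
print(sorted(predict(forest, {"party", "vote"})))
print(sorted(predict(forest, {"team", "vote", "budget"})))
```

A low `min_votes` lets a single tree assign a class, which favors recall on rare classes at the cost of more false positives. This is only one crude way to account for class imbalance and should not be read as the fusion technique used in the paper.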
Original language | English |
---|---|
Title of host publication | ICDM Workshops 2009 - IEEE International Conference on Data Mining |
Pages | 320-325 |
Number of pages | 6 |
DOIs | https://doi.org/10.1109/ICDMW.2009.94 |
State | Published - Dec 1 2009 |
Event | 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009 - Miami, FL, United States. Duration: Dec 6 2009 → Dec 6 2009 |
Other
Other | 2009 IEEE International Conference on Data Mining Workshops, ICDMW 2009 |
---|---|
Country | United States |
City | Miami, FL |
Period | 12/6/09 → 12/6/09 |
Keywords
- Decision tree
- Imbalanced classes
- Large-scale data
- Multi-label examples
- Text categorization
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Computer Vision and Pattern Recognition
- Software
Cite this
Fast induction of multiple decision trees in text categorization from large scale, imbalanced, and multi-label data. / Vateekul, Peerapon; Kubat, Miroslav.
ICDM Workshops 2009 - IEEE International Conference on Data Mining. 2009. p. 320-325 5360425.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Fast induction of multiple decision trees in text categorization from large scale, imbalanced, and multi-label data
AU - Vateekul, Peerapon
AU - Kubat, Miroslav
PY - 2009/12/1
Y1 - 2009/12/1
AB - The paper focuses on automated categorization of text documents, each labeled with one or more classes and described by tens of thousands of features. The computational costs of induction in such domains are so high as almost to disqualify the use of decision trees; the reduction of these costs is thus an important research issue. Our own solution, FDT ("fast decision-tree induction"), uses a two-pronged strategy: (1) feature-set pre-selection, and (2) induction of several trees, each from a different data subset, with the combination of the results from multiple trees with a data-fusion technique tailored to domains with imbalanced classes.
KW - Decision tree
KW - Imbalanced classes
KW - Large-scale data
KW - Multi-label examples
KW - Text categorization
UR - http://www.scopus.com/inward/record.url?scp=77951150128&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951150128&partnerID=8YFLogxK
U2 - 10.1109/ICDMW.2009.94
DO - 10.1109/ICDMW.2009.94
M3 - Conference contribution
AN - SCOPUS:77951150128
SN - 9780769539027
SP - 320
EP - 325
BT - ICDM Workshops 2009 - IEEE International Conference on Data Mining
ER -