Abstract
The task of text categorization is to assign one or more classes to a document. The simplest machine learning approach to such domains induces a binary classifier separately for each class and then uses these classifiers in parallel. A motivating application is a digital library collection that is organized into classes and sub-classes in a hierarchical order. Another important issue we consider is that a document might belong to more than one class; in this case we work on a high-performance multi-label classifier. The study herein shows how much we can gain from machine learning: do we need only 10 to 15% of the data set for training and testing, or more than 50%? In the latter case, machine learning may not contribute much; however, if only 10 to 15% of the data set is needed, then machine learning makes a great contribution. The last issue we address in this research is the inter-class relation: if an example is classified as belonging to a class C, does it also belong to the parent and grandparent classes of C, and vice versa? We use a framework to classify documents automatically, which can indeed answer these questions.
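The abstract's simplest approach, inducing one binary classifier per class and running them all in parallel, is commonly called binary relevance. The following is a minimal, self-contained sketch of that idea in pure Python; for illustration it uses a hypothetical nearest-centroid classifier per label (the `bow`, `train`, and `predict` names are invented here), not the KNN or Naïve Bayes algorithms the paper's keywords suggest:

```python
import math
from collections import Counter

def bow(doc):
    # bag-of-words vector as a word -> count mapping
    return Counter(doc.lower().split())

def cosine(u, v):
    # cosine similarity between two sparse count vectors
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    # sum of count vectors stands in for a class centroid
    c = Counter()
    for v in vectors:
        c.update(v)
    return c

def train(docs, labelsets):
    """Binary relevance: one independent binary classifier per label."""
    vecs = [bow(d) for d in docs]
    labels = sorted({lab for s in labelsets for lab in s})
    model = {}
    for lab in labels:
        pos = [v for v, s in zip(vecs, labelsets) if lab in s]
        neg = [v for v, s in zip(vecs, labelsets) if lab not in s]
        model[lab] = (centroid(pos), centroid(neg))
    return model

def predict(model, doc):
    v = bow(doc)
    # run every per-label classifier in parallel; keep each label whose
    # positive centroid is closer to the document than its negative one
    return {lab for lab, (p, n) in model.items()
            if cosine(v, p) > cosine(v, n)}

# toy multi-label training set: the third document carries two labels
docs = ["genes protein dna sequencing", "neural network training loss",
        "protein folding network model", "library catalog indexing books"]
labelsets = [{"bio"}, {"ml"}, {"bio", "ml"}, {"info"}]
model = train(docs, labelsets)
print(predict(model, "protein dna sequencing"))  # → {'bio'}
```

Because each label's classifier is trained and queried independently, the prediction for a new document is naturally a *set* of labels, which is exactly the multi-label behavior the abstract describes.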
Original language | English |
---|---|
Article number | 67 |
Pages (from-to) | 495-511 |
Number of pages | 17 |
Journal | Life Science Journal |
Volume | 11 |
Issue number | 10 |
State | Published - Jan 1 2014 |
Keywords
- Induction process
- Inter-class relation
- KNN algorithms
- Multi-label classifiers
- Naïve Bayes algorithms
- Text categorization
ASJC Scopus subject areas
- Biochemistry, Genetics and Molecular Biology (all)
Cite this
Induction from multi-label examples. / Alsharif, Hind Hazza; Alhalabi, Wadee Saleh; Kubat, Miroslav.
In: Life Science Journal, Vol. 11, No. 10, 67, 01.01.2014, p. 495-511. Research output: Contribution to journal › Article
TY - JOUR
T1 - Induction from multi-label examples
AU - Alsharif, Hind Hazza
AU - Alhalabi, Wadee Saleh
AU - Kubat, Miroslav
PY - 2014/1/1
Y1 - 2014/1/1
N2 - The task of text categorization is to assign one or more classes to a document. The simplest machine learning approach to such domains induces a binary classifier separately for each class and then uses these classifiers in parallel. A motivating application is a digital library collection that is organized into classes and sub-classes in a hierarchical order. Another important issue we consider is that a document might belong to more than one class; in this case we work on a high-performance multi-label classifier. The study herein shows how much we can gain from machine learning: do we need only 10 to 15% of the data set for training and testing, or more than 50%? In the latter case, machine learning may not contribute much; however, if only 10 to 15% of the data set is needed, then machine learning makes a great contribution. The last issue we address in this research is the inter-class relation: if an example is classified as belonging to a class C, does it also belong to the parent and grandparent classes of C, and vice versa? We use a framework to classify documents automatically, which can indeed answer these questions.
KW - Induction process
KW - Inter-class relation
KW - KNN algorithms
KW - Multi-label classifiers
KW - Naïve Bayes algorithms
KW - Text categorization
UR - http://www.scopus.com/inward/record.url?scp=84903215876&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84903215876&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84903215876
VL - 11
SP - 495
EP - 511
JO - Life Science Journal
JF - Life Science Journal
SN - 1097-8135
IS - 10
M1 - 67
ER -