Web document classification based on fuzzy association

Choochart Haruechaiyasak, Mei-Ling Shyu, Shu Ching Chen, Xiuqi Li

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

28 Scopus citations

Abstract

In this paper, we propose a method for automatically classifying Web documents into a set of categories using the concept of fuzzy association. Using the same word or vocabulary to describe different entities creates ambiguity, especially in the Web environment, where the user population is large. To address this problem, fuzzy association is used to capture the relationships among the index terms, or keywords, in the documents: each pair of words is assigned an association value that distinguishes it from other pairs. In this way, ambiguity in word usage is avoided. Experiments are conducted on data sets collected from two Web portals, Yahoo! (www.yahoo.com) and the Open Directory Project (dmoz.org). We compare our approach with the vector space model using the cosine coefficient. The results show that our approach yields higher accuracy than the vector space model.
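The abstract does not give the scoring formulas, so the Python sketch below only illustrates the general idea under stated assumptions; it is not the authors' implementation. It assumes a Jaccard-style fuzzy association between terms, c(a, b) = n_ab / (n_a + n_b - n_ab), where n_a is the number of training documents containing term a and n_ab the number containing both a and b. A document's membership in a category is assumed to be the mean, over that category's keywords k, of mu_k = 1 - prod over document words w of (1 - c(k, w)). The cosine-coefficient baseline compares raw term-frequency vectors. The function names (`build_index`, `fuzzy_corr`, `fuzzy_score`, `cosine`, `classify`), the keyword profiles and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def build_index(docs):
    """Map each term to the set of training-document ids that contain it."""
    index = {}
    for doc_id, tokens in enumerate(docs):
        for term in set(tokens):
            index.setdefault(term, set()).add(doc_id)
    return index


def fuzzy_corr(index, a, b):
    """Assumed fuzzy association between terms a and b: n_ab / (n_a + n_b - n_ab)."""
    docs_a, docs_b = index.get(a, set()), index.get(b, set())
    n_ab = len(docs_a & docs_b)
    denom = len(docs_a) + len(docs_b) - n_ab
    return n_ab / denom if denom else 0.0


def fuzzy_score(tokens, keywords, index):
    """Assumed category membership: mean over keywords k of 1 - prod_w (1 - c(k, w))."""
    if not keywords:
        return 0.0
    words = set(tokens)
    total = 0.0
    for k in keywords:
        prod = 1.0
        for w in words:
            prod *= 1.0 - fuzzy_corr(index, k, w)
        total += 1.0 - prod
    return total / len(keywords)


def cosine(u, v):
    """Cosine coefficient between two sparse term-frequency vectors (Counters)."""
    dot = sum(count * v[t] for t, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(tokens, categories, index, method="fuzzy"):
    """Return the label with the highest score; categories maps label -> keyword list."""
    if method == "fuzzy":
        scores = {label: fuzzy_score(tokens, kws, index) for label, kws in categories.items()}
    else:  # vector space model baseline with the cosine coefficient
        doc_vec = Counter(tokens)
        scores = {label: cosine(doc_vec, Counter(kws)) for label, kws in categories.items()}
    return max(scores, key=scores.get)


# Hypothetical toy training corpus and category keyword profiles.
train = [
    ["football", "goal", "team", "coach"],
    ["team", "league", "goal", "match"],
    ["stock", "market", "bank", "loan"],
    ["bank", "interest", "market", "fund"],
]
index = build_index(train)
categories = {"sports": ["football", "goal", "team"],
              "finance": ["stock", "bank", "market"]}

doc = ["league", "match"]  # shares no keyword with either profile
print(fuzzy_score(doc, categories["sports"], index))   # 0.5
print(fuzzy_score(doc, categories["finance"], index))  # 0.0
print(classify(doc, categories, index))                # sports
```

In this toy run the document `["league", "match"]` has a cosine score of 0 against both keyword profiles, because it shares no terms with them. Its assumed fuzzy score for "sports" is 0.5, because "goal" and "team" co-occur with "league" and "match" in the training data. This shows how term-to-term associations can relate a document to a category even without exact keyword overlap.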

Original language: English
Title of host publication: Proceedings - IEEE Computer Society's International Computer Software and Applications Conference
Pages: 487-492
Number of pages: 6
DOI: 10.1109/CMPSAC.2002.1045052
State: Published - Oct 16 2002
Event: 26th Annual International Computer Software and Applications Conference - Oxford, United Kingdom
Duration: Aug 26 2002 – Aug 29 2002

Other

Conference: 26th Annual International Computer Software and Applications Conference
Country: United Kingdom
City: Oxford
Period: 8/26/02 – 8/29/02


Keywords

  • Data mining
  • Document classification
  • Fuzzy association
  • Information processing on the web

ASJC Scopus subject areas

  • Software

Cite this

Haruechaiyasak, C., Shyu, M-L., Chen, S. C., & Li, X. (2002). Web document classification based on fuzzy association. In Proceedings - IEEE Computer Society's International Computer Software and Applications Conference (pp. 487-492). https://doi.org/10.1109/CMPSAC.2002.1045052