Hierarchical document classification using automatically generated hierarchy

Tao Li, Shenghuo Zhu, Mitsunori Ogihara

Research output: Contribution to journal › Article

40 Scopus citations

Abstract

Automated text categorization has witnessed a booming interest with the exponential growth of information and the ever-increasing needs for organizations. The underlying hierarchical structure identifies the relationships of dependence between different categories and provides valuable sources of information for categorization. Although considerable research has been conducted in the field of hierarchical document categorization, little has been done on automatic generation of topic hierarchies. In this paper, we propose the method of using linear discriminant projection to generate more meaningful intermediate levels of hierarchies in large flat sets of classes. The linear discriminant projection approach first transforms all documents onto a low-dimensional space and then clusters the categories into hierarchies accordingly. The paper also investigates the effect of using generated hierarchical structure for text classification. Our experiments show that generated hierarchies improve classification performance in most cases.
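
The second step described in the abstract, grouping categories into a hierarchy once documents sit in a low-dimensional space, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the projection has already been done, uses plain centroid-based agglomerative merging as a stand-in for whatever clustering the paper applies, and the category names, vectors, and the function `build_hierarchy` are hypothetical.

```python
from itertools import combinations


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_hierarchy(projected_docs):
    """Merge the two closest category clusters until one tree remains.

    projected_docs maps a category name to its documents, each already
    projected to a low-dimensional vector (assumed; the projection
    itself is not shown here). Returns a nested tuple of names.
    """
    # Each cluster is (subtree, centroid, document count).
    clusters = [
        (cat, centroid(vecs), len(vecs))
        for cat, vecs in sorted(projected_docs.items())
    ]
    while len(clusters) > 1:
        # Pick the pair of clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: sq_dist(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (ta, ca, na), (tb, cb, nb) = clusters[i], clusters[j]
        # The merged centroid is the document-weighted mean of the two.
        merged_centroid = [
            (x * na + y * nb) / (na + nb) for x, y in zip(ca, cb)
        ]
        merged = ((ta, tb), merged_centroid, na + nb)
        clusters = [
            c for k, c in enumerate(clusters) if k not in (i, j)
        ] + [merged]
    return clusters[0][0]


# Hypothetical toy data: four flat categories in a 2-D projected space.
toy = {
    "baseball": [[0.9, 0.1], [1.0, 0.2]],
    "hockey": [[1.1, 0.0], [0.8, 0.1]],
    "graphics": [[0.1, 1.0], [0.0, 0.9]],
    "hardware": [[0.3, 1.1], [0.2, 0.8]],
}
tree = build_hierarchy(toy)
print(tree)
```

On this toy input the two sports categories and the two computing categories are each merged first, giving intermediate nodes of the kind a hierarchical classifier could then be trained on.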

Original language: English (US)
Pages (from-to): 211-230
Number of pages: 20
Journal: Journal of Intelligent Information Systems
Volume: 29
Issue number: 2
State: Published - Oct 1 2007
Externally published: Yes

Keywords

  • Document classification
  • Hierarchy generation
  • Linear discriminant projection
  • Text categorization

ASJC Scopus subject areas

  • Software
  • Information Systems
  • Hardware and Architecture
  • Computer Networks and Communications
  • Artificial Intelligence
