Algorithms for clustering high dimensional and distributed data

Tao Li, Shenghuo Zhu, Mitsunori Ogihara

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. The clustering problem has been widely studied in machine learning, databases, and statistics. This paper studies the problem of clustering high dimensional data. It proposes the CoFD algorithm, a non-distance-based clustering algorithm for high dimensional spaces. Based on the Maximum Likelihood Principle, CoFD attempts to optimize its parameter settings to maximize the likelihood between the data points and the model generated by the parameters. Distributed versions of the algorithm, called the D-CoFD algorithms, are also proposed. Experimental results on both synthetic and real data sets show the efficiency and effectiveness of the CoFD and D-CoFD algorithms.

Original language: English (US)
Pages (from-to): 305-326
Number of pages: 22
Journal: Intelligent Data Analysis
Volume: 7
Issue number: 4
State: Published - Jan 1 2003
Externally published: Yes

Keywords

  • CoFD
  • clustering
  • distributed
  • high dimensional
  • maximum likelihood

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
