Clustering distributed homogeneous datasets

Srinivasan Parthasarathy, Mitsunori Ogihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

17 Citations (Scopus)

Abstract

In this paper we present an elegant and effective algorithm for measuring the similarity between homogeneous datasets to enable clustering. Once similar datasets are clustered, each cluster can be independently mined to generate the appropriate rules for that cluster. The algorithm is efficient in storage and scalable, can adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. The proposed similarity measure is evaluated and validated on real datasets from the Census Bureau and Reuters, and on synthetic datasets from IBM.
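The abstract does not spell out the similarity measure, so the following is only a hypothetical sketch of the general idea, assuming each dataset is summarized by the supports of its frequent itemsets and that two datasets count as similar when they share frequent itemsets with close supports. The function names `itemset_supports`, `similarity`, `largest_differences`, and `cluster_datasets`, and the parameters `alpha`, `min_support`, and `threshold`, are illustrative assumptions. The threshold-based single-link grouping is a swapped-in clustering step, not necessarily the one the paper uses.

```python
from itertools import combinations


def itemset_supports(transactions, max_size=2, min_support=0.1):
    """Summarize a transaction dataset as {itemset: support}.

    Counts every itemset of up to `max_size` items (brute force, fine for
    small examples) and keeps those whose support reaches `min_support`.
    """
    n = len(transactions)
    counts = {}
    for txn in transactions:
        items = sorted(set(txn))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def similarity(sup_a, sup_b, alpha=1.0):
    """Score in [0, 1]: shared frequent itemsets with close supports raise it."""
    union = set(sup_a) | set(sup_b)
    if not union:
        return 1.0  # two empty summaries are treated as identical
    common = set(sup_a) & set(sup_b)
    # Each shared itemset contributes up to 1, reduced by its support gap.
    score = sum(max(0.0, 1.0 - alpha * abs(sup_a[x] - sup_b[x])) for x in common)
    return score / len(union)


def largest_differences(sup_a, sup_b, top=3):
    """Itemsets whose supports differ most: a hint at *why* datasets differ.

    An itemset missing from one summary is treated as support 0, which is an
    approximation (it was only below min_support there).
    """
    union = set(sup_a) | set(sup_b)
    gaps = [(abs(sup_a.get(x, 0.0) - sup_b.get(x, 0.0)), sorted(x)) for x in union]
    gaps.sort(key=lambda g: (-g[0], g[1]))
    return gaps[:top]


def cluster_datasets(summaries, threshold=0.5, alpha=1.0):
    """Single-link grouping: datasets whose similarity reaches `threshold`
    (directly or through a chain of such pairs) share a cluster."""
    names = list(summaries)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        if similarity(summaries[a], summaries[b], alpha) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    datasets = {
        "d1": [["a", "b"], ["a", "b"], ["a", "c"], ["b"]],
        "d2": [["a", "b"], ["a"], ["a"], ["b"]],
        "d3": [["x", "y"], ["x"], ["y", "z"], ["x", "y"]],
    }
    summaries = {name: itemset_supports(txns, max_size=2, min_support=0.5)
                 for name, txns in datasets.items()}
    print(similarity(summaries["d1"], summaries["d2"]))
    print(largest_differences(summaries["d1"], summaries["d2"]))
    print(cluster_datasets(summaries, threshold=0.5))
```

With these toy inputs, `d1` and `d2` share the frequent itemsets {a} and {b}, so their similarity (1.75/3) clears the 0.5 threshold and they land in one cluster, while `d3` shares nothing and stays alone. The itemset {a, b} tops the list of differences between `d1` and `d2`. Because only compact itemset summaries are compared rather than raw transactions, each resulting cluster could then be mined on its own.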

Original language: English (US)
Title of host publication: Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings
Publisher: Springer Verlag
Pages: 566-574
Number of pages: 9
Volume: 1910
ISBN (Print): 9783540410669
State: Published - 2000
Externally published: Yes
Event: 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2000 - Lyon, France
Duration: Sep 13 2000 to Sep 16 2000

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1910
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2000
Country: France
City: Lyon
Period: 9/13/00 to 9/16/00

Fingerprint

Clustering
Census
Similarity Measure
Likely
Similarity

ASJC Scopus subject areas

  • Computer Science(all)
  • Theoretical Computer Science

Cite this

Parthasarathy, S., & Ogihara, M. (2000). Clustering distributed homogeneous datasets. In Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings (Vol. 1910, pp. 566-574). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1910). Springer Verlag.

Clustering distributed homogeneous datasets. / Parthasarathy, Srinivasan; Ogihara, Mitsunori.

Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings. Vol. 1910 Springer Verlag, 2000. p. 566-574 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1910).


Parthasarathy, S & Ogihara, M 2000, Clustering distributed homogeneous datasets. in Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings. vol. 1910, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 1910, Springer Verlag, pp. 566-574, 4th European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD 2000, Lyon, France, 9/13/00.
Parthasarathy S, Ogihara M. Clustering distributed homogeneous datasets. In Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings. Vol. 1910. Springer Verlag. 2000. p. 566-574. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Parthasarathy, Srinivasan ; Ogihara, Mitsunori. / Clustering distributed homogeneous datasets. Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings. Vol. 1910 Springer Verlag, 2000. pp. 566-574 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{40fcf0e17a654ab0b25bd9d126144894,
title = "Clustering distributed homogeneous datasets",
abstract = "In this paper we present an elegant and effective algorithm for measuring the similarity between homogeneous datasets to enable clustering. Once similar datasets are clustered, each cluster can be independently mined to generate the appropriate rules for that cluster. The algorithm is efficient in storage and scalable, can adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. The proposed similarity measure is evaluated and validated on real datasets from the Census Bureau and Reuters, and on synthetic datasets from IBM.",
author = "Srinivasan Parthasarathy and Mitsunori Ogihara",
year = "2000",
language = "English (US)",
isbn = "9783540410669",
volume = "1910",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "566--574",
booktitle = "Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings",
address = "Germany",

}

TY - GEN

T1 - Clustering distributed homogeneous datasets

AU - Parthasarathy, Srinivasan

AU - Ogihara, Mitsunori

PY - 2000

Y1 - 2000

N2 - In this paper we present an elegant and effective algorithm for measuring the similarity between homogeneous datasets to enable clustering. Once similar datasets are clustered, each cluster can be independently mined to generate the appropriate rules for that cluster. The algorithm is efficient in storage and scalable, can adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. The proposed similarity measure is evaluated and validated on real datasets from the Census Bureau and Reuters, and on synthetic datasets from IBM.

AB - In this paper we present an elegant and effective algorithm for measuring the similarity between homogeneous datasets to enable clustering. Once similar datasets are clustered, each cluster can be independently mined to generate the appropriate rules for that cluster. The algorithm is efficient in storage and scalable, can adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. The proposed similarity measure is evaluated and validated on real datasets from the Census Bureau and Reuters, and on synthetic datasets from IBM.

UR - http://www.scopus.com/inward/record.url?scp=84879106048&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84879106048&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84879106048

SN - 9783540410669

VL - 1910

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 566

EP - 574

BT - Principles of Data Mining and Knowledge Discovery - 4th European Conference, PKDD 2000, Proceedings

PB - Springer Verlag

ER -