Exploiting dataset similarity for distributed mining

Srinivasan Parthasarathy, Mitsunori Ogihara

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

6 Scopus citations

Abstract

The notion of similarity is an important one in data mining. It can be used to provide useful structural information on data as well as enable clustering. In this paper we present an elegant method for measuring the similarity between homogeneous datasets. The algorithm presented is efficient in storage and scale, has the ability to adjust to time constraints, and can provide the user with likely causes of similarity or dissimilarity. One potential application of our similarity measure is in the distributed data mining domain. Using the notion of similarity across databases as a distance metric, one can generate clusters of similar datasets. Once similar datasets are clustered, each cluster can be independently mined to generate the appropriate rules for a given cluster. The similarity measure is evaluated on a dataset from the Census Bureau, and synthetic datasets from IBM.

Original language: English (US)
Title of host publication: Parallel and Distributed Processing - 15 IPDPS 2000 Workshops, Proceedings
Editors: Jose Rolim
Publisher: Springer Verlag
Pages: 399-406
Number of pages: 8
ISBN (Print): 354067442X, 9783540674429
DOIs: https://doi.org/10.1007/3-540-45591-4_52
State: Published - 2000
Event: 15 Workshops Held in Conjunction with the IEEE International Parallel and Distributed Processing Symposium, IPDPS 2000 - Cancun, Mexico
Duration: May 1 2000 to May 5 2000

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 1800 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 15 Workshops Held in Conjunction with the IEEE International Parallel and Distributed Processing Symposium, IPDPS 2000
Country: Mexico
City: Cancun
Period: 5/1/00 to 5/5/00

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

  • Cite this

    Parthasarathy, S., & Ogihara, M. (2000). Exploiting dataset similarity for distributed mining. In J. Rolim (Ed.), Parallel and Distributed Processing - 15 IPDPS 2000 Workshops, Proceedings (pp. 399-406). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 1800 LNCS). Springer Verlag. https://doi.org/10.1007/3-540-45591-4_52