Distributed Relationship Mining over Big Scholar Data

Research output: Contribution to journal › Article › peer-review


In this paper, we propose a system infrastructure that organizes big scholar data as a large knowledge graph, discovers the meta paths between entities, and calculates the relevancy between entities in the graph. The core infrastructure is built on the secure, private Amazon Elastic Compute Cloud (Amazon EC2) platform. It distributes the data evenly across the repositories, processes the data in parallel using the open-source Spark framework, manages computing resources optimally using YARN and Hadoop HDFS, and discovers relationships between different types of entities in a distributed manner. On top of this infrastructure, we incorporate four relationship discovery tasks: citation recommendation, potential collaborator discovery, similar venue measurement, and paper-to-venue recommendation. For the relationship mining tasks, we propose a mixed and weighted meta path (MWMP) method to explore potential relationships between different types of entities. To verify the accuracy of our algorithm and measure its parallelization speedup, we set up clusters on the Amazon EC2 platform.
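As a rough illustration of meta-path-based relevancy between entities, the sketch below scores author pairs on a toy scholarly graph. It is not the MWMP method itself, whose details the abstract does not give. It assumes two symmetric meta paths (author-paper-author and author-paper-venue-paper-author), scores each with the PathSim measure, and combines them with hypothetical weights; the toy data, the `WEIGHTS` values, and all function names are invented for illustration, and the computation is single-machine rather than distributed.

```python
from collections import defaultdict


def compose(left, right):
    """Multiply two sparse count matrices stored as dict-of-dicts."""
    out = defaultdict(lambda: defaultdict(int))
    for a, row in left.items():
        for mid, n1 in row.items():
            for b, n2 in right.get(mid, {}).items():
                out[a][b] += n1 * n2
    return {a: dict(row) for a, row in out.items()}


def transpose(matrix):
    """Swap rows and columns of a dict-of-dicts matrix."""
    out = defaultdict(dict)
    for a, row in matrix.items():
        for b, n in row.items():
            out[b][a] = n
    return dict(out)


def pathsim(commuting, x, y):
    """PathSim score for a symmetric meta path: 2*M[x][y] / (M[x][x] + M[y][y])."""
    num = 2 * commuting.get(x, {}).get(y, 0)
    den = commuting.get(x, {}).get(x, 0) + commuting.get(y, {}).get(y, 0)
    return num / den if den else 0.0


# Toy heterogeneous graph (hypothetical data): authors, papers, venues.
writes = {
    "alice": {"p1": 1, "p2": 1},
    "bob": {"p1": 1, "p3": 1},
    "carol": {"p4": 1},
}
published_in = {
    "p1": {"KDD": 1},
    "p2": {"KDD": 1},
    "p3": {"ICDM": 1},
    "p4": {"KDD": 1},
}

# Commuting matrices: number of path instances between each author pair.
apa = compose(writes, transpose(writes))      # author-paper-author
apv = compose(writes, published_in)           # author-paper-venue
apvpa = compose(apv, transpose(apv))          # author-paper-venue-paper-author

# Hypothetical weights; a real system would tune or learn these.
WEIGHTS = {"APA": 0.7, "APVPA": 0.3}


def weighted_relevance(x, y, weights=WEIGHTS):
    """Weighted sum of per-meta-path PathSim scores for authors x and y."""
    return (weights["APA"] * pathsim(apa, x, y)
            + weights["APVPA"] * pathsim(apvpa, x, y))


if __name__ == "__main__":
    for pair in [("alice", "bob"), ("alice", "carol")]:
        print(pair, round(weighted_relevance(*pair), 3))
```

In this toy setting, alice and carol never co-author but still receive a nonzero score through their shared venue, which is the kind of indirect relationship that mixing several meta paths is meant to surface.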

Original language: English (US)
Journal: IEEE Transactions on Emerging Topics in Computing
State: Accepted/In press - Apr 23 2018


  • Big Data
  • Complexity theory
  • Data mining
  • Distributed databases
  • Distributed System
  • Graph Recommendation
  • Heterogeneous Information Network
  • Knowledge engineering
  • Scholarly Big Data
  • Semantics
  • Task analysis

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Information Systems
  • Human-Computer Interaction
  • Computer Science Applications
