Entity Resolution Oriented Clustering Algorithm

Fund Project:

National Natural Science Foundation of China (61472070, 61402213); National Basic Research Program of China (973) (2012CB316201); Fundamental Research Funds for the Central Universities (N110404010)

    Abstract:

    Entity resolution (ER) is a key aspect of data quality and a necessary step in big data processing. Existing ER research focuses on similarity algorithms for data objects, blocking, and supervised ER techniques, but pays little attention to the matching decision problem in unsupervised ER. To complement existing work, this paper proposes a clustering algorithm for ER. The algorithm builds a weighted similarity graph whose vertices are data objects and whose edge weights are their pairwise similarities. During clustering, the similarity between a cluster and a vertex is computed dynamically via random walk with restarts on the similarity graph. The core idea is that each cluster iteratively absorbs its nearest neighboring vertex. A data object ordering method is also proposed to optimize the clustering order and thereby improve clustering accuracy. Further, an improved method for computing the stationary probability distribution of the random walk is proposed to reduce the cost of the clustering algorithm. Experiments on real and synthetic datasets validate the effectiveness of the proposed algorithm.
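
The following is a minimal Python sketch, under stated assumptions, of the two core ideas in the abstract: scoring how close a vertex is to a cluster by a random walk with restarts (RWR) on the weighted similarity graph, and growing a cluster by repeatedly absorbing the nearest unassigned vertex. It is not the authors' implementation. The function names, the restart probability of 0.15, and the stopping threshold `min_score` are hypothetical choices. Seeds are taken in input order because the paper's data object ordering method is not reproduced here, and the stationary distribution is computed by plain power iteration rather than the paper's improved method.

```python
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected weighted similarity graph from (u, v, similarity) triples."""
    graph = defaultdict(dict)
    for u, v, sim in pairs:
        graph[u][v] = sim
        graph[v][u] = sim
    return graph


def rwr_scores(graph, nodes, restart_set, restart_prob=0.15, tol=1e-10, max_iter=1000):
    """Stationary distribution of a random walk that restarts uniformly to
    `restart_set`, computed by plain power iteration (not the paper's method)."""
    restart = {v: (1.0 / len(restart_set) if v in restart_set else 0.0) for v in nodes}
    degree = {u: sum(graph[u].values()) for u in nodes}
    r = dict(restart)
    for _ in range(max_iter):
        nxt = {v: restart_prob * restart[v] for v in nodes}
        for u in nodes:
            mass = (1.0 - restart_prob) * r[u]
            if degree[u] == 0:
                # Dangling vertex: return its walk mass to the restart set.
                for v in restart_set:
                    nxt[v] += mass * restart[v]
                continue
            for v, w in graph[u].items():
                # Move to neighbor v with probability proportional to similarity.
                nxt[v] += mass * w / degree[u]
        diff = sum(abs(nxt[v] - r[v]) for v in nodes)
        r = nxt
        if diff < tol:
            break
    return r


def absorb_clustering(nodes, graph, min_score=0.1, restart_prob=0.15):
    """Grow one cluster at a time: repeatedly absorb the unassigned vertex with
    the highest RWR score w.r.t. the current cluster, until that score falls
    below the hypothetical threshold `min_score`."""
    unassigned = list(nodes)  # seeds taken in input order (no ordering heuristic)
    clusters = []
    while unassigned:
        cluster = [unassigned.pop(0)]
        while unassigned:
            scores = rwr_scores(graph, nodes, set(cluster), restart_prob)
            best = max(unassigned, key=lambda v: scores[v])
            if scores[best] < min_score:
                break
            cluster.append(best)
            unassigned.remove(best)
        clusters.append(cluster)
    return clusters


# Toy records: a, b, c are near-duplicates; d, e are another entity; c-d is a weak match.
pairs = [("a", "b", 0.9), ("a", "c", 0.9), ("b", "c", 0.9), ("c", "d", 0.1), ("d", "e", 0.9)]
graph = build_graph(pairs)
nodes = ["a", "b", "c", "d", "e"]
clusters = absorb_clustering(nodes, graph)
print(clusters)
```

On this toy graph the sketch groups a, b, c into one cluster and d, e into another: the single weak c-d edge carries little walk probability, so d's RWR score relative to the cluster {a, b, c} stays well below the threshold. Recomputing RWR after every absorption is what makes the cluster-to-vertex similarity dynamic, and it is also why the cost of the stationary-distribution computation matters.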

Citation:

Sun Chenchen, Shen Derong, Kou Yue, Nie Tiezheng, Yu Ge. Entity resolution oriented clustering algorithm. Journal of Software, 2016, 27(9): 2303-2319.

History
  • Received: September 24, 2015
  • Revised: January 12, 2016
  • Online: September 02, 2016
Copyright: Institute of Software, Chinese Academy of Sciences