Research on Record Pair Ranking for Entity Resolution with Time Constraint

doi:10.13328/j.cnki.jos.005900

Volume 31, Issue 3, 2020, pp. 695-709. DOI:10.13328/j.cnki.jos.005900

Author:
SUN Chen-Chen
Key Laboratory of Computer Vision and System of Ministry of Education (Tianjin University of Technology), Tianjin 300384, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology (Tianjin University of Technology), Tianjin 300384, China
SHEN De-Rong
School of Computer Science and Engineering, Northeastern University, Shenyang 110189, China
LI Yu-Kun
Key Laboratory of Computer Vision and System of Ministry of Education (Tianjin University of Technology), Tianjin 300384, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology (Tianjin University of Technology), Tianjin 300384, China
XIAO Ying-Yuan
Key Laboratory of Computer Vision and System of Ministry of Education (Tianjin University of Technology), Tianjin 300384, China; Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology (Tianjin University of Technology), Tianjin 300384, China
MA Jian-Hong
School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China

Fund Project:National Key Research and Development Program of China (2018YFB1003404); National Natural Science Foundation of China (61672142, 61472070, 61602103); Natural Science Foundation of Tianjin of China (17JCYBJC15200)

Abstract:

Entity resolution (ER) is an important aspect of data integration and data cleaning, and a necessary preprocessing step for big data analytics and mining. Traditional batch-based ER has a high overall runtime and cannot satisfy the requirements of current (near) real-time data applications. This work therefore focuses on time-constrained entity resolution (TC-ER), whose core problem is ranking record pairs by match probability. Information within blocks and information across blocks, both obtained from multi-pass blocking, are analyzed separately, and two basic record match probability methods are proposed. The basic methods are then improved with an advanced record match probability method based on similarity flowing over a bipartite graph. The bipartite graph is constructed from record pairs, blocks, and the relations between them. Similarities flow iteratively between pair nodes and block nodes over the bipartite graph until convergence, and the convergence result is computed with fixpoint iterations. An approximate convergence computation method is proposed to reduce the cost, and it improves real-time recall in TC-ER. Finally, the proposed methods are evaluated on two datasets; the results show their effectiveness and also test different aspects of the proposed methods.

Key words: entity resolution; record pair ranking; time constraint; data integration
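
The similarity-flow idea summarized in the abstract can be illustrated with a small Python sketch. The abstract does not give the update formulas, so everything concrete here is an assumption made for illustration: the averaging rule for block nodes, the `damping` mix between a pair's initial similarity and the similarity flowing back from its blocks, and the names `build_bipartite`, `flow_similarity`, and `rank_pairs`. Capping `max_iters` at a small value stands in for an approximate convergence computation; it is not the paper's approximation method.

```python
from itertools import combinations


def build_bipartite(blocks):
    """Map each candidate pair to the blocks in which both records co-occur."""
    pair_blocks = {}
    for block_id, records in blocks.items():
        # Pairs are stored as sorted tuples so each pair has one canonical key.
        for a, b in combinations(sorted(records), 2):
            pair_blocks.setdefault((a, b), set()).add(block_id)
    return pair_blocks


def flow_similarity(blocks, init_sim, damping=0.5, tol=1e-6, max_iters=100):
    """Iterate an assumed pair <-> block update rule until a fixpoint is reached."""
    pair_blocks = build_bipartite(blocks)
    block_pairs = {}
    for pair, block_ids in pair_blocks.items():
        for block_id in block_ids:
            block_pairs.setdefault(block_id, []).append(pair)

    pair_score = {p: init_sim.get(p, 0.0) for p in pair_blocks}
    iterations = 0
    for _ in range(max_iters):
        iterations += 1
        # Pair nodes -> block nodes: a block's score is the mean of its pairs (assumed rule).
        block_score = {
            bid: sum(pair_score[p] for p in pairs) / len(pairs)
            for bid, pairs in block_pairs.items()
        }
        # Block nodes -> pair nodes: mix initial similarity with the mean block score.
        new_score = {}
        for p, block_ids in pair_blocks.items():
            neighbor = sum(block_score[b] for b in block_ids) / len(block_ids)
            new_score[p] = damping * init_sim.get(p, 0.0) + (1 - damping) * neighbor
        delta = max((abs(new_score[p] - pair_score[p]) for p in pair_score), default=0.0)
        pair_score = new_score
        if delta < tol:  # scores stopped changing: fixpoint reached
            break
    return pair_score, iterations


def rank_pairs(pair_score):
    """Order pairs by descending score, breaking ties by pair key."""
    return sorted(pair_score, key=lambda p: (-pair_score[p], p))


# Toy input (made-up data): three blocks from multi-pass blocking and initial similarities.
blocks = {"b1": {"r1", "r2", "r3"}, "b2": {"r1", "r2"}, "b3": {"r3", "r4"}}
init_sim = {("r1", "r2"): 0.9, ("r1", "r3"): 0.2, ("r2", "r3"): 0.3, ("r3", "r4"): 0.6}

scores, iterations = flow_similarity(blocks, init_sim)
ranking = rank_pairs(scores)
print(ranking)
```

With `damping` strictly between 0 and 1, this particular update is a contraction, so the iteration converges regardless of the starting scores. In a time-constrained setting, the pairs at the head of `ranking` would be the first ones passed to the matcher.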

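Citation:

SUN Chen-Chen, SHEN De-Rong, LI Yu-Kun, XIAO Ying-Yuan, MA Jian-Hong. Research on Record Pair Ranking for Entity Resolution with Time Constraint. Journal of Software, 2020, 31(3): 695-709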

History

Received: July 15, 2019
Revised: September 10, 2019
Online: January 10, 2020
Published: March 06, 2020

