On Generalized Bisimilarity Join

doi:10.13328/j.cnki.jos.005244

微信服务号

微信订阅号

2025-5-11- 7

Home > Archive>Volume 28, Issue 12, 2017 >3223-3240. DOI:10.13328/j.cnki.jos.005244

PDF HTML XML Export Cite reminder

On Generalized Bisimilarity Join
DOI:
                        10.13328/j.cnki.jos.005244
                    
Author:
                        WANG Chang-PingWANG Chang-Ping
School of Software, Tsinghua University, Beijing 100084, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
WANG Chao-KunWANG Chao-Kun
School of Software, Tsinghua University, Beijing 100084, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
WANG HaoWANG Hao
School of Software, Tsinghua University, Beijing 100084, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
WANG MengWANG Meng
School of Software, Tsinghua University, Beijing 100084, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site
CHEN JunCHEN Jun
School of Software, Tsinghua University, Beijing 100084, China
Find this author on CNKI
Find this author on BaiDu
Search for this author on this site

                    
Affiliation:
Clc Number:
Fund Project:National Natural Science Foundation of China (61373023)

Article

Figures

Metrics

Reference [22]

Cited by

Materials

Comments

Abstract:

Similarity join is one of the hottest topics in the field of data management, and it has been widely applied in many fields. However, existing similarity join methods cannot meet the increasing demands in the real world. This paper define generalized bisimilarity join as a new similarity join to expend the applications of the similarity join research by introducing the satisfaction operator on various data types with individual thresholds. Two efficient methods, SJS(sub-join set) and MFV(mapping-filtering-verification), are proposed to solve this problem. A large amount of experiments conducted on both real-world and synthetic datasets demonstrate the correctness and the effectiveness of the proposed methods.

Reference

[1] Xiao C, Wang W, Lin XM, Yu JX, Wang GR. Efficient similarity joins for near-duplicate detection. ACM Trans. on Database Systems (TODS), 2011,36(3).[doi:10.1145/2000824.2000825]

[2] Papapetrou P, Athitsos V, Kollios G, Gunopulos D. Reference-Based alignment in large sequence databases. Proc. of the VLDB Endowment, 2009,2(1):205-216.[doi:10.14778/1687627.1687651]

[3] Li YN, Patel JM, Terrell A. Wham:A high-throughput sequence alignment method. ACM Trans. on Database Systems (TODS), 2012,37(4):28.[doi:10.1145/2389241.2389247]

[4] Chaudhuri S, Ganti V, Kaushik R. A primitive operator for similarity joins in data cleaning. In:Proc. of the 22nd Int'l Conf. on Data Engineering (ICDE 2006). 2006.[doi:10.1109/ICDE.2006.9]

[5] Liu XL, Wang HZ, Li JZ, Gao H. Similarity join algorithm based on entity. Ruan Jian Xue Bao/Journal of Software, 2015,26(6):1421-1437(in Chinese with English abstract). http://www.jos.org.cn/1000-9825/4610.htm[doi:10.13328/j.cnki.jos.004610]

[6] Arasu A, Ganti V, Kaushik R. Efficient exact set-similarity joins. In:Proc. of the 32nd Int'l Conf. on Very Large Data Bases. 2006. 918-929.

[7] Zhang ZJ, Hadjieleftheriou M, Ooi BC, SrivastavaD. Bed-Tree:An all-purpose index structure for string similarity search based on edit distance. In:Proc. of the 2010 ACM SIGMOD Int'l Conf. on Management of Data. 2010. 915-926.[doi:10. 1145/1807167. 1807266]

[8] Zhao X, Xiao C, Lin XM, Wang W. Efficient graph similarity joins with edit distance constraints. In:Proc. of the 28th Int'l Conf. on Data Engineering (ICDE 2012). 2012. 834-845.[doi:10.1109/ICDE.2012.91]

[9] Wang JN, Feng JH, Li GL. Trie-Join:Efficient trie-based string similarity joins with edit-distance constraints. Proc. of the VLDB Endowment, 2010,3(1-2):1219-1230.

[10] Li GL, Deng D, Wang JN, Feng JH. Pass-Join:A partition-based method for similarity joins. Proc. of the VLDB Endowment, 2011, 5(3):253-264.

[11] Li R, Ju L, Peng Z, Yu ZW, Wang CK. Batch text similarity search with MapReduce. In:Proc. of the 13th Asia-Pacific Web Conf. Beijing, 2011. 412-423.

[12] Lu W, Du XY, Hadjieleftheriou MM, Ooi BC. Efficiently supporting edit distance based string similarity search using B+-trees. IEEE Trans. on Knowledge and Data Engineering, 2014,26(12):2983-2996.[doi:10.1109/TKDE.2014.2309131]

[13] Wang JN, Li GL, Deng D, Zhang Y, Feng JH. Two birds with one stone:An efficient hierarchical framework for top-k and threshold-based string similarity search. In:Proc. of the 31st Int'l Conf. on the Data Engineering (ICDE 2015). 2015. 519-530.[doi:10.1109/ICDE.2015.7113311]

[14] Wang JN, Li GL, Feng JH. Fast-Join:An efficient method for fuzzy token matching based string similarity join. In:Proc. of the 27th Int'l Conf. on the Data Engineering (ICDE 2011). 2011. 458-469.[doi:10.1109/ICDE.2011.5767865]

[15] Wang W, Qin JB, Xiao C, Lin XM, Shen HT. VChunkJoin:An efficient algorithm for edit similarity joins. IEEE Trans. on Knowledge and Data Engineering, 2013,25(8):1916-1929.[doi:10.1109/TKDE.2012.79]

[16] Wang CK, Wang JM, Lin XM, Wang W, Wang HX, Li HS, Tian WP, Xu J, Li R. MapDupReducer:Detecting near duplicates over massive datasets. In:Proc. of the 2010 ACM SIGMOD Int'l Conf. on Management of Data. 2010. 1119-1122.[doi:10.1145/1807167.1807296]

[17] Deng D, Li GL, Feng JH. A pivotal prefix based filtering algorithm for string similarity search. In:Proc. of the 2014 ACM SIGMOD Int'l Conf. on Management of Data. 2014. 673-684.[doi:10.1145/2588555.2593675]

[18] Rong CT, Lu W, Wang XL, Du XY, Chen YG, Tung AKH. Efficient and scalable processing of string similarity join. IEEE Trans. on Knowledge and Data Engineering, 2013,25(10):2217-2230.[doi:10.1109/TKDE.2012.195]

[19] Deng D, Li GL, Hao S, Wang JN, Feng JH. MassJoin:A MapReduce-based method for scalable string similarity joins. In:Proc. of the 30th Int'l Conf. on the Data Engineering (ICDE 2014). 2014. 340-351.[doi:10.1109/ICDE.2014.6816663]

[20] Brualdi RA. Introductory Combinatorics. 5th ed., Pearson Education, 2009.

附中文参考文献:

[5] 刘雪莉,王宏志,李建中,高宏.基于实体的相似性连接算法.软件学报,2015,26(6):1421-1437. http://www.jos.org.cn/1000-9825/4610.htm[doi:10.13328/j.cnki.jos.004610]

Get Citation

王昶平,王朝坤,汪浩,王萌,陈俊.泛化双向相似连接.软件学报,2017,28(12):3223-3240

Copy

Article Metrics

Abstract:
PDF:
HTML:
Cited by:

History

Received:August 01,2016
Revised:October 26,2016
Adopted:
Online: March 27,2017
Published:

You are the first2043735Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address：4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code：100190
Phone：010-62562563 Fax：010-62562533 Email：jos@iscas.ac.cn
Technical Support：Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063

微信服务号

微信订阅号

Get Citation

Share

微信扫一扫：分享

Article Metrics

History