Survey on Data Integration Technologies for Relational Data and Knowledge Graph
Author:
Affiliation:

    Abstract:

    In recent years, many countries and regions have come to regard big data as a critical strategic resource. However, poor data circulation and insufficient data regulation are common in the big data era, leading to data silos, poor data quality, and difficulty in unlocking the potential of data elements. This has motivated researchers to explore data integration techniques that break down data barriers, enable data sharing, improve data quality, and unlock the potential of data elements. Relational data and knowledge graphs, two major forms of data organization and storage, are widely used in practice. This study therefore focuses on relational data and knowledge graphs, and summarizes and analyzes the key technologies of data integration, including entity resolution, data fusion, and data cleaning. Finally, it outlines future research directions.


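Gao YJ, Ge CC, Guo YX, Chen L. Survey on data integration technologies for relational data and knowledge graph. Ruan Jian Xue Bao/Journal of Software, 2023, 34(5): 2365-2391 (in Chinese).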

History
  • Received: June 21, 2022
  • Revised: August 18, 2022
  • Online: December 30, 2022
  • Published: May 06, 2023