Interpretability of Entity Matching Based on Pre-trained Language Model

doi:10.13328/j.cnki.jos.006794

Volume 34, Issue 3, 2023, pp. 1087-1108

Author:
LIANG Zheng, WANG Hong-Zhi, DAI Jia-Jia, SHAO Xin-Yue, DING Xiao-Ou, MU Tian-Yu

Affiliation (all authors):
Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China

Abstract:

Entity matching determines whether records in two datasets refer to the same real-world entity, and it is indispensable for tasks such as big data integration, social network analysis, and Web semantic data management. Pre-trained language models, a deep learning technology that has achieved great success in natural language processing and computer vision, also outperform traditional methods on entity matching tasks and have attracted the attention of many researchers. However, entity matching based on pre-trained language models performs unstably and its matching results cannot be explained, which brings great uncertainty to the application of this technology in big data integration. Meanwhile, existing interpretation methods for entity matching models are mainly model-agnostic methods oriented to machine learning methods, and their applicability to pre-trained language models is limited. Therefore, this study takes BERT-based entity matching models such as Ditto and JointBERT as examples and proposes three model interpretation methods for entity matching based on pre-trained language models. (1) Because the result is sensitive to the order of relational data attributes in the serialization operation, dataset meta-features and attribute similarity are used to generate attribute-order counterfactuals for misclassified samples. (2) As a supplement to traditional attribute importance measurement, the attention weights of the pre-trained language model are used to measure and visualize how the model processes its input. (3) Based on serialized sentence vectors, k-nearest neighbor search is used to recall well-interpretable samples similar to the misclassified samples, in order to enhance the low-confidence predictions of the pre-trained language model.
Experiments on real public datasets show that, while the enhancement method improves model performance, the proposed method reaches 68.8% of the upper limit of fidelity in the attribute order search space, providing decision explanations for entity matching models based on pre-trained language models. It also introduces new perspectives such as attribute-order counterfactuals and attribute association understanding.
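
To make the idea of an attribute-order counterfactual concrete, the following is a minimal sketch, not the authors' implementation. It assumes a Ditto-style serialization of the form `[COL] attribute [VAL] value` and replaces the paper's generation guided by meta-features and attribute similarity with a plain exhaustive search over permutations. The names `serialize`, `serialize_pair`, `order_counterfactual`, and `toy_predict` are hypothetical, and `toy_predict` is a deliberately order-sensitive stand-in for a real matcher.

```python
from itertools import permutations
from typing import Callable, Dict, List, Optional, Sequence

Record = Dict[str, str]


def serialize(record: Record, order: Sequence[str]) -> str:
    """Serialize one record as '[COL] attr [VAL] value' in the given attribute order."""
    return " ".join(f"[COL] {attr} [VAL] {record.get(attr, '')}" for attr in order)


def serialize_pair(left: Record, right: Record, order: Sequence[str]) -> str:
    """Serialize a candidate pair; both records use the same attribute order."""
    return f"{serialize(left, order)} [SEP] {serialize(right, order)}"


def order_counterfactual(
    left: Record,
    right: Record,
    original_order: Sequence[str],
    predict: Callable[[str], bool],
    max_attrs: int = 6,
) -> Optional[List[str]]:
    """Return the first attribute order that flips the prediction, or None.

    Assumption: brute-force enumeration, which is only feasible for a few
    attributes (n! orders); the paper guides the search instead.
    """
    if len(original_order) > max_attrs:
        raise ValueError("too many attributes for exhaustive search")
    base = predict(serialize_pair(left, right, original_order))
    for order in permutations(original_order):
        if list(order) == list(original_order):
            continue  # skip the original order itself
        if predict(serialize_pair(left, right, order)) != base:
            return list(order)
    return None


def toy_predict(text: str) -> bool:
    """Hypothetical order-sensitive matcher: compares only the first serialized value."""
    left, right = text.split(" [SEP] ")

    def first_value(side: str) -> str:
        return side.split("[COL] ")[1].split(" [VAL] ")[1].strip()

    return first_value(left) == first_value(right)


left = {"title": "iphone 12 64gb", "brand": "apple", "price": "699"}
right = {"title": "apple iphone 12 (64 gb)", "brand": "apple", "price": "699.00"}
original = ["title", "brand", "price"]

flipped_order = order_counterfactual(left, right, original, toy_predict)
print(flipped_order)
```

With this toy matcher, the original order puts the differing titles first and yields a non-match, while an order that puts `brand` first yields a match. A real study would pass a trained model's prediction function as `predict` and would need a guided search once the number of attributes grows.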

Key words: entity matching; pre-trained language model; interpretability

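Citation:

LIANG Zheng, WANG Hong-Zhi, DAI Jia-Jia, SHAO Xin-Yue, DING Xiao-Ou, MU Tian-Yu. Interpretability of Entity Matching Based on Pre-trained Language Model. Journal of Software, 2023, 34(3): 1087-1108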

History

Received: May 16, 2022
Revised: July 29, 2022
Online: October 26, 2022
Published: March 06, 2023

