Fact Verification with Chinese Tabular Data Based on Capsule Heterogeneous Graph Attention Network

doi:10.13328/j.cnki.jos.006951


Journal of Software (软件学报), 2024, 35(9): 4324-4345


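Authors:

YANG Peng (杨鹏): School of Computer Science and Engineering, Southeast University, Nanjing 211189, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education (Southeast University), Nanjing 211189, China
ZHA Xian-Yu (查显宇): School of Computer Science and Engineering, Southeast University, Nanjing 211189, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education (Southeast University), Nanjing 211189, China
ZHAO Guang-Zhen (赵广振): School of Computer Science and Engineering, Southeast University, Nanjing 211189, China; Key Laboratory of Computer Network and Information Integration, Ministry of Education (Southeast University), Nanjing 211189, China
LIN Xi (林茜): College of Computer and Data Science, Fuzhou University, Fuzhou 350108, China
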
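Author biographies: YANG Peng (1975-), male, Ph.D., professor, doctoral supervisor, CCF professional member; research interests include natural language processing, novel networks, and big data governance. ZHA Xian-Yu (1998-), male, master's student; research interests include natural language processing and machine learning. ZHAO Guang-Zhen (1992-), male, Ph.D., CCF student member; research interests include natural language processing and machine learning. LIN Xi (1999-), female, master's student; research interests include natural language processing and 3D computer vision.
Corresponding author: YANG Peng, E-mail: pengyang@seu.edu.cn
CLC number: TP18
Funding: National Natural Science Foundation of China (62272100); Chinese Academy of Engineering Academy-Local Cooperation Project (JS2021ZT05); Chinese Academy of Engineering Consulting Project (2023-XY-09)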


Abstract:

Fact verification checks whether a textual statement is supported by given evidence. Because tables are structurally dependent and their content is often implicit, fact verification with tables as evidence still faces many challenges. Existing work either parses statements about tabular evidence into logical expressions or designs table-aware neural networks to encode statement-table pairs. However, these approaches do not fully exploit the tabular information implied by the statement, which degrades inference performance. Moreover, Chinese statements based on tabular evidence have more complex syntax and semantics, which makes inference harder still. To address this, the study proposes a fact verification method for Chinese tabular data based on the capsule heterogeneous graph attention network (CapsHAN). The method is designed to capture the structure and semantics of a statement and then to mine and use the tabular information that the statement implies, improving the accuracy of table-based fact verification. Specifically, a heterogeneous graph is first constructed by applying dependency parsing and named entity recognition to the statement. The graph is then processed by a heterogeneous graph attention network and a capsule graph neural network. The resulting statement representation is concatenated with the encoded table representation, and the result is finally predicted.

Chinese table-based fact verification datasets are scarce, which makes it difficult to evaluate table-based fact verification methods. To address this, the study translates the mainstream English datasets TABFACT and INFOTABS into Chinese and constructs UCLDS, a dataset based on the uniform content label (UCL) national standard and tailored to the characteristics of Chinese tabular data. UCLDS uses Wikipedia infoboxes as evidence for manually annotated natural language statements, each labeled as entailed, refuted, or neutral. Compared with TABFACT and INFOTABS, UCLDS has the advantage of supporting both single-table and multi-table inference. Experimental results on these three Chinese benchmark datasets show that the proposed model outperforms the baseline models on all of them, demonstrating its effectiveness for Chinese table-based fact verification.

Key words: table-based fact verification; heterogeneous graph attention network (HAN); capsule graph neural network (CapsGNN); dependency parsing; named entity recognition
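
The graph-construction step described in the abstract (dependency parsing plus named entity recognition feeding a heterogeneous graph) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `build_statement_graph`, the node and edge naming scheme, and the hand-written dependency arcs and entity spans are all assumptions made for illustration. In practice the arcs and spans would come from a dependency parser and a named entity recognizer, and the graph would be passed to the attention and capsule layers, which are not shown here.

```python
from collections import defaultdict


def build_statement_graph(tokens, dep_arcs, entities):
    """Build a small heterogeneous graph for one statement (illustrative only).

    tokens:   list of word strings
    dep_arcs: list of (head_index, dependent_index, relation) triples
    entities: list of (start, end_exclusive, label) token spans
    Returns (nodes, edges): nodes maps node id -> attribute dict,
    edges maps edge type -> list of (source, target) node ids.
    """
    nodes = {}
    edges = defaultdict(list)

    # One "word" node per token.
    for i, tok in enumerate(tokens):
        nodes[("word", i)] = {"type": "word", "text": tok}

    # Dependency arcs become typed word-to-word edges (head -> dependent).
    for head, dep, rel in dep_arcs:
        edges["dep:" + rel].append((("word", head), ("word", dep)))

    # Each named entity becomes an "entity" node linked to the words it covers.
    for k, (start, end, label) in enumerate(entities):
        ent = ("entity", k)
        nodes[ent] = {"type": "entity", "label": label,
                      "text": "".join(tokens[start:end])}
        for i in range(start, end):
            edges["mention"].append((ent, ("word", i)))

    return nodes, dict(edges)


# Hypothetical, hand-written analysis of the statement
# "南京是江苏的省会" ("Nanjing is the capital of Jiangsu").
tokens = ["南京", "是", "江苏", "的", "省会"]
dep_arcs = [(1, 0, "nsubj"), (1, 4, "attr"), (4, 2, "nmod"), (2, 3, "case")]
entities = [(0, 1, "LOC"), (2, 3, "LOC")]

nodes, edges = build_statement_graph(tokens, dep_arcs, entities)
print(len(nodes), "nodes;", sorted(edges))
```

Keeping word nodes and entity nodes as separate types, with a distinct edge type per dependency relation, is what makes the graph heterogeneous: a downstream attention layer can then weight neighbors differently depending on the node and edge types involved.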

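Cite this article:

YANG Peng, ZHA Xian-Yu, ZHAO Guang-Zhen, LIN Xi. Fact Verification with Chinese Tabular Data Based on Capsule Heterogeneous Graph Attention Network. Journal of Software (软件学报), 2024, 35(9): 4324-4345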


History:

Received: 2022-10-24
Revised: 2023-03-06
Published online: 2023-08-23
Published: 2024-09-06
