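Method for Expanding Event Commonsense Knowledge Graph Based on Large Language Models
Authors: 黄俏娟, 曹存根, 王亚, 王石

CLC number: TP18

Funding: National Key Research and Development Program of China (2022YFC3302300); National Science and Technology Major Project (7090201050307); National 242 Information Security Program (2023A105)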
    Abstract:

    Commonsense knowledge is usually not stated explicitly in natural language but is implicit in human cognition, and providing machines with such knowledge has been a long-standing goal of artificial intelligence. In earlier work, members of the research group manually constructed a high-precision, event-centric Chinese seed commonsense knowledge graph (ECKG) containing 26 606 commonsense event triples that cover causal, temporal, conditional, and other common event relations. Although the seed ECKG is valuable, its small scale limits its usefulness in practical applications, and large-scale event commonsense knowledge graphs remain scarce in existing research. To address these challenges, this paper uses large language models from the GPT series to expand four event relations in the seed ECKG: causal, temporal, conditional, and sub-event. The expansion method has three main steps. First, event knowledge prompts (ek-prompts) are designed by combining events in the seed ECKG with the definitions of the four relations, and GPT-4-Turbo is used to generate the corresponding event triples. Second, the seed ECKG triples are combined with the correct triples obtained through the ek-prompts to build a dedicated dataset, on which GPT-3.5-Turbo is fine-tuned both to generate more specific event triples and to verify the accuracy of new triples. Third, the similarity of events in the seed ECKG is analyzed and an event-sharing mechanism is introduced: under the same relation, the events associated with similar events are shared among them, which keeps the triples of similar events consistent. Experimental evaluation shows that the newly acquired triples are of high quality; triples of the temporal relation are the most accurate, at 98.2%. In total, the proposed method adds 2 433 012 commonsense event triples to the seed ECKG, substantially enlarging it and providing a richer commonsense knowledge resource for many applications in artificial intelligence.
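
    The first and third steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the prompt template, the relation definitions, the similarity measure, or any threshold. The names `RELATION_DEFINITIONS`, `build_ek_prompt`, `similarity`, and `share_events`, the English example events, the use of `difflib` as a stand-in similarity, and the 0.8 threshold are all hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical one-line relation definitions; the paper's own wording is not
# given in the abstract.
RELATION_DEFINITIONS = {
    "causal": "the head event causes the tail event",
    "temporal": "the head event happens before the tail event",
    "conditional": "the head event is a condition for the tail event",
    "sub-event": "the tail event is a step within the head event",
}


def build_ek_prompt(event, relation):
    """Assemble an illustrative ek-prompt from a seed event and a relation definition."""
    definition = RELATION_DEFINITIONS[relation]
    return (
        f"Relation '{relation}': {definition}.\n"
        f"Head event: {event}\n"
        "List tail events that hold under this relation, one per line."
    )


def similarity(a, b):
    """Stand-in similarity (an assumption): character-level ratio from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def share_events(triples, threshold=0.8):
    """Within each relation, let similar head events share their tail events."""
    tails = defaultdict(set)  # (relation, head) -> tails seen in the input
    for head, relation, tail in triples:
        tails[(relation, head)].add(tail)

    expanded = set(triples)
    keys = list(tails)
    for i, (rel_a, head_a) in enumerate(keys):
        for rel_b, head_b in keys[i + 1:]:
            # Sharing only happens under the same relation and between
            # heads that are similar enough.
            if rel_a != rel_b or similarity(head_a, head_b) < threshold:
                continue
            # Copy tails in both directions so both heads end up consistent.
            for tail in tails[(rel_b, head_b)]:
                expanded.add((head_a, rel_a, tail))
            for tail in tails[(rel_a, head_a)]:
                expanded.add((head_b, rel_b, tail))
    return expanded


if __name__ == "__main__":
    seed = [
        ("drink coffee", "temporal", "feel awake"),
        ("drink a coffee", "temporal", "wash the cup"),
        ("go swimming", "temporal", "take a shower"),
    ]
    print(build_ek_prompt("drink coffee", "temporal"))
    for triple in sorted(share_events(seed)):
        print(triple)
```

    In this sketch only the tails present in the input are shared, so sharing is not applied transitively; whether the paper's mechanism propagates further is not stated in the abstract. A real system would likely replace the character-level ratio with a semantic similarity measure suited to Chinese event phrases.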

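Cite this article

黄俏娟, 曹存根, 王亚, 王石. Method for Expanding Event Commonsense Knowledge Graph Based on Large Language Models. Journal of Software (软件学报),,():1-34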

History
  • Received: 2024-04-19
  • Last revised: 2024-06-10
  • Published online: 2024-12-31
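Copyright: Institute of Software, Chinese Academy of Sciences. 京ICP备05046678号-3
Address: No. 4, South Fourth Street, Zhongguancun, Haidian District, Beijing; postal code: 100190
Tel: 010-62562563 Fax: 010-62562533 Email: jos@iscas.ac.cn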