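Huaputong: Large Language Model for Genealogical Question-answering with Knowledge Reasoning
CLC number:

TP182

Funding:

National Natural Science Foundation of China (62120106008); Innovative Research Team Development Program of the Ministry of Education (IRT17R32); Fundamental Research Funds for the Central Universities (JZ2023HGTB0270)
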
    Abstract:

    Using computer technology to manage genealogy data intelligently is important for preserving and popularizing traditional Chinese culture. In recent years, as retrieval-augmented large language models (LLMs) have been widely applied to knowledge question answering (Q&A), presenting diverse genealogical culture to users through dialogue with an LLM has become a research direction of considerable interest. However, the heterogeneity, autonomy, complexity, and evolution (HACE) of genealogy data make it difficult for existing knowledge retrieval frameworks to perform complete knowledge reasoning over complex genealogical information. To address this problem, Huaputong, an LLM-based genealogy Q&A system with knowledge graph reasoning, is proposed. It builds a knowledge graph reasoning framework suited to LLM-based genealogy Q&A from two aspects: completeness of reasoning logic and precision of information filtering. For completeness of reasoning logic, knowledge graphs serve as the carrier of genealogy knowledge, and a complete set of genealogy reasoning rules is proposed on top of the Jena framework to improve the model's retrieval recall on genealogical information. For information filtering, two scenarios are considered: persons who share the same name and multiple kinship relations between persons. A multi-condition matching mechanism based on question-condition triples and a Dijkstra path ranking algorithm based on a max heap are proposed to filter out redundant retrieved information, so that the LLM is prompted precisely. Huaputong has been deployed on Huapu (华谱网), a publicly available intelligent genealogy website, and its effectiveness has been validated on real genealogy data.
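
    The abstract names a Dijkstra path ranking algorithm built on a max heap but does not give its details. The following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes that each kinship edge carries a hypothetical relevance weight in (0, 1], that a path is scored by the product of its edge weights, and that the highest-scoring path between two persons is the one passed on to the LLM prompt. The function name, the graph layout, the relation labels, and the weights are all illustrative assumptions.

```python
import heapq


def rank_kinship_paths(graph, source, target):
    """Return (score, path) for the highest-scoring path, or None if unreachable.

    graph maps a person to a list of (neighbor, relation, weight) tuples.
    Weights are assumed to lie in (0, 1], so extending a path never raises
    its score; this is what makes the greedy Dijkstra-style expansion valid.
    """
    # heapq is a min-heap, so scores are negated to pop the largest first.
    heap = [(-1.0, source, [source])]
    settled = {}
    while heap:
        neg_score, node, path = heapq.heappop(heap)
        score = -neg_score
        if node in settled:
            continue  # a better path to this node was already settled
        settled[node] = score
        if node == target:
            return score, path
        for neighbor, relation, weight in graph.get(node, []):
            if neighbor not in settled:
                heapq.heappush(
                    heap,
                    (-(score * weight), neighbor, path + [relation, neighbor]),
                )
    return None


# Hypothetical toy graph: P1 reaches P3 directly and through P2.
graph = {
    "P1": [("P2", "father_of", 0.5), ("P3", "grandfather_of", 0.125)],
    "P2": [("P3", "father_of", 0.5)],
}

best = rank_kinship_paths(graph, "P1", "P3")
print(best)
```

    With these illustrative weights the two-hop path through P2 outranks the direct edge, so only that path would be kept. In a real system the weighting scheme would decide which of several kinship relations between the same two persons is surfaced.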

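Cite this article

吴信东, 卓兴锐, 常永泮, 吴共庆, 张赞, 朱毅. Huaputong: Large Language Model for Genealogical Question-answering with Knowledge Reasoning. 软件学报 (Journal of Software),,():1-27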

History
  • Received: 2024-06-27
  • Last revised: 2024-08-26
  • Published online: 2025-06-25