Multi-source Chinese Medical Knowledge Graph Entity Alignment via Entity Semantics and Ontology Information
Authors: 丁瑞卿, 赵俊峰, 王乐业
CLC number: TP18
Funding: National Natural Science Foundation of China (U23A20468, 62133004, 72188101)

Abstract:

Knowledge graphs (KGs), as structured representations of knowledge, are widely used in the medical field. Entity alignment, the task of identifying equivalent entities across different KGs, is a fundamental step in constructing large-scale KGs. Although extensive research has addressed this problem, most of it focuses on aligning a pair of KGs: entity embeddings are generated by capturing entity semantics and graph structure, and embedding similarity is then computed to identify equivalent entities. This study identifies the problem of alignment error propagation that arises when multiple KGs are aligned. Given the high accuracy that medical scenarios demand of entity alignment, we propose MSOI-Align, a multi-source Chinese medical knowledge graph entity alignment method that integrates entity semantics and ontology information. The method first forms pairwise combinations of the KGs and uses representation learning to generate entity embeddings; it then combines entity-name similarity with ontology consistency constraints and uses a large language model to filter a set of candidate entities. Subsequently, based on triadic closure theory combined with the large language model, it automatically identifies and corrects propagated alignment errors in the candidate set. Experimental results on four Chinese medical knowledge graphs show that MSOI-Align significantly improves the precision of entity alignment: compared with the best baseline method, Hits@1 increases from 0.42 to 0.92. The fused knowledge graph, CMKG, contains 13 ontology types, 190,000 entities, and approximately 700,000 triples. Because of copyright restrictions on one of the KGs, we release the fusion of the other three KGs as OpenCMKG.
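
The triadic-closure idea in the abstract can be illustrated with a small sketch. It is not the MSOI-Align implementation: the KG names (`kg1`, `kg2`, `kg3`), the entity names, and both function names are hypothetical, and the large-language-model correction step is not shown. The sketch only assumes that pairwise alignment has already produced a list of predicted matches, and it flags "open triads": cases where one entity is matched to two entities that are not matched to each other, which is a sign that an alignment error may have propagated.

```python
from collections import defaultdict
from itertools import combinations


def build_match_graph(pairwise_matches):
    """Build an undirected graph from predicted matches.

    Each match is a pair of nodes, and each node is a (kg_name, entity_name)
    tuple. All names here are illustrative assumptions.
    """
    graph = defaultdict(set)
    for left, right in pairwise_matches:
        graph[left].add(right)
        graph[right].add(left)
    return graph


def find_open_triads(graph):
    """Return (center, u, w) triples where center is matched to both u and w,
    but u and w are not matched to each other."""
    open_triads = []
    for center in sorted(graph):
        for u, w in combinations(sorted(graph[center]), 2):
            if w not in graph[u]:
                open_triads.append((center, u, w))
    return open_triads


# Hypothetical predicted matches between three KGs.
matches = [
    # Consistent: all three pairwise matches agree (a closed triangle).
    (("kg1", "diabetes"), ("kg2", "diabetes mellitus")),
    (("kg2", "diabetes mellitus"), ("kg3", "diabetes")),
    (("kg1", "diabetes"), ("kg3", "diabetes")),
    # Inconsistent: kg2's entity is matched to a different kg3 entity
    # than kg1's entity is, so the triangle cannot close.
    (("kg1", "hypertension"), ("kg2", "hypertension")),
    (("kg2", "hypertension"), ("kg3", "pulmonary hypertension")),
    (("kg1", "hypertension"), ("kg3", "hypertension")),
]

open_triads = find_open_triads(build_match_graph(matches))
for center, u, w in open_triads:
    print(f"{center} matches {u} and {w}, but {u} and {w} are unmatched")
```

In this toy input the diabetes matches close into a triangle and are not reported, while the hypertension matches produce open triads. In a pipeline like the one the abstract describes, such flagged groups would be the ones passed on for correction.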

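For readers unfamiliar with the Hits@1 metric reported in the abstract, the following is a minimal sketch of how Hits@k is commonly computed for embedding-based entity alignment: each source entity's candidates are ranked by cosine similarity, and a hit is counted when the gold counterpart appears in the top k. The vectors, entity identifiers, and function names are made up for illustration and do not come from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hits_at_k(source_vecs, target_vecs, gold, k=1):
    """Fraction of source entities whose gold target is among the k most
    similar target entities."""
    hits = 0
    for src_id, src_vec in source_vecs.items():
        ranked = sorted(
            target_vecs,
            key=lambda tgt_id: cosine(src_vec, target_vecs[tgt_id]),
            reverse=True,
        )
        if gold[src_id] in ranked[:k]:
            hits += 1
    return hits / len(source_vecs)


# Toy embeddings (illustrative values only).
source_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
target_vecs = {"x": [0.9, 0.1], "y": [0.1, 0.9], "z": [0.7, 0.6]}
# Gold alignment; "c" is nearest to "z", so its gold target "x" is missed at k=1.
gold = {"a": "x", "b": "y", "c": "x"}

score = hits_at_k(source_vecs, target_vecs, gold, k=1)
print(f"Hits@1 = {score:.2f}")
```

Here two of the three source entities rank their gold counterpart first, so Hits@1 is two thirds.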
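Cite this article

丁瑞卿, 赵俊峰, 王乐业. Multi-source Chinese Medical Knowledge Graph Entity Alignment via Entity Semantics and Ontology Information. 软件学报 (Journal of Software), , (): 1-19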

History
  • Received: 2023-12-29
  • Last revised: 2024-05-03
  • Published online: 2025-04-25