Multi-source Chinese Medical Knowledge Graph Entity Alignment via Entity Semantics and Ontology Information

doi:10.13328/j.cnki.jos.007370

Authors:
DING Rui-Qing (丁瑞卿)
ZHAO Jun-Feng (赵俊峰)
WANG Le-Ye (王乐业)
Affiliation (all three authors): Key Lab of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing 100871, China; School of Computer Science, Peking University, Beijing 100871, China

                    
CLC number: TP18
Fund project: National Natural Science Foundation of China (U23A20468, 62133004, 72188101)

Abstract:

Knowledge graphs (KGs), as structured representations of knowledge, are widely used in the medical field. Entity alignment identifies equivalent entities across different KGs, and it is a fundamental step in constructing large-scale KGs. This problem has been studied extensively, but most work concentrates on aligning two KGs. In that work, entity embeddings are generated by capturing the semantic and structural information of entities, and equivalent entities are then identified by embedding similarity. This study identifies the problem of alignment error propagation, which arises when multiple KGs are aligned. Because entity alignment in medical settings must be highly accurate, we propose MSOI-Align, a multi-source Chinese medical KG entity alignment method that integrates entity semantics and ontology information. MSOI-Align first combines the KGs pairwise and uses representation learning to generate entity embeddings. It then combines entity-name similarity with ontology consistency constraints and uses a large language model (LLM) to filter a set of candidate entities. Finally, building on triadic closure theory together with the LLM, it automatically identifies and corrects propagated alignment errors in the candidate set. Experiments on four Chinese medical KGs show that MSOI-Align substantially improves alignment precision, raising Hits@1 from 0.42 for the best baseline to 0.92. The fused KG, CMKG, contains 13 ontology types, 190000 entities, and about 700000 triples. Because one of the four KGs is subject to copyright restrictions, we release the fusion of the other three KGs as OpenCMKG.

Key words: Chinese medical knowledge graph; multi-source knowledge graph entity alignment; large language model (LLM) application; ontology information; entity semantics; alignment error propagation
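The abstract says that MSOI-Align uses triadic closure to detect alignment errors that propagate across pairwise alignments. This page does not describe the paper's exact procedure, including the LLM-based correction step. The following is therefore only a minimal sketch of the underlying consistency idea. It assumes each pairwise alignment is a one-to-one dictionary of entity IDs. Across three graphs, a match obtained through an intermediate graph should agree with the direct match between the other two. The function name `find_closure_violations`, the data layout, and the entity IDs are hypothetical illustrations, not the authors' code.

```python
from itertools import combinations

# Hypothetical layout (an assumption, not taken from the paper):
# alignments[(g1, g2)] maps an entity ID in graph g1 to the entity ID it was
# matched with in graph g2. Keys are assumed to satisfy g1 < g2 in sorted order.
Alignments = dict[tuple[str, str], dict[str, str]]


def find_closure_violations(alignments: Alignments) -> list[tuple[str, ...]]:
    """Flag triangles whose two-hop match disagrees with the direct match.

    For graphs g1 < g2 < g3: if a (in g1) is matched to b (in g2) and b is
    matched to c (in g3), transitivity implies a should be matched to c.
    If the direct g1-g3 alignment maps a to a different entity, at least one
    of the three pairwise matches is wrong.
    """
    graphs = sorted({g for pair in alignments for g in pair})
    violations = []
    for g1, g2, g3 in combinations(graphs, 3):
        m12 = alignments.get((g1, g2), {})
        m23 = alignments.get((g2, g3), {})
        m13 = alignments.get((g1, g3), {})
        for a, b in m12.items():
            c_via_b = m23.get(b)
            c_direct = m13.get(a)
            # Only triangles with all three edges present and conflicting
            # endpoints are flagged; missing edges are ignored in this sketch.
            if c_via_b is not None and c_direct is not None and c_via_b != c_direct:
                violations.append((g1, g2, g3, a, b, c_via_b, c_direct))
    return violations


if __name__ == "__main__":
    # Toy data with made-up entity IDs.
    example = {
        ("KG_A", "KG_B"): {"A:cold": "B:common_cold"},
        ("KG_B", "KG_C"): {"B:common_cold": "C:common_cold"},
        ("KG_A", "KG_C"): {"A:cold": "C:influenza"},
    }
    for violation in find_closure_violations(example):
        print(violation)
```

On the toy data, the path A:cold → B:common_cold → C:common_cold conflicts with the direct match A:cold → C:influenza, so that triangle is reported. According to the abstract, the LLM decides which of the three matches is wrong. That step is not modeled here.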


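Cite this article:

Ding RQ, Zhao JF, Wang LY. Multi-source Chinese medical knowledge graph entity alignment via entity semantics and ontology information. Ruan Jian Xue Bao/Journal of Software, 1–19.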

Article metrics:
Page views: 125
Downloads: 282
HTML views: 0
Citations: 0

History:
Received: 2023-12-29
Last revised: 2024-05-03
Published online: 2025-04-25
