Nested Entity Recognition Approach in Chinese Medical Text

doi:10.13328/j.cnki.jos.006927

Journal of Software (软件学报), 2024, 35(6): 2923–2935

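Authors:
YAN Jing-Hui (闫璟辉), School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100091, China
ZONG Cheng-Qing (宗成庆), School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100091, China; National Laboratory of Pattern Recognition (Institute of Automation, Chinese Academy of Sciences), Beijing 100190, China
XU Jin-An (徐金安), School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100091, China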

                    
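Author biographies:
YAN Jing-Hui (1992–), male, Ph.D. His research interests include knowledge extraction and natural language processing.
ZONG Cheng-Qing (1963–), male, Ph.D., professor, doctoral supervisor, CCF Fellow. His research interests include machine translation and natural language processing.
XU Jin-An (1970–), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. His research interests include machine translation, natural language processing, and knowledge graphs and their applications.
Corresponding author: ZONG Cheng-Qing, E-mail: cqzong@nlpr.ia.ac.cn
CLC number: TP18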


Abstract:

Entity recognition is a key technology for information extraction. Compared with general-domain text, entity recognition in Chinese medical text must deal with a large number of nested entities. Previous methods often ignore the entity-nesting rules specific to medical text and apply sequence labeling directly. This paper therefore proposes a Chinese entity recognition method that incorporates entity-nesting rules. During training, the method recasts entity recognition as a joint task of entity boundary recognition and recognition of the head-tail relation between boundaries. During decoding, it filters the decoded results with entity-nesting rules summarized from real medical text, so that the recognized entities follow the patterns by which inner and outer entities combine in actual text. Experiments on a public medical-text entity recognition dataset show that the proposed method significantly outperforms existing methods on nested entities and improves overall accuracy by 0.5% over the state-of-the-art method.
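The abstract describes decoding only at a high level. Below is a minimal Python sketch, built on assumptions of our own, of how candidate spans could be formed from detected head (start) and tail (end) boundaries plus head-tail relation scores, and then filtered so that any two kept entities are either disjoint or nested in a permitted outer/inner type combination. The entity types, the rule set `ALLOWED_NESTING`, the 0.5 threshold, the helper names, the hand-set scores for the toy input "胃溃疡出血" (gastric ulcer bleeding), and the greedy highest-score-first filtering are all hypothetical illustrations, not the authors' model or rule set.

```python
from typing import Dict, List, Set, Tuple

# (start, end inclusive, entity type, score)
Span = Tuple[int, int, str, float]

# HYPOTHETICAL nesting rules: (outer_type, inner_type) pairs allowed to nest.
# Illustrative only; this is not the rule set summarized in the paper.
ALLOWED_NESTING: Set[Tuple[str, str]] = {
    ("Disease", "Body"),
    ("Symptom", "Body"),
}


def decode_candidates(
    starts: Set[int],
    ends: Set[int],
    pair_scores: Dict[Tuple[int, int], Dict[str, float]],
    threshold: float = 0.5,
) -> List[Span]:
    """Pair detected head boundaries with tail boundaries via head-tail relation scores."""
    candidates: List[Span] = []
    for (i, j), type_scores in pair_scores.items():
        # A span is considered only when both of its boundaries were detected.
        if i not in starts or j not in ends or i > j:
            continue
        for ent_type, score in type_scores.items():
            if score >= threshold:
                candidates.append((i, j, ent_type, score))
    return candidates


def compatible(a: Span, b: Span, rules: Set[Tuple[str, str]]) -> bool:
    """Return True if spans a and b may coexist under the nesting rules."""
    a_start, a_end, a_type, _ = a
    b_start, b_end, b_type, _ = b
    if a_end < b_start or b_end < a_start:
        return True  # disjoint spans never conflict
    if (a_start, a_end) == (b_start, b_end):
        return False  # at most one label per span
    if a_start <= b_start and b_end <= a_end:
        return (a_type, b_type) in rules  # a is outer, b is inner
    if b_start <= a_start and a_end <= b_end:
        return (b_type, a_type) in rules  # b is outer, a is inner
    return False  # partial overlap (crossing boundaries) is rejected


def filter_by_rules(candidates: List[Span], rules: Set[Tuple[str, str]]) -> List[Span]:
    """Greedily keep high-scoring spans compatible with every span kept so far."""
    kept: List[Span] = []
    for cand in sorted(candidates, key=lambda s: s[3], reverse=True):
        if all(compatible(cand, k, rules) for k in kept):
            kept.append(cand)
    # Order by start position, outer (longer) spans first.
    return sorted(kept, key=lambda s: (s[0], -s[1]))


if __name__ == "__main__":
    text = "胃溃疡出血"
    starts = {0, 2, 3}
    ends = {0, 2, 3, 4}
    pair_scores = {
        (0, 2): {"Disease": 0.95},
        (0, 0): {"Body": 0.90, "Disease": 0.70},  # "Disease" inside "Disease" is not allowed
        (3, 4): {"Symptom": 0.85},
        (2, 3): {"Symptom": 0.60},  # crosses 胃溃疡 and 出血, so it is rejected
        (1, 2): {"Body": 0.80},  # dropped: position 1 is not a detected head
    }
    spans = filter_by_rules(decode_candidates(starts, ends, pair_scores), ALLOWED_NESTING)
    for s, e, t, _ in spans:
        print(text[s:e + 1], t)
    # prints: 胃溃疡 Disease / 胃 Body / 出血 Symptom (one per line)
```

Greedy selection by score is just one simple way to enforce pairwise constraints; an exact search over compatible span sets, or a different conflict-resolution order, would also fit the description in the abstract.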

Key words: entity recognition; Chinese text; medical field; nested entity recognition; boundary detection

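Cite this article:

YAN Jing-Hui, ZONG Cheng-Qing, XU Jin-An. Nested entity recognition approach in Chinese medical text. Journal of Software, 2024, 35(6): 2923–2935.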

Article metrics:

Clicks: 676
Downloads: 2241
HTML views: 1177
Citations: 0

History:

Received: 2022-09-30
Last revised: 2022-11-03
Published online: 2023-08-23
Published: 2024-06-06
