Nested Entity Recognition Approach in Chinese Medical Text
Author:
Affiliation:

Clc Number:

TP18

  • Article
  • | |
  • Metrics
  • |
  • Reference [35]
  • |
  • Related [20]
  • | | |
  • Comments
    Abstract:

    Entity recognition is a key technology for information extraction. Compared with ordinary text, the entity recognition of Chinese medical text is often faced with a large number of nested entities. Previous methods of entity recognition often ignore the entity nesting rules unique to medical text and directly use sequence annotation methods. Therefore, a Chinese entity recognition method that incorporates entity nesting rules is proposed. This method transforms the entity recognition task into a joint training task of entity boundary recognition and boundary first-tail relationship recognition in the training process and filters the results by combining the entity nesting rules summarized from actual medical text in the decoding process. In this way, the recognition results are in line with the composition law of the nested combinations of inner and outer entities in the actual text. Good results have been achieved in public experiments on entity recognition of medical text. Experiments on the dataset show that the proposed method is significantly superior to the existing methods in terms of nested-type entity recognition performance, and the overall accuracy is increased by 0.5% compared with the state-of-the-art methods.

    Reference
    [1] Cowie MR, Blomster JI, Curtis LH, Duclaux S, Ford I, Fritz F, Goldman S, Janmohamed S, Kreuzer J, Leenay M, Michel A, Ong S, Pell JP, Southworth MR, Stough WG, Thoenes M, Zannad F, Zalewski A. Electronic health records to facilitate clinical research. Clinical Research in Cardiology, 2017, 106(1): 1–9. [doi: 10.1007/s00392-016-1025-6]
    [2] Denaxas SC, Morley KI. Big biomedical data and cardiovascular disease research: Opportunities and challenges. European Heart Journal-Quality of Care and Clinical Outcomes, 2015, 1(1): 9–16. [doi: 10.1093/ehjqcco/qcv005]
    [3] Li I, Pan J, Goldwasser J, Verma N, Wong WP, Nuzumlalı MY, Rosand B, Li YX, Zhang M, Chang D, Taylor RA, Krumholz HM, Radev D. Neural natural language processing for unstructured data in electronic health records: A review. Computer Science Review, 2022, 46: 100511. [doi: 10.1016/j.cosrev.2022.100511]
    [4] Li M, Xiang L, Kang XM, Zhao Y, Zhou Y, Zong CQ. Medical term and status generation from Chinese clinical dialogue with multi-granularity transformer. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 29: 3362–3374. [doi: 10.1109/TASLP.2021.3122301]
    [5] Sun J, Zhou Y, Zong CQ. One-shot relation learning for knowledge graphs via neighborhood aggregation and paths encoding. ACM Transactions on Asian and Low-Resource Language Information Processing, 2021, 21(3): 52. [doi: 10.1145/3484729]
    [6] 周永惠. 关于现代汉语语素. 西南民族学院学报·哲学社会科学版, 2001, 22(7): 202–205.
    Zhou YH. About modern Chinese morphemes. Journal of Southwest University for Nationalities (Philosophy and Social Sciences), 2001, 22(7): 202–205 (in Chinese with English abstract).
    [7] Chiu JPC, Nichols E. Named entity recognition with bidirectional LSTM-CNNs. Transactions of the Association for Computational Linguistics, 2016, 4: 357–370. [doi: 10.1162/tacl_a_00104]
    [8] Lample G, Ballesteros M, Subramanian S, Kawakami K, Dyer C. Neural architectures for named entity recognition. In: Proc. of the 2016 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego: ACL, 2016. 260–270.
    [9] Dong CH, Zhang JJ, Zong CQ, Hattori M, Di H. Character-based LSTM-CRF with radical-level features for Chinese named entity recognition. In: Proc. of th 5th CCF Conf. on Natural Language Processing and Chinese Computing (NLPCC 2016), and the 24th Int’l Conf. on Computer Processing of Oriental Languages. Kunming: Springer, 2016. 239–250.
    [10] Huang ZH, Xu W, Yu K. Bidirectional LSTM-CRF models for sequence tagging. arXiv:1508.01991, 2015.
    [11] Sheikhshab G, Birol I, Sarkar A. In-domain context-aware token embeddings improve biomedical named entity recognition. In: Proc. of the 9th Int’l Workshop on Health Text Mining and Information Analysis. Brussels: ACL, 2018. 160–164.
    [12] Li XN, Yan H, Qiu XP, Huang XJ. FLAT: Chinese ner using flat-lattice transformer. In: Proc. of the 58th Annual Meeting of the Association for Computational Linguistics. Online: ACL, 2020. 6836–6842.
    [13] Bekoulis G, Deleu J, Demeester T, Develder C. Joint entity recognition and relation extraction as a multi-head selection problem. Expert Systems with Applications, 2018, 114: 34–45. [doi: 10.1016/j.eswa.2018.07.032]
    [14] Yu JT, Bohnet B, Poesio M. Named entity recognition as dependency parsing. In: Proc. of the 58th Annual Meeting of the Association for Computational Linguistics. ACL, 2020. 6470–6476.
    [15] Shen YL, Ma XY, Tan ZQ, Zhang S, Wang W, Lu WM. Locate and label: A two-stage identifier for nested named entity recognition. In: Proc. of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th Int’l Joint Conf. on Natural Language Processing (Vol. 1: Long Papers). ACL, 2021. 2782–2794.
    [16] Isozaki H, Kazawa H. Efficient support vector classifiers for named entity recognition. In: Proc. of the 19th Int’l Conf. on Computational Linguistics. Taipei: ACL, 2002. 1–7.
    [17] Lee KJ, Hwang YS, Kim S, Rim HC. Biomedical named entity recognition using two-phase model based on SVMs. Journal of Biomedical Informatics, 2004, 37(6): 436–447. [doi: 10.1016/j.jbi.2004.08.012]
    [18] Ju ZF, Wang J, Zhu F. Named entity recognition from biomedical text using SVM. In: Proc. of the 5th Int’l Conf. on Bioinformatics and Biomedical Engineering. Wuhan: IEEE, 2011. 1–4.
    [19] Zhou GD, Su J. Named entity recognition using an HMM-based chunk tagger. In: Proc. of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia: ACL, 2002. 473–480.
    [20] Zhao SJ. Named entity recognition in biomedical texts using an HMM model. In: Proc. of the 2004 Int’l Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Geneva: COLING, 2004. 87–90.
    [21] Zhang J, Shen D, Zhou GD, Su J, Tan CL. Enhancing HMM-based biomedical named entity recognition by studying special phenomena. Journal of Biomedical Informatics, 2004, 37(6): 411–422. [doi: 10.1016/j.jbi.2004.08.005]
    [22] McCallum A, Li W. Early results for named entity recognition with conditional random fields, feature induction and Web-enhanced lexicons. In: Proc. of the 7th Conf. on Natural Language Learning at HLT-NAACL 2003. Edmonton: ACL, 2003. 188–191.
    [23] Settles B. Biomedical named entity recognition using conditional random fields and rich feature sets. In: Proc. of the 2004 Int’l Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Geneva: COLING, 2004. 107–110.
    [24] Collobert R, Weston J, Bottou L, Karlen M, Kavukcuoglu K, Kuksa P. Natural language processing (almost) from scratch. Journal of Machine Learning Research, 2011, 12: 2493–2537.
    [25] Peters ME, Neumann M, Iyyer M, Gardner M, Clark C, Lee K, Zettlemoyer L. Deep contextualized word representations. In: Proc. of the 2018 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long Papers). New Orleans: ACL, 2018. 2227–2237.
    [26] Hakala K, Pyysalo S. Biomedical named entity recognition with multilingual BERT. In: Proc. of the 5th Workshop on BioNLP Open Shared Tasks. Hong Kong: ACL, 2019. 56–61.
    [27] Shen D, Zhang J, Zhou GD, Su J, Tan CL. Effective adaptation of hidden markov model-based named entity recognizer for biomedical domain. In: Proc. of the 2003 ACL Workshop on Natural Language Processing in Biomedicine. Sapporo: ACL, 2003. 49–56.
    [28] Zhou GD, Zhang J, Su J, Shen D, Tan C. Recognizing names in biomedical texts: A machine learning approach. Bioinformatics, 2004, 20(7): 1178–1190. [doi: 10.1093/bioinformatics/bth060]
    [29] Zhou GD. Recognizing names in biomedical texts using mutual information independence model and SVM plus sigmoid. International Journal of Medical Informatics, 2006, 75(6): 456–467. [doi: 10.1016/j.ijmedinf.2005.06.012]
    [30] Ju MZ, Miwa M, Ananiadou S. A neural layered model for nested named entity recognition. In: Proc. of the 2018 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Vol. 1 (Long Papers). New Orleans: ACL, 2018. 1446–1459.
    [31] Wang J, Shou LD, Chen K, Chen G. Pyramid: A layered model for nested named entity recognition. In: Proc. of the 58th Annual Meeting of the Association for Computational Linguistics. ACL, 2020. 5918–5928.
    [32] Zheng CM, Cai Y, Xu JY, Leung HF, Xu GD. A boundary-aware neural model for nested named entity recognition. In: Proc. of the 2019 Conf. on Empirical Methods in Natural Language Processing and the 9th Int’l Joint Conf. on Natural Language Processing. Hong Kong: ACL, 2019. 357–366.
    [33] Su JL, Murtadha A, Pan SF, Hou J, Sun J, Huang WW, Wen B, Liu YF. Global pointer: Novel efficient span-based approach for named entity recognition. arXiv:2208.03054, 2022.
    [34] Zhang NY, Jia QH, Yin KP, Dong L, Gao F, Hua NW. Conceptualized representation learning for Chinese biomedical text mining. arXiv:2008.10813, 2020.
    Cited by
    Comments
    Comments
    分享到微博
    Submit
Get Citation

闫璟辉,宗成庆,徐金安.中文医疗文本中的嵌套实体识别方法.软件学报,2024,35(6):2923-2935

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:September 30,2022
  • Revised:November 03,2022
  • Online: August 23,2023
  • Published: June 06,2024
You are the first2033332Visitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063