Dynamic Multitask Learning Approach for Contract Information Extraction
Author: Wang Haochang, Zheng Guanyu, Zhao Tiejun
Affiliation:

CLC Number: TP18

    Abstract:

    Accurately extracting the two types of information in contract texts, elements and clauses, can improve the efficiency of contract review and facilitate transactions for all trading parties. However, current contract information extraction methods generally train single-task models that extract elements and clauses separately; they do not exploit the characteristics of contract texts and ignore the relevance between the tasks. This study therefore uses a deep neural network to study the correlation between element extraction and clause extraction and proposes a multitask learning method. First, a basic multitask learning model for contract information extraction is built by combining the two tasks. Second, the model is optimized with an attention mechanism to further explore the correlation, yielding an attention-based dynamic multitask learning model. Finally, building on these two models, a dynamic multitask learning model with lexical knowledge is proposed for the complex semantic environment of contract texts. The experimental results show that the method captures the features shared among tasks and yields better information extraction results than the single-task model. It handles entities nested between elements and clauses in contract texts and achieves joint extraction of contract elements and clauses. In addition, to verify the robustness of the proposed method, experiments are conducted on public datasets from various fields, and the results show that the proposed method outperforms the baseline methods.
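
    The abstract does not give the model's equations, so the following is only a minimal sketch of one plausible reading of "attention-based dynamic multitask learning": for each token, one task's representation attends over candidate feature sources (a shared representation and the other task's representation), and the two task losses are combined into a single training objective. The function names `attend` and `joint_loss`, the scaled dot-product scoring, and the fixed loss weight are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, sources: List[Vector]) -> Tuple[Vector, List[float]]:
    """Fuse feature sources for one token with scaled dot-product attention.

    Assumption: `query` is the current task's token vector and `sources`
    are candidate features (e.g. shared and other-task vectors). The
    weights are recomputed per token, which is the "dynamic" part here.
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, s) / scale for s in sources])
    fused = [
        sum(w * s[i] for w, s in zip(weights, sources))
        for i in range(len(query))
    ]
    return fused, weights


def joint_loss(element_loss: float, clause_loss: float,
               element_weight: float = 0.5) -> float:
    """Weighted sum of the two task losses (the weight is hypothetical)."""
    return element_weight * element_loss + (1.0 - element_weight) * clause_loss


# Toy usage: an element-task token vector attends over a shared vector
# and a clause-task vector; all values are made up for illustration.
element_token = [1.0, 0.0]
shared_features = [1.0, 0.0]
clause_features = [0.0, 1.0]
fused, weights = attend(element_token, [shared_features, clause_features])
total = joint_loss(element_loss=2.0, clause_loss=4.0)
```

    In this toy setup the source that aligns better with the query receives the larger weight, so how much each task borrows from the other varies token by token rather than being fixed in advance.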

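Citation

Wang HC, Zheng GY, Zhao TJ. Dynamic multitask learning approach for contract information extraction. Ruan Jian Xue Bao/Journal of Software, 2024, 35(7): 3377–3391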

History
  • Received: June 15, 2022
  • Revised: November 03, 2022
  • Online: August 23, 2023
  • Published: July 06, 2024