对于合同文本中要素和条款两类信息的准确提取, 可以有效提升合同的审查效率, 为贸易各方提供便利化服务. 然而当前的合同信息抽取方法一般训练单任务模型对要素和条款分别进行抽取, 并没有深挖合同文本的特征, 忽略了不同任务间的关联性. 因此, 采用深度神经网络结构对要素抽取和条款抽取两个任务间的相关性进行研究, 并提出多任务学习方法. 所提方法首先将上述两种任务进行融合, 构建一种应用于合同信息抽取的基本多任务学习模型; 然后对其进行优化, 利用Attention机制进一步挖掘其相关性, 形成基于Attention机制的动态多任务学习模型; 最后针对篇章级合同文本中复杂的语义环境, 在前两者的基础上提出一种融合词汇知识的动态多任务学习模型. 实验结果表明, 所提方法可以充分捕捉任务间的共享特征, 不仅取得了比单任务模型更好的信息抽取结果, 而且能够有效解决合同文本中要素与条款间实体嵌套的问题, 实现合同要素与条款的信息联合抽取. 此外, 为了验证该方法的鲁棒性, 在多个领域的公开数据集上进行实验, 结果表明该方法的效果均优于基线方法.
Accurately extracting two types of information including elements and clauses in contract texts can effectively improve the contract review efficiency and provide facilitation services for all trading parties. However, current contract information extraction methods generally train single-task models to extract elements and clauses separately, whereas they do not dig deep into the characteristics of contract texts, ignoring the relevance among different tasks. Therefore, this study employs a deep neural network structure to study the correlation between the two tasks of element extraction and clause extraction and proposes a multitask learning method. Firstly, the primary multitask learning model is built for contract information extraction by combining the above two tasks. Then, the model is optimized and attention mechanism is adopted to further explore the correlation. Additionally, an Attention-based dynamic multitask-learning model is built. Finally, based on the above two methods, adynamic multitask learning model with lexical knowledge is proposed for the complex semantic environment in contract texts. The experimental results show that the method can fully capture the shared features among tasks and yield better information extraction results than the single-task model. It can solve the nested entity among elements and clauses in contract texts, and realize the joint information extraction of contract elements and clauses. In addition, to verify the robustness of the proposed method, this study conducts experiments on public datasets in various fields, and the results show that the proposed method is superior to baseline methods.