Idiom Cloze Algorithm Integrating with Pre-trained Language Model

CLC Number: TP18

    Abstract:

    Selecting a suitable idiom for a given context is a crucial task in natural language processing (NLP). Existing research treats the Chinese idiom cloze task as a text similarity task. Although pre-trained language models play an important role in text similarity, they have clear drawbacks. Used as a feature extractor, a pre-trained language model ignores the mutual information between sentences; used as a text matcher, it incurs high computational cost and long running time. In addition, the matching between the context and the candidate idioms is asymmetric, which degrades the performance of the pre-trained language model as a text matcher. To address these two problems, this study draws on the idea of parameter sharing and proposes the TALBERT-blank network. TALBERT-blank transforms idiom selection from a context-based asymmetric matching process into a blank-based symmetric matching process. The pre-trained language model acts as both a feature extractor and a text matcher, and sentence vectors are used for latent semantic matching. This greatly reduces the number of parameters and the memory consumption, and it speeds up training and inference while maintaining accuracy, yielding a lightweight and efficient model. Experimental results on the CHID dataset show that, compared with the ALBERT text matcher, the model is compressed to a greater extent and the computation time is further reduced by 54.35% while accuracy is maintained.
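
The idea of matching sentence vectors in a latent space can be illustrated with a minimal sketch. It is not the TALBERT-blank implementation: the vectors are hand-written stand-ins for encoder outputs, cosine similarity is an assumed scoring function, and the names `cosine`, `rank_candidates`, `blank_vec`, and `idiom_a` to `idiom_c` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(blank_vec, candidate_vecs):
    """Score every candidate against the blank vector and sort best first."""
    scores = {name: cosine(blank_vec, vec) for name, vec in candidate_vecs.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Assumption: in a real system these vectors would come from a shared encoder,
# one for the blank position in the passage and one per candidate idiom.
blank_vec = [0.9, 0.1, 0.3]
candidate_vecs = {
    "idiom_a": [0.8, 0.2, 0.4],
    "idiom_b": [0.1, 0.9, 0.2],
    "idiom_c": [0.3, 0.3, 0.9],
}

ranking, scores = rank_candidates(blank_vec, candidate_vecs)
best = ranking[0]
print(best, round(scores[best], 3))
```

Because both sides of the comparison are vectors of the same kind, the encoder runs once per text rather than once per context-candidate pair, which is the general reason vector matching is cheaper than pairwise text matching.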

Get Citation

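Ju Shenggen, Huang Fangyi, Sun Jieping. Idiom Cloze Algorithm Integrating with Pre-trained Language Model. Journal of Software (软件学报), 2022, 33(10): 3793-3805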

History
  • Received: September 26, 2020
  • Revised: December 08, 2020
  • Online: May 24, 2022
  • Published: October 06, 2022