Short Text Classification Model Combining Knowledge Aware and Dual Attention

CLC Number: TP18

    Abstract:

    Text classification is a core problem in text mining and an essential task in natural language processing. Short text classification is an active research topic and one of the most pressing problems in the field, because short texts are sparse, arrive in real time, and are often non-standard in form. In certain scenarios, short texts carry many implicit semantics, which makes it challenging to mine semantic features from such limited text. Existing research mainly applies traditional machine learning or deep learning algorithms to short text classification. However, these algorithms are complex, costly to build into an effective model, and inefficient. In addition, short texts contain little effective information and abundant colloquial language, which demands stronger feature learning ability from the model. To address these problems, the KAeRCNN model is proposed. It is based on the TextRCNN model and combines knowledge awareness with a dual attention mechanism. The knowledge-aware component consists of two stages, knowledge graph entity linking and knowledge graph embedding, through which external knowledge is introduced to obtain semantic features. Meanwhile, the dual attention mechanism improves the model's efficiency in extracting effective information from short texts. Extensive experimental results show that the KAeRCNN model is significantly better than traditional machine learning algorithms in terms of classification accuracy, F1 score, and practical application effect. The performance and adaptability of the algorithm are further verified on different datasets. The accuracy of the proposed approach reaches 95.54%, and the F1 score reaches 0.901. Compared with the four traditional machine learning algorithms, accuracy increases by about 14% on average and the F1 score by about 13%. Compared with TextRCNN, the KAeRCNN model improves accuracy by about 3%. In addition, comparisons with deep learning algorithms show that the proposed model also performs better when classifying short texts from other fields. Both theoretical and experimental results indicate that the KAeRCNN model proposed in this study is effective for short text classification.
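
The abstract does not specify how the dual attention mechanism is computed, so the following is only a minimal sketch under stated assumptions, not the KAeRCNN implementation. It assumes dot-product attention, a mean-pooled text query, and concatenation of the two attended vectors; the names `attend` and `dual_attention_features` are hypothetical. It illustrates the general idea of attending once over word vectors and once over embeddings of linked knowledge graph entities.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(query, vectors):
    """Dot-product attention: pool `vectors` weighted by similarity to `query`."""
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def dual_attention_features(word_vecs, entity_vecs):
    """Hypothetical fusion of text and knowledge features (an assumption)."""
    dim = len(word_vecs[0])
    # Assumed query for the text side: the mean of the word vectors.
    text_query = [sum(v[i] for v in word_vecs) / len(word_vecs) for i in range(dim)]
    # First attention: weight the words of the short text.
    text_repr, word_weights = attend(text_query, word_vecs)
    # Second attention: the text representation queries the entity embeddings.
    entity_repr, entity_weights = attend(text_repr, entity_vecs)
    # Concatenate both views into one feature vector for a classifier.
    return text_repr + entity_repr, word_weights, entity_weights
```

In this sketch, calling `dual_attention_features(word_vecs, entity_vecs)` with two-dimensional vectors returns a four-dimensional feature vector plus the two sets of attention weights, each of which sums to 1. In a full model the word vectors would come from the recurrent layers of a TextRCNN-style encoder and the entity vectors from a knowledge graph embedding, which this toy example does not attempt to reproduce.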

Citation

李博涵, 向宇轩, 封顶, 何志超, 吴佳骏, 戴天伦, 李静. Short Text Classification Model Combining Knowledge Aware and Dual Attention. Journal of Software (软件学报), 2022, 33(10): 3565-3581

History
  • Received: July 20, 2021
  • Revised: August 30, 2021
  • Online: February 22, 2022
  • Published: October 06, 2022