Chinese Idiom Misuse Diagnosis Based on Levitating Injection of Interpretation Knowledge
Authors: 何亮, 曹永昌, 黄琰琛, 吴震, 戴新宇, 陈家骏
CLC number: TP18
Funding: National Natural Science Foundation of China (62206126, 62376120)
    Abstract:

    Chinese idioms are an essential part of Chinese writing, combining concise expressiveness with deep cultural significance. They are typically phrases or short sentences that have become fixed through long-term use; their origins are diverse and their meanings relatively stable. However, because of the logographic nature of Chinese characters and the historical evolution of Chinese vocabulary and semantics, the literal meaning of an idiom often departs from its actual meaning, giving idioms a distinctive non-compositional character. This makes idioms highly prone to misuse: research shows that some idioms are misused at rates as high as 98.6%. Unlike in other languages, misusing a Chinese idiom usually produces no lexical or grammatical error, so traditional spelling and grammar error detection methods cannot reliably identify it. An intuitive approach is to feed idiom interpretations into the model, but simply concatenating them with the input yields overly long sequences that are hard to process and introduces knowledge noise. To address this, this study proposes a model based on levitating injection of interpretation knowledge. The model introduces learnable weight factors to control the injection, and the study explores effective strategies for injecting interpretation knowledge. To validate the model, a dataset for diagnosing Chinese idiom misuse is constructed. Experimental results show that the model achieves the best performance on all test sets, especially in complex scenarios with long texts containing multiple idioms, where it outperforms the baseline model by 12.4%–13.9%. At the same time, training speed increases by 30%–40% and testing speed by 90%.
    These results demonstrate that the proposed model not only integrates the interpretive features of idioms effectively but also substantially reduces the negative impact that concatenating interpretations has on the model's processing capacity and efficiency. It thereby improves Chinese idiom misuse diagnosis and strengthens the model's ability to handle complex scenarios with multiple idioms and lengthy interpretations.
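The abstract states only that learnable weight factors control how idiom interpretations are injected, as an alternative to concatenating them with the input. The paper's actual architecture is not described here, so the following is a minimal sketch under stated assumptions: each interpretation is encoded separately, mean-pooled into one vector, and added to the hidden states of the idiom's tokens, scaled by a weight `alpha = sigmoid(gate_logit)`. The function names, the pooling choice, and the additive form are all hypothetical, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    """Squash an unconstrained learnable parameter into a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average the token vectors of an encoded interpretation into one vector."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def inject_interpretation(
    sentence_states: Sequence[Vector],
    idiom_span: Tuple[int, int],
    interp_states: Sequence[Vector],
    gate_logit: float,
) -> List[Vector]:
    """Hypothetical gated injection (an assumption, not the paper's exact method).

    The interpretation is pooled into a single vector and added to the hidden
    states of the idiom tokens in [start, end), scaled by alpha = sigmoid(gate_logit).
    Unlike concatenation, the sentence keeps its original length.
    """
    alpha = sigmoid(gate_logit)
    knowledge = mean_pool(interp_states)
    start, end = idiom_span
    out = [list(h) for h in sentence_states]  # copy so the input is not mutated
    for t in range(start, end):
        out[t] = [h + alpha * k for h, k in zip(out[t], knowledge)]
    return out


if __name__ == "__main__":
    # Toy 2-d hidden states for a 6-token sentence; the idiom occupies tokens 2..5.
    sentence = [[float(i), 0.0] for i in range(6)]
    interpretation = [[1.0, 2.0], [3.0, 4.0]]  # toy encoded interpretation tokens
    injected = inject_interpretation(sentence, (2, 6), interpretation, gate_logit=0.0)
    print(len(injected))  # 6: no tokens were appended
    print(injected[2])    # [3.0, 1.5]
```

In a trained model `gate_logit` would be a parameter updated by gradient descent; here it is fixed at 0.0, so `alpha` is 0.5. For a sentence with several idioms, this sketch would simply be applied once per idiom span, which keeps the input length constant no matter how many or how long the interpretations are, the property the abstract credits for the speedups over concatenation.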

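Cite this article

何亮, 曹永昌, 黄琰琛, 吴震, 戴新宇, 陈家骏. Chinese Idiom Misuse Diagnosis Based on Levitating Injection of Interpretation Knowledge. Journal of Software (软件学报), , (): 1-14.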

History
  • Received: 2024-08-21
  • Last revised: 2024-11-08
  • Published online: 2025-04-30