Chinese Idiom Misuse Diagnosis Based on Levitating Injection of Interpretation Knowledge

doi:10.13328/j.cnki.jos.007373

Author: HE Liang, CAO Yong-Chang, HUANG Yan-Chen, WU Zhen, DAI Xin-Yu, CHEN Jia-Jun
Affiliation: State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, Jiangsu, China
CLC number: TP18
Fund Project: National Natural Science Foundation of China (62206126, 62376120)

Abstract:

Chinese idioms are an essential part of Chinese writing, combining concise expressiveness with deep cultural significance. They are typically phrases or short sentences that have become fixed through long-term use, with diverse origins and relatively stable meanings. However, because Chinese characters carry meaning through their form, and because Chinese vocabulary and semantics have shifted over time, the literal meaning of an idiom often differs from its actual meaning. This non-compositionality makes idioms highly prone to misuse; research shows that the misuse rate of certain idioms is as high as 98.6%. Unlike in other languages, misusing a Chinese idiom usually does not produce a lexical or grammatical error, so traditional spelling and grammar error detection methods cannot identify idiom misuse effectively. An intuitive approach is to feed idiom interpretations into the model, but simply concatenating the interpretations makes sentences too long to process and introduces knowledge noise. To address these problems, this study proposes a model based on levitating injection of interpretation knowledge. The model introduces learnable weight factors to control knowledge injection and explores effective strategies for injecting interpretation knowledge. To validate the model, a dataset for Chinese idiom misuse diagnosis is constructed. Experimental results show that the model achieves the best performance on all test sets. In complex scenarios with long texts and multiple idioms, its performance improves by 12.4%–13.9% over the baseline model, while training speed increases by 30%–40% and testing speed by 90%. These results demonstrate that the proposed model not only integrates idiom interpretation features effectively but also significantly reduces the negative impact of interpretation concatenation on the model's processing capacity and efficiency. It thereby improves the performance of Chinese idiom misuse diagnosis and strengthens the model's ability to handle complex scenarios with multiple idioms and lengthy interpretations.

Key words: Chinese idiom; misuse diagnosis; interpretation knowledge; levitating injection; idiom misuse dataset
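
The abstract states only that learnable weight factors control how much interpretation knowledge is injected; it does not give the architecture. As a rough illustration of that general idea, and not the authors' model, the sketch below blends a pooled interpretation vector into an idiom representation through a sigmoid gate. The function name `inject_interpretation`, the mean pooling, the additive fusion, and the scalar `gate_logit` are all assumptions made for this example.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    """Squash a real-valued logit into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def inject_interpretation(
    idiom_vec: List[float],
    interp_vecs: List[List[float]],
    gate_logit: float,
) -> Tuple[List[float], float]:
    """Blend interpretation knowledge into an idiom representation.

    idiom_vec   -- hypothetical hidden vector of the idiom in its sentence
    interp_vecs -- hypothetical token vectors of the dictionary interpretation
    gate_logit  -- scalar that would be learned during training (assumption)
    """
    n_tokens = len(interp_vecs)
    # Mean-pool the interpretation tokens into one summary vector.
    pooled = [sum(column) / n_tokens for column in zip(*interp_vecs)]
    # The gate decides how strongly the interpretation is injected.
    gate = sigmoid(gate_logit)
    fused = [h + gate * g for h, g in zip(idiom_vec, pooled)]
    return fused, gate


# With gate_logit = 0 the gate is 0.5, so half of the pooled
# interpretation vector is added to the idiom representation.
fused, gate = inject_interpretation(
    idiom_vec=[1.0, 0.0, -1.0],
    interp_vecs=[[0.2, 0.4, 0.6], [0.4, 0.0, 0.2]],
    gate_logit=0.0,
)
```

A strongly negative `gate_logit` drives the gate toward zero and leaves the idiom representation almost unchanged, which is one simple way a model could learn to suppress a noisy interpretation. Because the interpretation enters as a single pooled vector in this sketch, the input sentence itself is not lengthened.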

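Cite this article

HE Liang, CAO Yong-Chang, HUANG Yan-Chen, WU Zhen, DAI Xin-Yu, CHEN Jia-Jun. Chinese Idiom Misuse Diagnosis Based on Levitating Injection of Interpretation Knowledge. Journal of Software,,():1-14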


History

Received: 2024-08-21
Last revised: 2024-11-08
Published online: 2025-04-30
