Multi-granularity Metamorphic Testing for Neural Machine Translation System

doi:10.13328/j.cnki.jos.006221


Journal of Software (软件学报), 2021, Vol. 32, No. 4, pp. 1051-1066


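Authors:
ZHONG Wen-Kang (钟文康), State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, Jiangsu, China
GE Ji-Dong (葛季栋), State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, Jiangsu, China
CHEN Xiang (陈翔), School of Information Science and Technology, Nantong University, Nantong 226019, Jiangsu, China
LI Chuan-Yi (李传艺), State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, Jiangsu, China
TANG Ze (唐泽), State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, Jiangsu, China
LUO Bin (骆斌), State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210023, Jiangsu, China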
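About the authors:
ZHONG Wen-Kang (1997-), male, bachelor's degree. His main research interests are software engineering and natural language processing.
GE Ji-Dong (1978-), male, Ph.D., associate professor, CCF senior member. His main research interests are software engineering, distributed computing and edge computing, business process management, and natural language processing.
CHEN Xiang (1980-), male, Ph.D., associate professor, CCF senior member. His main research interests are software defect prediction, software fault localization, regression testing, and combinatorial testing.
LI Chuan-Yi (1991-), male, Ph.D., assistant researcher, CCF professional member. His main research interests are software engineering, business process management, and natural language processing.
TANG Ze (1994-), male, master's degree. His main research interests are code summarization and API completion.
LUO Bin (1967-), male, Ph.D., professor, doctoral supervisor, CCF distinguished member. His main research interests are software engineering and artificial intelligence.
Corresponding author: LI Chuan-Yi, E-mail: lcy@nju.edu.cn
CLC number: TP311
Funding: National Natural Science Foundation of China (61802167, 61972197, 61802095); Natural Science Foundation of Jiangsu Province of China (BK20201250)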



Abstract:

Machine translation is the task of using computers to convert one natural language into another, and it is one of the most active research topics in artificial intelligence. In recent years, with the development of deep learning, neural machine translation (NMT) models based on the sequence-to-sequence architecture have outperformed statistical machine translation models on many language pairs and have been widely deployed in commercial translation systems. Although the practical performance of these commercial systems suggests that NMT models have improved substantially, systematically evaluating their translation quality remains challenging. On the one hand, evaluation against reference translations requires high-quality references, which are very costly to obtain. On the other hand, NMT models have more pronounced robustness problems than statistical machine translation models, yet no existing studies have examined the robustness of NMT models. To address these challenges, this study proposes MGMT, a multi-granularity testing framework based on metamorphic testing, which evaluates both the translation quality and the translation robustness of NMT systems without reference translations. The framework first applies replacements to the source sentence at sentence granularity, phrase granularity, and word granularity. It then computes the similarity between the translation of the source sentence and the translation of each replaced sentence, based on edit distance and constituency parse trees. Finally, it uses the similarity to judge whether the translations satisfy the metamorphic relation. Comparative experiments with different metamorphic testing frameworks were conducted on six mainstream commercial NMT systems, using Chinese-English datasets from five domains: education, microblogs, news, spoken language, and subtitles. The results show that the Pearson and Spearman correlation coefficients between the proposed method and reference-based methods are 80% and 20% higher, respectively, than those of comparable methods. This indicates that the proposed reference-free evaluation method correlates more positively with reference-based evaluation, which verifies that its evaluation accuracy is significantly better than that of other methods of the same type.
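
The replace, translate, compare, and judge steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only a token-level edit distance (the constituency-parse-tree component is omitted), and the whitespace tokenization, the `threshold` of 0.5, and the `toy_translate` stub with its canned outputs are all assumptions made for illustration.

```python
from typing import Callable, List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Token-level Levenshtein distance computed by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def similarity(t1: str, t2: str) -> float:
    """Similarity in [0, 1]: one minus the length-normalized edit distance."""
    a, b = t1.split(), t2.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def satisfies_relation(translate: Callable[[str], str],
                       source: str,
                       mutated: str,
                       threshold: float = 0.5) -> bool:
    """Assumed metamorphic relation: a small change to the source sentence
    should yield a translation that stays similar to the original one."""
    return similarity(translate(source), translate(mutated)) >= threshold


# Hypothetical stand-in for a translation system, with canned outputs.
toy_outputs = {
    "我喜欢苹果": "I like apples",
    "我喜欢香蕉": "I like bananas",              # consistent translation
    "我喜欢梨": "He sold the pears yesterday",   # deliberately inconsistent
}


def toy_translate(sentence: str) -> str:
    return toy_outputs[sentence]


print(satisfies_relation(toy_translate, "我喜欢苹果", "我喜欢香蕉"))
print(satisfies_relation(toy_translate, "我喜欢苹果", "我喜欢梨"))
```

In this sketch, a word-granularity replacement that changes one output token keeps the similarity above the assumed threshold, while an unrelated output drops it to zero and is flagged as a violation. A real setup would call an actual translation system in place of `toy_translate` and would tune the threshold empirically.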

Keywords: neural network; machine translation; quality estimation; metamorphic testing; multi-granularity
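
The abstract compares methods by how well their scores correlate with reference-based scores, using the Pearson and Spearman coefficients. The sketch below shows how the two coefficients can be computed in plain Python. The function names are chosen here for illustration, and the two score lists are made-up placeholder values, not results from the paper.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (dev_x * dev_y)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder scores for six hypothetical systems (illustrative only).
reference_free = [0.62, 0.71, 0.55, 0.80, 0.67, 0.59]
reference_based = [0.30, 0.36, 0.25, 0.41, 0.35, 0.28]

print(round(pearson(reference_free, reference_based), 3))
print(round(spearman(reference_free, reference_based), 3))
```

Pearson measures linear agreement between the raw scores, whereas Spearman only looks at whether the two methods rank the systems in the same order, which is why the two coefficients can differ for the same data.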

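Cite this article

ZHONG Wen-Kang, GE Ji-Dong, CHEN Xiang, LI Chuan-Yi, TANG Ze, LUO Bin. Multi-granularity metamorphic testing for neural machine translation system. Journal of Software (软件学报), 2021, 32(4): 1051-1066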


History

Received: 2020-09-12
Revised: 2020-10-26
Published online: 2021-01-22
Publication date: 2021-04-06
