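Enhancement of Textual Adversarial Attack Ability Based on Sememe-level Sentence Dilution Algorithm

Author biographies:

Ye Wentao (叶文滔, b. 1997), male, master's degree; research interests: machine learning, natural language processing, textual adversarial attacks, metamorphic testing. Zhang Min (张敏, b. 1977), female, Ph.D., professor, CCF professional member; research interests: quantitative analysis and verification of complex systems, and testing, analysis, and verification of AI systems. Chen Yixiang (陈仪香, b. 1961), male, Ph.D., professor, CCF distinguished member; research interests: Internet of Things and cyber-physical systems, real-time software systems, formal methods and trustworthiness evaluation for software, and hardware/software co-design and optimization.

Corresponding author:

Zhang Min, E-mail: mzhang@sei.ecnu.edu.cn

CLC number:

TP309

Funding:

Key R&D Program of the Ministry of Science and Technology (2020AAA0107800); National Natural Science Foundation of China (61672012)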
    Abstract:

    As machine learning has been widely applied to natural language processing (NLP) in recent years, the security of NLP tasks has drawn growing concern from researchers. Existing studies have found that applying small perturbations to examples can cause machine learning models to produce wrong predictions, a technique known as adversarial attack. Textual adversarial attacks can effectively reveal the weaknesses of NLP models so that they can be improved. However, existing textual adversarial attack methods all focus on designing complex adversarial example generation strategies, which yield only limited improvements in attack success rate, and their highly invasive modifications tend to degrade example quality. A simpler and more efficient way to improve attack effectiveness while producing high-quality adversarial examples is therefore needed. To address this problem, the sememe-level sentence dilution algorithm (SSDA) and the dilution pool construction algorithm (DPCA) are proposed from the new perspective of improving the adversarial attack process. SSDA is a new process that can be freely embedded into the classical adversarial attack workflow: it first dilutes the input examples using dilution pools constructed by DPCA, and then generates adversarial examples from the diluted examples. Without knowledge of the text dataset or the NLP model, SSDA can not only improve the attack success rate of any textual adversarial attack method but also obtain higher adversarial example quality than the original method. Controlled experiments across different text datasets, dilution pool sizes, victim models, and several mainstream textual adversarial attack methods verify that SSDA improves attack success rate and that dilution pools constructed by DPCA enhance the dilution ability of SSDA. The experimental results show that the SSDA dilution process reveals more model vulnerabilities than the classical adversarial attack process, and that DPCA helps SSDA raise the success rate while further improving the textual quality of adversarial examples.
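The "dilute first, then attack" workflow described in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say how dilution is performed, so the label-preserving append rule in `dilute`, the function names, the toy classifier, the toy substitution attack, and the hand-written pool are all assumptions made for this example.

```python
from typing import Callable, List, Optional

# Hypothetical interfaces, not taken from the paper.
Classifier = Callable[[str], int]                     # text -> predicted label
Attack = Callable[[str, Classifier], Optional[str]]   # adversarial text, or None on failure


def dilute(text: str, pool: List[str], model: Classifier) -> str:
    """Assumed dilution rule: append pool entries that keep the predicted label."""
    label = model(text)
    diluted = text
    for phrase in pool:
        candidate = f"{diluted} {phrase}"
        if model(candidate) == label:  # reject entries that change the prediction
            diluted = candidate
    return diluted


def attack_with_dilution(text: str, pool: List[str],
                         model: Classifier, attack: Attack) -> Optional[str]:
    """Embed the dilution step in front of an arbitrary attack method."""
    return attack(dilute(text, pool, model), model)


# --- Toy stand-ins so the sketch runs end to end (purely illustrative) ---
POSITIVE = {"good", "great"}
NEGATIVE = {"bad"}
SUBSTITUTES = {"good": "okay", "great": "fine"}


def toy_model(text: str) -> int:
    """Label 1 if positive words outnumber negative words, else 0."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return 1 if score > 0 else 0


def toy_attack(text: str, model: Classifier) -> Optional[str]:
    """Try single-word substitutions; return the first one that flips the label."""
    words = text.split()
    original = model(text)
    for i, word in enumerate(words):
        if word in SUBSTITUTES:
            candidate = " ".join(words[:i] + [SUBSTITUTES[word]] + words[i + 1:])
            if model(candidate) != original:
                return candidate
    return None
```

Because `attack_with_dilution` takes the attack as a parameter, the dilution step stays independent of any particular attack method, which mirrors the abstract's claim that SSDA can be embedded into an existing attack workflow. In the paper, the pool would come from DPCA rather than being written by hand.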

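Cite this article

Ye Wentao, Zhang Min, Chen Yixiang. Enhancement of Textual Adversarial Attack Ability Based on Sememe-level Sentence Dilution Algorithm. Journal of Software (软件学报), 2023, 34(7): 3313-3328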

History
  • Received: 2021-06-22
  • Last revised: 2021-09-22
  • Published online: 2022-09-09
  • Publication date: 2023-07-06