Black-box Adversarial Attacks for Pretrained Vulnerability Detection Models

CLC Number: TP311

Fund Project: Smart Grid National Science and Technology Major Project (2025ZD0808500); National Natural Science Foundation of China (62372173)



    Abstract:

Pretrained code models have demonstrated strong capabilities in code understanding and analysis, and have become important tools and a major research focus in source code vulnerability detection. However, like traditional deep learning models, pretrained code models remain fragile when exposed to carefully crafted adversarial code. By introducing semantics-preserving perturbations, attackers can mislead a model into classifying vulnerable code as non-vulnerable, posing a significant threat to software security. Studying adversarial attacks against pretrained vulnerability detection models therefore not only provides an effective way to evaluate the robustness of pretrained code models, but also offers critical guidance for the design of future vulnerability detection models and defense mechanisms. For the hard-label black-box attack scenario, this study proposes VulBlurrer, a black-box adversarial attack method targeting pretrained vulnerability detection models. VulBlurrer designs a directed synonymous code transformation strategy that prioritizes perturbations on highly sensitive regions adjacent to vulnerabilities and on specific statements. It also introduces an escape score based on feature consistency, semantic consistency, and code fluency, which quantifies the potential attack value of candidate samples without access to the internal information of the target model. Furthermore, VulBlurrer adopts a genetic-algorithm-based optimization strategy for adversarial code, in which the weights used to compute the escape score are dynamically adjusted and an elite retention mechanism is applied during iteration, further improving the attack success rate. VulBlurrer and baseline methods are evaluated on pretrained vulnerability detection models based on CodeBERT, GraphCodeBERT, CodeT5, and UniXcoder. The attack success rates of VulBlurrer on the four target models reach 85.51%, 91.47%, 93.14%, and 71.61%, with 12.67, 9.10, 11.07, and 19.44 queries required on average, respectively. Compared with existing methods, VulBlurrer achieves higher attack success rates and a better trade-off between attack success rate and query efficiency, and the adversarial code it generates exhibits superior semantic consistency and code fluency. Furthermore, empirical studies on ChatGPT, DeepSeek, and the LLM-based programming assistants GitHub Copilot and TRAE verify the effectiveness of the proposed method against large language models. These results indicate that pretrained code models still face robustness challenges from adversarial attacks in vulnerability detection tasks, and that vulnerability detection tools based on pretrained models and large language models require further hardening against adversarial code.
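The abstract describes a hard-label search loop: candidates are scored by a weighted escape score (feature consistency, semantic consistency, code fluency) and evolved with a genetic algorithm using elite retention and dynamically adjusted weights. The following toy sketch illustrates that loop only; all helpers (the token-based consistency stubs, the fluency heuristic, the transform set, the weight schedule) are hypothetical stand-ins, not the paper's implementation.

```python
import random

def feature_consistency(candidate, original):
    # Stub: token-overlap ratio as a stand-in for representation similarity.
    a, b = set(candidate.split()), set(original.split())
    return len(a & b) / max(len(a | b), 1)

def semantic_consistency(candidate, original):
    # Stub: the toy transforms only append dead code, so semantics hold
    # exactly when the original statements are still present verbatim.
    return 1.0 if original in candidate else 0.0

def code_fluency(candidate):
    # Stub: penalize candidates that drift far from a "natural" length.
    return 1.0 / (1.0 + abs(len(candidate.split()) - 10))

def escape_score(candidate, original, w):
    # Weighted combination of the three signals; needs no model access.
    return (w[0] * feature_consistency(candidate, original)
            + w[1] * semantic_consistency(candidate, original)
            + w[2] * code_fluency(candidate))

def attack(original, detector, transforms,
           pop_size=8, generations=20, elite_k=2, seed=0):
    """Hard-label GA loop: `detector` returns only 1 (vulnerable) or 0."""
    rng = random.Random(seed)
    population = [rng.choice(transforms)(original) for _ in range(pop_size)]
    weights = [1 / 3, 1 / 3, 1 / 3]
    for gen in range(generations):
        for cand in population:
            if detector(cand) == 0:      # hard label flipped: attack succeeded
                return cand
        ranked = sorted(population,
                        key=lambda c: escape_score(c, original, weights),
                        reverse=True)
        elites = ranked[:elite_k]        # elite retention across generations
        population = elites + [rng.choice(transforms)(rng.choice(elites))
                               for _ in range(pop_size - elite_k)]
        # Hypothetical dynamic re-weighting: shift emphasis toward fluency.
        weights = [max(0.1, weights[0] - 0.02), weights[1],
                   min(0.8, weights[2] + 0.02)]
    return None                          # query budget exhausted
```

In the paper's setting the transforms would be semantics-preserving code rewrites targeted at vulnerability-adjacent statements and the detector would be the target pretrained model queried for its hard label; here both are reduced to trivial string operations so the control flow of the search is visible.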

Cite this article:

王霄东, 陈郅楷, 王一杰, 黄荔, 关志涛. Black-box Adversarial Attacks for Pretrained Vulnerability Detection Models. Journal of Software, (): 1-21.

History
  • Received: 2025-11-15
  • Revised: 2025-12-16
  • Published online: 2026-03-25
Copyright: Institute of Software, Chinese Academy of Sciences (京ICP备05046678号-3)
Address: 4 South Fourth Street, Zhongguancun, Haidian District, Beijing 100190
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn
Technical support: Beijing Qinyun Technology Development Co., Ltd.

京公网安备 11040202500063号