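Trap-type Ensemble Adversarial Defense Network Based on Attackable Space Hypothesis

Fund Project:

National Natural Science Foundation of China (61876138, 62272387, 62141208); National Key Research and Development Program of China (2020YFC0833105Z1); Xi'an Key Industrial Chain Artificial Intelligence Core Technology Research Project (2022JH-RGZN-0028); Key Research and Development Program of Shaanxi Province (2023-YBGY-030); Innovation Fund of Xi'an University of Posts and Telecommunications (CXJJZL2021007)
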
    Abstract:

    Deep neural networks (DNNs) are now widely used across many fields. However, research has shown that DNNs are vulnerable to adversarial examples (AEs), which seriously threatens their application and development. Most existing adversarial defense methods sacrifice part of the original classification accuracy and depend heavily on information from previously generated AEs, so they cannot balance the effectiveness and efficiency of defense. Therefore, based on manifold learning, this study proposes, from the feature-space perspective, the attackable space hypothesis for the origin of AEs, and builds on it a trap-type ensemble adversarial defense network (Trap-Net). Starting from the original model, Trap-Net adds trap-class data to the training data and uses a trap-type smoothing loss function to establish an inducing relationship between the target data classes and the trap data classes, which yields trap-type networks. To address the loss of original classification accuracy, ensemble learning is used to combine multiple trap-type networks, which expands the attackable target space defined by the trap-class labels in the feature space without losing original classification accuracy. Finally, Trap-Net determines whether an input is an AE by detecting whether it hits the attackable target space. Experiments on the MNIST, K-MNIST, F-MNIST, CIFAR-10, and CIFAR-100 datasets show that Trap-Net generalizes strongly in defending against AEs without sacrificing classification accuracy on clean samples, and the results support the attackable space hypothesis. In low-perturbation white-box attack scenarios, Trap-Net detects more than 85% of AEs. In high-perturbation white-box and black-box attack scenarios, its detection rate is almost 100%. Compared with other detection-based adversarial defense methods, Trap-Net is highly effective against both white-box and black-box adversarial attacks, and it provides an efficient robustness optimization method for DNNs in adversarial environments.
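
The abstract describes two mechanisms only in words: a trap-type smoothing loss that ties each target class to the trap classes, and an ensemble that flags an input when it lands in the trap region. The sketch below is one possible reading, not the paper's formulation. Everything in it is an assumption made for illustration: the function names, the choice to move a smoothing mass `epsilon` from the true class onto the trap classes only, and the rule that an input is flagged as soon as any ensemble member predicts a trap class.

```python
import math
from typing import List, Optional, Sequence, Tuple


def trap_smoothed_label(true_class: int, num_real: int, num_trap: int,
                        epsilon: float = 0.1) -> List[float]:
    """Build a soft target over num_real real classes followed by num_trap trap classes.

    Assumption: the true class keeps 1 - epsilon, and epsilon is spread
    evenly over the trap classes (other real classes get zero).
    """
    if not 0 <= true_class < num_real:
        raise ValueError("true_class must index a real class")
    if num_trap < 1:
        raise ValueError("at least one trap class is required")
    target = [0.0] * (num_real + num_trap)
    target[true_class] = 1.0 - epsilon
    for t in range(num_real, num_real + num_trap):
        target[t] = epsilon / num_trap
    return target


def cross_entropy(target: Sequence[float], probs: Sequence[float]) -> float:
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(target, probs))


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def ensemble_detect(member_probs: Sequence[Sequence[float]],
                    num_real: int) -> Tuple[bool, Optional[int]]:
    """Return (is_adversarial, predicted_real_class).

    Assumption: the input is flagged when any member's top class is a
    trap class; otherwise the real-class probabilities are averaged
    across members and the largest average gives the prediction.
    """
    if any(argmax(p) >= num_real for p in member_probs):
        return True, None
    avg = [sum(p[c] for p in member_probs) / len(member_probs)
           for c in range(num_real)]
    return False, argmax(avg)
```

With 3 real classes and 2 trap classes, `trap_smoothed_label(2, 3, 2, epsilon=0.2)` puts 0.8 on class 2 and 0.1 on each trap class. Calling `ensemble_detect` on per-member probability vectors returns `(True, None)` when at least one member's top class is a trap class, and `(False, k)` otherwise. Flagging on any single trap prediction is the most permissive detection rule; a majority vote would be a stricter alternative under the same assumptions.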

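Cite this article

Sun Jiaze, Wen Sulei, Zheng Wei, Chen Xiang. Trap-type Ensemble Adversarial Defense Network Based on Attackable Space Hypothesis. Journal of Software (软件学报), 2024, 35(4): 1861-1884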

History
  • Received: 2022-02-17
  • Last revised: 2022-05-26
  • Published online: 2023-06-28
  • Publication date: 2024-04-06