Journal of Software (ISSN 1000-9825), 2023, Vol. 34, No. 9, pp. 4003-4017.
DOI: 10.13328/j.cnki.jos.006878
Article
Adversarial Example Generation Method Based on Sparse Perturbation
In recent years, deep neural networks (DNNs) have made great progress in the image domain. However, studies show that DNNs are vulnerable to adversarial examples and exhibit poor robustness. By generating adversarial examples to attack a DNN, its robustness can be evaluated, and corresponding defense methods can then be adopted to improve it. Existing adversarial example generation methods still have defects such as insufficient sparsity of the generated perturbations and excessive perturbation magnitude. This study proposes a sparse-perturbation-based adversarial example generation method, SparseAG (sparse perturbation based adversarial example generation), which generates relatively sparse and small-magnitude perturbations for image examples. Specifically, SparseAG first selects perturbation points iteratively, based on the gradient of the loss function with respect to the input image, to generate an initial adversarial example. In each iteration, the candidate set of new perturbation points is determined in descending order of gradient magnitude, and the perturbation that minimizes the loss function is added to the image. Second, a perturbation optimization strategy is applied to the initial perturbation scheme to improve the sparsity and authenticity of the adversarial example: the perturbations are adjusted based on the importance of each perturbation to escape local optima, and redundant perturbations and their magnitudes are further reduced. The method is evaluated on the CIFAR-10 and ImageNet datasets in both targeted and untargeted attack scenarios. The experimental results show that SparseAG achieves a 100% attack success rate on different datasets and in different attack scenarios, and that both the sparsity and the overall magnitude of the generated perturbations are superior to those of the compared methods.
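The greedy, gradient-guided selection phase described in the abstract can be sketched as follows. This is a minimal illustrative toy, not the authors' SparseAG implementation: the linear model, the attack loss, the step size `eps`, the candidate-set size `n_cand`, and the pixel budget `max_points` are all invented for demonstration; the paper's actual loss, candidate ordering, and stopping criteria follow its own definitions.

```python
# Toy sketch of greedy gradient-guided sparse perturbation. NOT the authors'
# SparseAG code: the model, loss, and hyperparameters are invented here.

W = [0.9, -0.4, 0.2, -0.8, 0.5, 0.1]  # toy linear classifier weights

def score(x):
    """Class-1 score of the toy model; score < 0 means class 0."""
    return sum(w * xi for w, xi in zip(W, x))

def attack_loss(x):
    """Attack objective for a true label of 1: the attacker minimizes the
    class-1 score so the prediction flips to class 0."""
    return score(x)

def grad(x):
    """Gradient of the attack loss w.r.t. each pixel (constant for a
    linear model; a real DNN attack would use backpropagation)."""
    return list(W)

def sparse_attack(x, eps=0.5, n_cand=3, max_points=4):
    """Greedily perturb one pixel per iteration: rank the still-unperturbed
    pixels by gradient magnitude, then among the top candidates keep the
    signed step that minimizes the attack loss. Stop on success or when
    the pixel budget is exhausted."""
    x = list(x)
    touched = set()
    for _ in range(max_points):
        g = grad(x)
        # Candidate pixels: unperturbed, in descending gradient magnitude.
        cand = sorted((i for i in range(len(x)) if i not in touched),
                      key=lambda i: -abs(g[i]))[:n_cand]
        best_l, best_i, best_d = None, None, None
        for i in cand:
            for d in (eps, -eps):
                trial = x[:]
                trial[i] += d
                l = attack_loss(trial)
                if best_l is None or l < best_l:
                    best_l, best_i, best_d = l, i, d
        x[best_i] += best_d
        touched.add(best_i)
        if score(x) < 0:   # misclassified: attack succeeded
            break
    return x, sorted(touched)
```

The paper's second phase would then revisit this initial scheme, adjusting perturbations by importance and pruning redundant points and magnitudes; that refinement is omitted from this sketch.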
deep neural network (DNN); adversarial example generation; sparse perturbation; image recognition; targeted attack; untargeted attack
JI Shun-Hui, HU Li-Ming, ZHANG Peng-Cheng, QI Rong-Zhi
jos/article/abstract/6878