[Keywords]
[Abstract]
Deep neural networks are vulnerable to carefully designed backdoor attacks during training. A backdoor attack injects data carrying a backdoor trigger into the training set so that the model's output can be controlled at test time. The attacked model behaves normally on a clean test set, but once it recognizes the trigger, it misclassifies the input as the attacker's target class. Existing backdoor attacks are not sufficiently visually stealthy, and their attack success rates still leave room for improvement. To address these limitations, a backdoor attack method based on singular value decomposition (SVD) is proposed. The method has two implementations: the first directly sets some of an image's singular values to zero, and the resulting slightly compressed image serves as an effective backdoor trigger; the second injects singular-vector information of the attack target class into the left and right singular vectors of the image, which also achieves an effective backdoor attack. The poisoned images produced by both variants are visually almost indistinguishable from the originals. Experiments show that SVD can be effectively exploited in backdoor attack algorithms and that the proposed method attacks neural networks with very high success rates on multiple datasets.
[Keywords]
[Abstract]
Deep neural networks are vulnerable to carefully designed backdoor attacks during training. Such attacks inject data carrying a backdoor trigger into the training set so that the model's output can be controlled at test time. The attacked model performs normally on a clean test set, but once the trigger is recognized, inputs are misclassified as the attacker's target class. Existing backdoor attack methods offer limited visual stealthiness, and their attack success rates still leave room for improvement. To address these limitations, a backdoor attack method based on singular value decomposition (SVD) is proposed. The method can be implemented in two ways: the first directly sets some singular values of the image to zero, and the resulting slightly compressed image serves as an effective backdoor trigger; the second injects singular-vector information of the attack target class into the left and right singular vectors of the image, which also achieves an effective backdoor attack. The poisoned images produced by both variants are visually almost identical to the originals. Experiments demonstrate that SVD can be effectively leveraged in backdoor attack algorithms and that the proposed method attacks neural networks with very high success rates on multiple datasets.
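The two trigger constructions described above can be sketched in NumPy. This is a minimal illustration, not the paper's exact formulation: the function names, the number of retained singular values `k`, and the blending coefficient `alpha` in the second variant are all illustrative assumptions.

```python
import numpy as np

def svd_truncate_trigger(img, k=8):
    """Variant 1 (sketch): zero out all but the top-k singular values.
    The mild compression artifact of the reconstruction acts as the trigger."""
    U, s, Vt = np.linalg.svd(img.astype(np.float64), full_matrices=False)
    s[k:] = 0.0  # discard the trailing singular values
    return np.clip(U @ np.diag(s) @ Vt, 0, 255).astype(np.uint8)

def svd_inject_trigger(img, target, alpha=0.1):
    """Variant 2 (illustrative blend, an assumption about the injection step):
    mix the target-class image's left/right singular vectors into those of the
    clean image, while keeping the clean image's singular values."""
    Ui, si, Vti = np.linalg.svd(img.astype(np.float64), full_matrices=False)
    Ut, _, Vtt = np.linalg.svd(target.astype(np.float64), full_matrices=False)
    U = (1 - alpha) * Ui + alpha * Ut
    Vt = (1 - alpha) * Vti + alpha * Vtt
    return np.clip(U @ np.diag(si) @ Vt, 0, 255).astype(np.uint8)

# Toy grayscale "images" standing in for dataset samples.
rng = np.random.default_rng(0)
clean = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
target = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
p1 = svd_truncate_trigger(clean, k=8)
p2 = svd_inject_trigger(clean, target, alpha=0.1)
```

For color images, the same operations would be applied per channel; in a real attack, the poisoned images keep their appearance but are relabeled with the target class before training.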
[CLC number]
[Funding]
National Natural Science Foundation of China (61832002, 62172094); Beijing Outstanding Young Scientist Fund (JQ20023)