[Keywords]
[Abstract]
Recent studies have shown that adversarial training is an effective method for defending against adversarial example attacks. However, existing adversarial training strategies improve model robustness at the cost of reduced generalization. Mainstream adversarial training methods typically process each training sample independently and ignore the relationships among samples, so the model cannot fully exploit inter-sample geometric relationships to learn a more robust model and thus better defend against adversarial attacks. Therefore, this study focuses on preserving the stability of the geometric structure among samples during adversarial training, with the goal of improving model robustness. Specifically, a new geometric structure constraint is designed for adversarial training, which aims to keep the feature-space distributions of natural and adversarial examples consistent. In addition, a dual-label supervised learning method is proposed, which jointly supervises model training with the labels of both natural and adversarial examples. Finally, the characteristics of the dual-label supervised learning method are analyzed in an attempt to theoretically explain the working mechanism of adversarial examples. Experimental results on multiple benchmark datasets show that, compared with existing methods, the proposed method effectively improves model robustness while maintaining good generalization accuracy. The code is open-sourced at: https://github.com/SkyKuang/DGCAT.
[Keywords]
[Abstract]
Recent studies have shown that adversarial training is an effective method to defend against adversarial example attacks. However, such robustness comes at the price of a larger generalization gap. Existing endeavors mainly treat each training example independently, which ignores the geometric relationships among samples and fails to exploit the model's defensive capability to its full potential. Different from existing works, this study focuses on improving the robustness of the neural network model by aligning inter-sample geometric information so that the feature-space distributions of natural and adversarial examples remain consistent. Furthermore, a dual-label supervised method is proposed that leverages both the true and the wrong labels of adversarial examples to jointly supervise the adversarial learning process. The characteristics of the dual-label supervised learning method are analyzed in an attempt to theoretically explain the working mechanism of adversarial examples. Extensive experiments conducted on benchmark datasets demonstrate that the proposed approach effectively improves the robustness of the model while maintaining generalization accuracy. Code is available at: https://github.com/SkyKuang/DGCAT.
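The two ingredients named in the abstract, a geometric-consistency constraint between natural and adversarial feature batches and a dual-label supervision term, can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the paper's exact formulation: the pairwise-distance alignment penalty, the convex weighting `alpha`, and all function names are hypothetical choices made here for clarity.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Softmax cross-entropy averaged over the batch.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def geometry_alignment(feat_nat, feat_adv):
    # One possible geometric-consistency penalty: keep the pairwise
    # distance matrix of adversarial features close to that of the
    # corresponding natural features.
    d_nat = np.linalg.norm(feat_nat[:, None] - feat_nat[None, :], axis=-1)
    d_adv = np.linalg.norm(feat_adv[:, None] - feat_adv[None, :], axis=-1)
    return ((d_nat - d_adv) ** 2).mean()

def dual_label_loss(logits_adv, y_true, y_wrong, alpha=0.7):
    # Joint supervision with the natural (true) label and the label
    # the adversarial example is pushed toward (wrong); alpha is an
    # illustrative trade-off weight.
    return (alpha * cross_entropy(logits_adv, y_true)
            + (1 - alpha) * cross_entropy(logits_adv, y_wrong))
```

In a full training loop, these terms would be summed with the standard adversarial-training objective and minimized with respect to the model parameters; the sketch above only shows the shape of the two extra losses.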
[CLC number]
[Funding]
National Science Fund for Distinguished Young Scholars (62025603); National Natural Science Foundation of China (U1705262, 62072386, 62072387, 62072389, 62002305, 61772443, 61802324, 61702136); Guangdong Basic and Applied Basic Research Foundation (2019B1515120049); Fundamental Research Funds for the Central Universities (20720200077, 20720200090, 20720200091)