Abstract: Deep neural networks (DNNs) are now widely used in many fields. However, research has shown that DNNs are vulnerable to adversarial examples (AEs), which seriously threaten their development and application. Most existing adversarial defense methods sacrifice part of the original classification accuracy to gain defense capability and rely heavily on knowledge of previously generated AEs, so they cannot balance the effectiveness and efficiency of defense. Therefore, based on manifold learning, this study proposes an origin hypothesis of AEs in attackable space from the feature-space perspective, together with a trap-type ensemble adversarial defense network (Trap-Net). Trap-Net adds trap data to the training data of the original model and uses a trap-type smoothing loss function to establish a seducing relationship between target data and trap data, thereby generating trap networks. To address the loss of original classification accuracy common to most adversarial defenses, ensemble learning is used to combine multiple trap networks, which expands the attackable target space defined by trap labels in the feature space while reducing the accuracy loss. Finally, Trap-Net determines whether input data are AEs by detecting whether the data hit the attackable target space. Experiments on the MNIST, K-MNIST, F-MNIST, CIFAR-10, and CIFAR-100 datasets show that Trap-Net defends against a wide range of AEs without sacrificing classification accuracy on clean samples, and the experimental results validate the origin hypothesis of AEs in attackable space. In low-perturbation white-box attack scenarios, Trap-Net detects more than 85% of AEs; in high-perturbation white-box and black-box attack scenarios, its detection rate approaches 100%.
Compared with other AE detection methods, Trap-Net is highly effective against both white-box and black-box adversarial attacks, and it offers an efficient robustness-optimization approach for DNNs in adversarial environments.
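The detection rule described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: it assumes each trap network appends extra trap labels after the original class labels, and that an input is flagged as an AE when any ensemble member maps it into the trap-label (attackable) region. All names and the specific aggregation rule are assumptions for illustration.

```python
# Illustrative sketch of Trap-Net's trap-label detection rule (not the
# authors' code). Assumes class indices >= NUM_REAL_CLASSES are trap labels.

NUM_REAL_CLASSES = 10  # e.g. CIFAR-10 has 10 real classes


def is_trap_label(label: int, num_real_classes: int = NUM_REAL_CLASSES) -> bool:
    """Labels beyond the original classes mark the attackable target space."""
    return label >= num_real_classes


def detect_adversarial(predictions: list) -> bool:
    """Flag the input as an AE if any ensemble member predicts a trap label,
    i.e. the input hits the attackable target space."""
    return any(is_trap_label(p) for p in predictions)


# Clean sample: all trap networks agree on a real class label.
print(detect_adversarial([3, 3, 3]))   # False -> treated as clean
# Adversarial sample: seduced into a trap label by one network.
print(detect_adversarial([3, 12, 3]))  # True -> flagged as an AE
```

Aggregating with `any` is one possible choice; a majority vote over the ensemble would be a natural, stricter alternative.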