Abstract: Although convolutional neural networks (CNNs) are widely used in image recognition because of their strong generalization performance, adversarial samples perturbed by noise can easily deceive fully trained models, which poses security risks. Many existing defense methods improve model robustness, but most do so at the cost of generalization. Previous robust training methods suffer from two main issues: 1) Robustness is enhanced mainly by increasing the quantity or complexity of training samples, which weakens the dominant role of clean samples in training and significantly increases the training workload. 2) Sample labels are used only for comparison with model predictions to control the direction of parameter updates, so the additional information carried by the labels is neglected. To alleviate these issues, a label-filtered weight parameter regularization method is proposed that uses the label information of samples during training to balance generalization and robustness. The method uses the correct labels of samples and the classification labels of adversarial samples as a filter to select the weight parameters that play a decisive role in classification, and then applies regularization to these parameters to balance model generalization and robustness. Experiments and analysis on the MNIST, CIFAR-10, and CIFAR-100 datasets demonstrate that the proposed method achieves good training results.
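The abstract does not give the exact formulation, so the following is only a minimal sketch of one possible reading of label-filtered regularization, not the authors' implementation. It assumes the "decisive" weights are the output-layer rows indexed by the correct label and by the label the model assigns to the adversarial sample, and it assumes an L2 penalty. The function name `label_filtered_penalty`, the coefficient `lam`, and the restriction to samples whose adversarial label differs from the correct one are all hypothetical choices made for illustration.

```python
import numpy as np

def label_filtered_penalty(weights, true_labels, adv_labels, lam=1e-3):
    """Hypothetical L2 penalty restricted to label-selected output rows.

    weights:     (num_classes, feat_dim) final-layer weight matrix.
    true_labels: (batch,) correct class index of each sample.
    adv_labels:  (batch,) class index predicted for each adversarial sample.
    Returns the scalar penalty and the boolean row mask that was used.
    """
    weights = np.asarray(weights, dtype=float)
    true_labels = np.asarray(true_labels)
    adv_labels = np.asarray(adv_labels)

    # Assumption: only samples whose adversarial prediction differs from
    # the correct label contribute to the filter.
    fooled = true_labels != adv_labels

    # Mark the rows for the correct class and for the adversarial class.
    mask = np.zeros(weights.shape[0], dtype=bool)
    mask[true_labels[fooled]] = True
    mask[adv_labels[fooled]] = True

    # Assumption: a plain squared-L2 penalty on the selected rows only.
    penalty = lam * np.sum(weights[mask] ** 2)
    return penalty, mask
```

In a training loop, a term of this kind would be added to the classification loss, so that only the label-selected weights are regularized while the remaining weights are left to fit the clean samples.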