Abstract: This study proposes a feature-constrained distillation learning method for visual anomaly detection that makes full use of the teacher model's features to guide the student model in identifying abnormal images efficiently. Specifically, a vision transformer (ViT) is adopted as the backbone network for the anomaly detection task, and a central feature strategy is introduced to constrain the output features of the student network. Because the teacher network has strong feature expressiveness, the central feature strategy uses it to dynamically generate feature representation centers of normal samples for the student network. This improves the student network's ability to describe the features of normal data and widens the feature difference between the student and teacher networks on abnormal data. In addition, to minimize the difference between the student and teacher networks in their feature representations of normal images, the method uses a Gram loss function to constrain the relationships among the encoding layers of the student network. Experiments on three general anomaly detection data sets and one real-world industrial anomaly detection data set demonstrate that the proposed method significantly improves visual anomaly detection performance compared with state-of-the-art methods.
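
As a rough illustration of the three ingredients named in the abstract (a Gram-based constraint across encoding layers, feature centers generated from the teacher, and a student-teacher feature gap used as the anomaly signal), the following sketch may help. The abstract gives no formulas, so everything here is an assumption made for illustration: the layer-by-layer Gram matrix, the mean-squared Gram difference, the moving-average center update with a `momentum` parameter, and the cosine-distance score. All function names are hypothetical, and each layer's feature is simplified to a plain vector of equal length.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(layers):
    """Assumed form: entry (i, j) is the dimension-normalized inner
    product between the features of encoding layer i and layer j."""
    d = len(layers[0])
    return [[dot(fi, fj) / d for fj in layers] for fi in layers]


def gram_loss(student_layers, teacher_layers):
    """Assumed form: mean squared difference between the student's and
    the teacher's layer-relationship (Gram) matrices."""
    gs = gram_matrix(student_layers)
    gt = gram_matrix(teacher_layers)
    n = len(gs)
    total = sum((gs[i][j] - gt[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)


def update_center(center, teacher_feature, momentum=0.9):
    """Assumed form of a 'dynamic' center: a moving average of teacher
    features computed on normal samples. `momentum` is hypothetical."""
    if center is None:
        return list(teacher_feature)
    return [momentum * c + (1.0 - momentum) * t
            for c, t in zip(center, teacher_feature)]


def center_loss(student_feature, center):
    """Mean squared distance pulling the student feature toward the center."""
    return sum((s - c) ** 2 for s, c in zip(student_feature, center)) / len(center)


def anomaly_score(student_feature, teacher_feature):
    """Assumed score: cosine distance between student and teacher features.
    A larger gap suggests the input is less like the normal training data."""
    norm_s = math.sqrt(dot(student_feature, student_feature))
    norm_t = math.sqrt(dot(teacher_feature, teacher_feature))
    return 1.0 - dot(student_feature, teacher_feature) / (norm_s * norm_t + 1e-12)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 2.0]]   # two layers, 2-dim features
    student = [[1.0, 0.0], [0.0, 1.0]]
    center = update_center(None, teacher[-1])
    print("gram loss:", gram_loss(student, teacher))
    print("center loss:", center_loss(student[-1], center))
    print("anomaly score:", anomaly_score(student[-1], teacher[-1]))
```

In a training loop under these assumptions, the Gram and center terms would be summed (possibly with weights) on normal images only, while `anomaly_score` would be evaluated at test time. A real implementation would operate on ViT token features in a tensor library rather than on Python lists.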