Abstract: Deep learning has achieved great success in fields such as computer vision, natural language processing, and speech recognition. Compared with traditional machine learning algorithms, deep models attain higher accuracy on many tasks. However, because deep models are end-to-end, highly non-linear, and complex, their interpretability falls short of that of traditional machine learning algorithms, which hinders the application of deep learning in real-world settings. Studying the interpretability of deep models is therefore both significant and necessary, and in recent years many scholars have proposed algorithms addressing this issue. For image classification tasks, this study divides interpretability algorithms into global and local categories. From the perspective of interpretation granularity, global interpretability algorithms are further divided into model-level and neuron-level algorithms, while local interpretability algorithms are divided into pixel-level, concept-level, and image-level feature algorithms. Based on this framework, this study summarizes common interpretability algorithms for deep models together with the related evaluation metrics, and discusses current challenges and future research directions in deep model interpretability. We believe that research on the interpretability and theoretical foundations of deep models is a necessary path toward opening the black box of deep models, and that interpretability algorithms have great potential to help address other problems of deep models, such as fairness and generalization.