Abstract:With the development of the amount of data available for training and the processing power of new computing platform, the intelligent model based on deep learning can accomplish more and more complex tasks, and it has made major breakthroughs in the field of AI such as computer vision and natural language processing. However, the large number of parameters of these deep models bring awesome computational overhead and memory requirements, which makes the big models must face great difficulties and challenges in the deployment of computing-capable platforms (such as mobile embedded devices). Therefore, model compression and acceleration without affecting the performance have become a research hotspot. This study first analyzes the classical deep learning model compression and acceleration methods proposed by domestic and international scholars, and summarize seven aspects:Parameter pruning, parameter quantization, compact network, knowledge distillation, low-rank decomposition, parameter sharing, and hybrid methods. Secondly, the compression and acceleration performance of several mainstream representative methods is compared on multiple public models. Finally, the future research directions in the field of model compression and acceleration are discussed.