Abstract: Code naturalness is an active research topic at the intersection of natural language processing and software engineering. It aims to address software engineering tasks by building models of code naturalness with natural language processing techniques. In recent years, as the volume of source code and related data in the open source community has continued to grow, more researchers have turned their attention to the information contained in source code, and a series of results have been achieved. At the same time, code naturalness research faces many challenges in corpus construction, model building, and task application. This paper therefore reviews and summarizes recent progress in code naturalness research and its applications from these three perspectives. The main contents are: (1) introducing the basic concept of code naturalness and giving an overview of the research; (2) summarizing the corpora currently used in code naturalness research and classifying the modeling methods; (3) summarizing the experimental validation methods and evaluation metrics for code naturalness models; (4) summarizing and categorizing the current applications of code naturalness; (5) summarizing the key open issues of code naturalness techniques; and (6) discussing the future development of code naturalness techniques.