邓枭(1995-),男,博士生,主要研究领域为缺陷自动检测;叶蔚(1985-),男,博士,副研究员,主要研究领域为自然语言处理,程序语言理解,软件安全;谢睿(1991-),男,博士,助理研究员,主要研究领域为程序语言理解,缺陷自动检测;张世琨(1969-),男,博士,研究员,博士生导师,CCF高级会员,主要研究领域为知识计算,软件工程,软件安全
叶蔚,wye@pku.edu.cn
源代码缺陷检测是判别程序代码中是否存在非预期行为的过程,广泛应用于软件测试、软件维护等软件工程任务,对软件的功能保障与应用安全方面具有至关重要的作用.传统的缺陷检测研究以程序分析为基础,通常需要很强的领域知识与复杂的计算规则,面临状态爆炸问题,导致检测性能有限,在误报漏报率上都有较大提高空间.近年来,开源社区的蓬勃发展积累了以开源代码为核心的海量数据,在此背景下,利用深度学习的特征学习能力能够自动学习语义丰富的代码表示,从而为缺陷检测提供一种新的途径.搜集了该领域最新的高水平论文,从缺陷代码数据集与深度学习缺陷检测模型两方面系统地对当前方法进行了归纳与阐述.最后对该领域研究所面临的主要挑战进行总结,并展望了未来可能的研究重点.
Source code bug (vulnerability) detection is a process of judging whether there are unexpected behaviors in the program code. It is widely used in software engineering tasks such as software testing and software maintenance, and plays a vital role in software functional assurance and application security. Traditional vulnerability detection research is based on program analysis, which usually requires strong domain knowledge and complex calculation rules, and faces the problem of state explosion, resulting in limited detection performance, and there is room for greater improvement in the rate of false positives and false negatives. In recent years, the open source community's vigorous development has accumulated massive amounts of data with open source code as the core. In this context, the feature learning capabilities of deep learning can automatically learn semantically rich code representations, thereby providing a new way for vulnerability detection. This study collected the latest high-level papers in this field, systematically summarized and explained the current methods from two aspects:vulnerability code dataset and deep learning vulnerability detection model. Finally, it summarizes the main challenges faced by the research in this field, and looks forward to the possible future research focus.
邓枭,叶蔚,谢睿,张世琨.基于深度学习的源代码缺陷检测研究综述.软件学报,2023,34(2):625-654
复制