基于代码自然性的切片粒度缺陷预测方法
作者:
作者单位:

作者简介:

张献(1990-),男,博士,讲师,主要研究领域为软件质量保障,机器学习.
曾杰(1993-),男,博士,主要研究领域为软件分析,机器学习.
贲可荣(1963-),男,博士,教授,博士生导师,CCF杰出会员,主要研究领域为软件工程,人工智能.

通讯作者:

张献,E-mail:tomtomzx@foxmail.com

中图分类号:

基金项目:

国家安全重大基础研究计划(613315)


Code Naturalness Based Defect Prediction Method at Slice Level
Author:
Affiliation:

Fund Project:

National Security Program on Key Basic Research Project of China (613315)

  • 摘要
  • |
  • 图/表
  • |
  • 访问统计
  • |
  • 参考文献
  • |
  • 相似文献
  • |
  • 引证文献
  • |
  • 资源附件
  • |
  • 文章评论
    摘要:

    软件缺陷预测是软件质量保障领域的一个活跃话题,它可以帮助开发人员发现潜在的缺陷并更好地利用资源.如何为预测系统设计更具判别力的度量元,并兼顾性能与可解释性,一直是人们致力于研究的方向.针对这一挑战,提出了一种基于代码自然性特征的缺陷预测方法——CNDePor.该方法通过正逆双向度量代码并利用质量信息对样本加权的方式改进语言模型,提高了模型所得交叉熵(CE)类度量元的缺陷判别力.针对粗粒度缺陷预测存在难以聚焦缺陷区域、代码审查成本高的不足,研究了一种新的细粒度缺陷预测问题——面向语句的切片级缺陷预测.在该问题上,设计了4种度量元,并在两类安全缺陷数据集上验证了度量元和CNDePor方法的有效性.实验结果表明,CE类度量元具有可学习性,它们蕴涵了语言模型从语料库中学习到的相关知识;改进的CE类度量元的判别力明显优于原始度量元和传统规模度量元;CNDePor方法较传统缺陷预测方法和已有的基于代码自然性的方法有显著优势,较先进的基于深度学习的方法具有可比性和更强的可解释性.

    Abstract:

    Software defect prediction is an active research topic in the domain of software quality assurance. It can help developers find potential defects and make better use of resources. How to design more discriminative metrics for the prediction system, taking into account performance and interpretability, has always been a research direction that people devote to. Aiming at this challenge, a code naturalness feature based defect predictor method (CNDePor) is proposed. This method improves the language model by taking advantage of the bidirectional code-sequence measurement and weighting the samples by using the quality information, so as to increase the defect discrimination of the cross-entropy (CE) type metrics obtained from the model. Aiming at the shortcomings of coarse-grained defect prediction (e.g. difficulties in focusing on defect areas and high cost of code reviews), a new fine-grained defect prediction problem, statement-oriented slice level defect prediction, is studied. Four metrics are designed for this problem, and the effectiveness of these metrics and CNDePor are verified on two types of security defect datasets. The experimental results show that:CE-type metrics are learnable, which contain the relevant knowledge learned from the corpus by language model; the improved CE metrics are significantly better than the original metrics and traditional size metrics; the CNDePor method has significant advantages over the traditional defect prediction methods and an existing method based on code naturalness, and is of comparable performance and stronger interpretability than a state-of-the-art mothed based on deep learning.

    参考文献
    相似文献
    引证文献
引用本文

张献,贲可荣,曾杰.基于代码自然性的切片粒度缺陷预测方法.软件学报,2021,32(7):2219-2241

复制
分享
文章指标
  • 点击次数:
  • 下载次数:
  • HTML阅读次数:
  • 引用次数:
历史
  • 收稿日期:2020-09-13
  • 最后修改日期:2020-10-26
  • 录用日期:
  • 在线发布日期: 2021-01-22
  • 出版日期: 2021-07-06
您是第位访问者
版权所有:中国科学院软件研究所 京ICP备05046678号-3
地址:北京市海淀区中关村南四街4号,邮政编码:100190
电话:010-62562563 传真:010-62562533 Email:jos@iscas.ac.cn
技术支持:北京勤云科技发展有限公司

京公网安备 11040202500063号