Code Naturalness Based Defect Prediction Method at Slice Level

Fund Project:

National Security Program on Key Basic Research Project of China (613315)

    Abstract:

    Software defect prediction is an active research topic in software quality assurance. It helps developers find potential defects and make better use of resources. Designing more discriminative metrics for prediction systems, with both performance and interpretability in mind, has long been a research focus. To address this challenge, a code naturalness feature based defect predictor method (CNDePor) is proposed. The method improves the language model by measuring code sequences in both directions and by weighting samples with quality information, which increases the defect discrimination of the cross-entropy (CE) metrics obtained from the model. To address the shortcomings of coarse-grained defect prediction (e.g., difficulty in focusing on defect areas and the high cost of code review), a new fine-grained defect prediction problem, statement-oriented slice-level defect prediction, is studied. Four metrics are designed for this problem, and the effectiveness of these metrics and of CNDePor is verified on two types of security defect datasets. The experimental results show that: CE-type metrics are learnable and contain relevant knowledge learned from the corpus by the language model; the improved CE metrics are significantly better than the original metrics and traditional size metrics; and CNDePor has significant advantages over traditional defect prediction methods and an existing method based on code naturalness, while offering performance comparable to, and interpretability stronger than, a state-of-the-art method based on deep learning.
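
    To make the idea of a bidirectional cross-entropy metric concrete, the following is a minimal sketch and not the CNDePor implementation. It assumes a toy bigram language model with add-one smoothing in place of the language model used in the paper, omits the quality-based sample weighting, and averages the forward and backward scores, which is an illustrative choice that the abstract does not specify. All function names here are hypothetical.

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over token sequences with boundary markers."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for tokens in corpus:
        seq = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    # Reserve one extra vocabulary slot for unseen tokens (assumption).
    return unigrams, bigrams, len(vocab) + 1

def cross_entropy(tokens, model):
    """Average negative log2 probability per token under the bigram model."""
    unigrams, bigrams, vocab_size = model
    seq = ["<s>"] + list(tokens) + ["</s>"]
    log_prob = 0.0
    for prev, cur in zip(seq, seq[1:]):
        # Add-one smoothing keeps unseen bigrams at a non-zero probability.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log2(p)
    return -log_prob / (len(seq) - 1)

# Tiny hypothetical training corpus of tokenized statements.
corpus = [
    "int i = 0 ;".split(),
    "int j = 0 ;".split(),
    "int k = 1 ;".split(),
]
forward_model = train_bigram(corpus)
# The backward model is trained on the same statements read right to left.
backward_model = train_bigram([tokens[::-1] for tokens in corpus])

def bidirectional_ce(tokens):
    """Return forward CE, backward CE, and their mean (illustrative combination)."""
    forward = cross_entropy(tokens, forward_model)
    backward = cross_entropy(tokens[::-1], backward_model)
    return forward, backward, (forward + backward) / 2

natural = "int i = 0 ;".split()
unnatural = "= int ; 0 i".split()
print(bidirectional_ce(natural))
print(bidirectional_ce(unnatural))
```

    In this sketch, a statement that resembles the training corpus receives a lower cross-entropy than a shuffled one in both directions, which is the sense in which a less "natural" sequence can be flagged as more suspicious.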

Citation

Zhang X, Ben KR, Zeng J. Code naturalness based defect prediction method at slice level. Ruan Jian Xue Bao/Journal of Software, 2021,32(7):2219-2241.

History
  • Received: September 13, 2020
  • Revised: October 26, 2020
  • Online: January 22, 2021
  • Published: July 6, 2021