Cross-project Defect Prediction Method Using Adversarial Learning
Authors: XING Ying, QIAN Xiaomeng, GUAN Yu, ZHANG Shihao, ZHAO Mengci, LIN Wanting
About the authors:

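XING Ying (1978-), female, Ph.D., associate professor, doctoral supervisor, CCF senior member. Her main research interests include software testing and applications of artificial intelligence.
ZHANG Shihao (1999-), male, master's student, CCF student member. His main research interests include deep learning and its applications.
QIAN Xiaomeng (1997-), female, master's student, CCF student member. Her main research interests include software defect prediction and deep learning.
ZHAO Mengci (1999-), male, master's student. His main research interests include deep learning and its applications.
GUAN Yu (1998-), male, master's student. His main research interests include deep learning and software testing.
LIN Wanting (1997-), female, master's student, CCF student member. Her main research interests include machine learning and software defect prediction.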

Corresponding author:

XING Ying, E-mail: xingying@bupt.edu.cn

Chinese Library Classification (CLC) number:

TP311

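Foundation items:

National Natural Science Foundation of China (61702044); National Key Research and Development Program of China project (2017YFD0401001)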


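    Abstract:

    Cross-project defect prediction (CPDP) has become an important research direction in software engineering data mining. It builds prediction models from the defect-labeled code of other projects, which alleviates the shortage of training data during model construction. However, the code files of the source and target projects follow different data distributions, which degrades cross-project prediction performance. Drawing on the adversarial learning idea of the generative adversarial network (GAN), this work uses a discriminator to shift the distribution of target-project features toward the distribution of source-project features, thereby improving CPDP performance. Specifically, the proposed abstract continuous generative adversarial network (AC-GAN) method consists of two stages, data processing and model construction. (1) The code of the source and target projects is first converted into abstract syntax trees (ASTs), which are traversed depth-first to obtain node sequences; the continuous bag-of-words model (CBOW) then generates word vectors, and each node sequence is converted into numeric vectors by lookup in the word-vector table. (2) The resulting numeric vectors are fed into a GAN-based model for feature extraction and data transfer, after which a binary classifier determines whether each target-project code file is defective. Comparative experiments on 15 source-target project pairs demonstrate the effectiveness of AC-GAN.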
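Stage (1) of the pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper works on Java projects, whereas this sketch uses Python's standard `ast` module as a stand-in parser so that it runs without third-party packages. It keeps only AST node type names as tokens, trains a tiny CBOW model with a full softmax (production word2vec tools typically use hierarchical softmax or negative sampling instead), and zero-pads or truncates each sequence to a fixed length. The helper names `dfs_node_sequence`, `train_cbow`, and `to_numeric`, as well as every hyperparameter, are hypothetical.

```python
import ast
import math
import random


def dfs_node_sequence(source):
    """Parse source code and list AST node types in depth-first (pre-order) order."""
    stack, sequence = [ast.parse(source)], []
    while stack:
        node = stack.pop()
        sequence.append(type(node).__name__)
        # Push children reversed so the leftmost child is visited first.
        stack.extend(reversed(list(ast.iter_child_nodes(node))))
    return sequence


def train_cbow(sequences, dim=8, window=2, epochs=50, lr=0.05, seed=0):
    """Tiny CBOW (hypothetical settings): predict each node from the mean of its
    context vectors, using a full softmax over the (small) node vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({tok for seq in sequences for tok in seq})
    index = {tok: i for i, tok in enumerate(vocab)}
    V = len(vocab)
    w_in = [[rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for _ in range(V)]
    w_out = [[0.0] * dim for _ in range(V)]
    for _ in range(epochs):
        for seq in sequences:
            ids = [index[t] for t in seq]
            for pos, target in enumerate(ids):
                lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
                ctx = [ids[j] for j in range(lo, hi) if j != pos]
                if not ctx:
                    continue
                # Hidden layer = average of the context word vectors.
                h = [sum(w_in[c][d] for c in ctx) / len(ctx) for d in range(dim)]
                scores = [sum(w_out[v][d] * h[d] for d in range(dim)) for v in range(V)]
                top = max(scores)
                exps = [math.exp(s - top) for s in scores]
                total = sum(exps)
                # Cross-entropy gradient w.r.t. the scores: softmax - one_hot(target).
                err = [e / total - (1.0 if v == target else 0.0) for v, e in enumerate(exps)]
                grad_h = [sum(err[v] * w_out[v][d] for v in range(V)) for d in range(dim)]
                for v in range(V):
                    for d in range(dim):
                        w_out[v][d] -= lr * err[v] * h[d]
                for c in ctx:
                    for d in range(dim):
                        w_in[c][d] -= lr * grad_h[d] / len(ctx)
    return {tok: w_in[index[tok]] for tok in vocab}


def to_numeric(sequence, table, max_len):
    """Look up each node's vector (zeros if unseen); zero-pad or truncate to max_len."""
    dim = len(next(iter(table.values())))
    rows = [list(table.get(tok, [0.0] * dim)) for tok in sequence[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


snippets = ["x = 1", "def f(a):\n    return a + 1"]
sequences = [dfs_node_sequence(s) for s in snippets]
table = train_cbow(sequences)
print(sequences[0])  # ['Module', 'Assign', 'Name', 'Store', 'Constant']
print(len(to_numeric(sequences[0], table, max_len=16)))  # 16
```

Fixing the sequence length gives every code file a matrix of the same shape, which is the usual precondition for batching it into a neural model; how AC-GAN itself handles variable-length sequences is not stated in this abstract.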
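The adversarial idea behind stage (2) is that a discriminator tries to tell source features from (transformed) target features, while the transformation is trained to fool it. The toy sketch below illustrates only that mechanism and rests on strong simplifying assumptions: the features are synthetic 1-D numbers rather than learned representations of code, the "generator" is a single learnable shift `b` applied to target features, the discriminator is logistic regression on features centred at the source mean, and a nearest-centroid rule stands in for the binary defect classifier. The names `adversarial_shift` and `fit_centroid_classifier`, the data, and all hyperparameters are hypothetical; the abstract does not describe AC-GAN's actual network architecture.

```python
import math
import random


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def adversarial_shift(source, target, steps=1000, lr=0.05, weight_decay=0.01):
    """Learn a shift b so that a logistic discriminator cannot tell source
    features from shifted target features (target + b)."""
    m = sum(source) / len(source)  # centre on the source mean to condition D's parameters
    w = c = b = 0.0
    ns, nt = len(source), len(target)
    for _ in range(steps):
        # Discriminator step on D(z) = sigmoid(w*(z - m) + c):
        # minimise -log D(source) - log(1 - D(target + b)).
        gw, gc = weight_decay * w, 0.0
        for z in source:
            p = sigmoid(w * (z - m) + c)
            gw += (p - 1.0) * (z - m) / ns
            gc += (p - 1.0) / ns
        for x in target:
            z = x + b
            p = sigmoid(w * (z - m) + c)
            gw += p * (z - m) / nt
            gc += p / nt
        w -= lr * gw
        c -= lr * gc
        # Generator step: non-saturating loss -log D(target + b); d(logit)/db = w.
        gb = sum((sigmoid(w * (x + b - m) + c) - 1.0) * w for x in target) / nt
        b -= lr * gb
    return b


def fit_centroid_classifier(features, labels):
    """Nearest-class-mean rule: predict 1 (defective) if closer to the defective mean."""
    m0 = sum(f for f, y in zip(features, labels) if y == 0) / labels.count(0)
    m1 = sum(f for f, y in zip(features, labels) if y == 1) / labels.count(1)
    return lambda z: int(abs(z - m1) < abs(z - m0))


# Synthetic 1-D "features": the target project is the source task shifted by -3.
rng = random.Random(0)
src_y = [0] * 100 + [1] * 100
tgt_y = list(src_y)
src_x = [rng.gauss(2.0 + 2.0 * y, 0.7) for y in src_y]
tgt_x = [rng.gauss(-1.0 + 2.0 * y, 0.7) for y in tgt_y]

clf = fit_centroid_classifier(src_x, src_y)  # trained on labelled source data only
b = adversarial_shift(src_x, tgt_x)  # uses no target labels


def accuracy(xs, ys, shift=0.0):
    return sum(clf(x + shift) == y for x, y in zip(xs, ys)) / len(ys)


print(f"learned shift b = {b:.2f}")
print("target accuracy, no alignment:", accuracy(tgt_x, tgt_y))
print("target accuracy, aligned:     ", accuracy(tgt_x, tgt_y, b))
```

Because this discriminator's logit is linear in the feature, its equilibrium only requires the source and shifted-target means to agree, so the toy aligns first moments only; matching whole feature distributions, as a GAN-based model aims to, needs a more expressive discriminator and feature extractor.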



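Cite this article:

XING Ying, QIAN Xiaomeng, GUAN Yu, ZHANG Shihao, ZHAO Mengci, LIN Wanting. Cross-project defect prediction method using adversarial learning. Ruan Jian Xue Bao/Journal of Software, 2022, 33(6): 2097-2112.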

Publication history
  • Received: 2021-09-05
  • Last revised: 2021-10-15
  • Published online: 2022-01-28
  • Published: 2022-06-06