





Feature Representation Method for Heterogeneous Defect Prediction Based on Variational Autoencoders
Fund Project:

National Natural Science Foundation of China (61906090, U20B2064, 61773208); Natural Science Foundation of Jiangsu Province, China (BK20191287, BK20170809); Fundamental Research Funds for the Central Universities (30920021131); China Postdoctoral Science Foundation (2018M632304)




    Cross-project defect prediction can use existing labeled defect data to predict defects in new, unlabeled projects, but it requires the source and target projects to share the same metric features, a condition that is hard to meet in actual development. Heterogeneous defect prediction removes this requirement: it can make predictions even when the source and target projects use different sets of metrics, and it has therefore attracted considerable interest. Existing heterogeneous defect prediction models use naive or traditional machine learning methods to learn a feature representation shared by the source and target projects, and then predict on top of that representation. The representations learned in these studies are weak, which leads to poor performance in identifying defect-prone instances. Given the strong feature extraction and representation capabilities of deep neural networks, this study proposes a feature representation method for heterogeneous defect prediction based on variational autoencoders. By combining a variational autoencoder with maximum mean discrepancy (MMD), the method learns a common feature representation for the source and target projects, on which an effective defect prediction model can then be trained. The effectiveness of the proposed method is verified by comparing it with traditional cross-project defect prediction methods and with heterogeneous defect prediction methods on multiple datasets.
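The abstract does not specify the network architecture, kernel, or loss weights, so the following is only a minimal sketch of how a VAE objective and an MMD penalty can be combined for two projects with different metric sets. It assumes linear encoders and decoders (one pair per project, because the numbers of metrics differ), a shared latent dimension `k`, a Gaussian-kernel MMD with bandwidth `sigma`, and a single trade-off weight `lam`; the names `init_params`, `vae_mmd_loss`, `lam`, and `sigma` are hypothetical and are not taken from the paper.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """RBF kernel matrix: K[i, j] = exp(-||a_i - b_j||^2 / (2 * sigma^2))."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def mmd2(zs, zt, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two samples."""
    return (
        gaussian_kernel(zs, zs, sigma).mean()
        + gaussian_kernel(zt, zt, sigma).mean()
        - 2.0 * gaussian_kernel(zs, zt, sigma).mean()
    )


def kl_to_standard_normal(mu, logvar):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)), averaged over the batch."""
    return np.mean(-0.5 * np.sum(1.0 + logvar - mu ** 2 - np.exp(logvar), axis=1))


def init_params(d_s, d_t, k, seed=0):
    """Hypothetical linear encoder/decoder weights, one set per project.

    Source and target have different numbers of metrics (d_s != d_t), so each
    gets its own encoder/decoder, but both map into the same k-dim latent space.
    """
    rng = np.random.default_rng(seed)
    params = {}
    for name, d in (("s", d_s), ("t", d_t)):
        params[f"enc_mu_{name}"] = 0.1 * rng.standard_normal((d, k))
        params[f"enc_logvar_{name}"] = 0.1 * rng.standard_normal((d, k))
        params[f"dec_{name}"] = 0.1 * rng.standard_normal((k, d))
    return params


def vae_mmd_loss(xs, xt, params, lam=1.0, sigma=1.0, seed=0):
    """Assumed objective: VAE loss for both projects + lam * MMD of latent codes."""
    rng = np.random.default_rng(seed)
    total, latents = 0.0, []
    for name, x in (("s", xs), ("t", xt)):
        mu = x @ params[f"enc_mu_{name}"]
        logvar = x @ params[f"enc_logvar_{name}"]
        # Reparameterization trick: z = mu + std * eps, with eps ~ N(0, I).
        z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
        x_hat = z @ params[f"dec_{name}"]
        # Squared-error reconstruction term, averaged over the batch.
        recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
        total += recon + kl_to_standard_normal(mu, logvar)
        latents.append(z)
    # MMD term pulls the source and target latent distributions together.
    return total + lam * mmd2(latents[0], latents[1], sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    xs = rng.standard_normal((50, 20))  # toy "source project": 50 modules, 20 metrics
    xt = rng.standard_normal((40, 61))  # toy "target project": 40 modules, 61 metrics
    print(vae_mmd_loss(xs, xt, init_params(20, 61, k=8)))
```

Only the forward objective is shown here. In practice the weights would be optimized with an automatic-differentiation framework, after which a classifier can be trained on the labeled source latent codes and applied to the target latent codes. The kernel bandwidth and the weight `lam` are illustrative assumptions.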



  • Received: 2020-04-13
  • Last revised: 2020-10-26
  • Published online: 2021-01-22
  • Publication date: 2021-07-06
Copyright: Institute of Software, Chinese Academy of Sciences
Tel: 010-62562563  Fax: 010-62562533  Email: jos@iscas.ac.cn
