Semi-Supervised Ensemble Learning Approach for Cross-Project Defect Prediction

doi:10.13328/j.cnki.jos.005228


Journal of Software, Volume 28, Issue 6, 2017, pp. 1455-1473


Author:
HE Ji-Yuan, Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China
MENG Zhao-Peng, Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China
CHEN Xiang, School of Computer Science and Technology, Nantong University, Nantong 226019, China
WANG Zan, Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China
FAN Xiang-Yu, Department of Software Engineering, School of Computer Software, Tianjin University, Tianjin 300072, China

Fund Project: National Natural Science Foundation of China (61202030, 61373012, 61202006, 71502125)


Abstract:

Software defect prediction helps developers allocate testing resources more effectively by predicting whether a software module is defect-prone. Most defect prediction research focuses on within-project defect prediction, which requires sufficient training data from the same project. In practice, however, a project that needs defect prediction is often new or has no historical data. Cross-project defect prediction, which trains on data from several other projects and predicts defects in a different target project, has therefore become an active research topic. Its main challenges are the difference in data distribution between the source and target projects and the class imbalance problem within the datasets. Inspired by search-based software engineering, this paper proposes S³EL, a search-based semi-supervised ensemble learning approach. Several Naïve Bayes classifiers are first built as base learners by adjusting the distribution ratio of the training dataset; a small number of labeled target instances and a genetic algorithm are then used to combine these base classifiers into the final prediction model. S³EL is compared with classical and state-of-the-art cross-project defect prediction approaches (such as Burak filter, Peters filter, TCA+, CODEP, and HYDRA) on the AEEEM and Promise datasets. The results show that S³EL achieves the best prediction performance in most cases under the F1 measure.

Key words: cross-project defect prediction; semi-supervised learning; ensemble learning; genetic algorithm; Naïve Bayes
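
The pipeline outlined in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' S³EL implementation: the abstract does not specify the details, so the synthetic data, the reading of "distribution ratio" as the defective-to-clean class ratio, the ratio values, the weighted-vote combination rule, and the genetic operators (elitism, one-point crossover, Gaussian mutation) are all assumptions made for illustration. All function names are hypothetical.

```python
import math
import random

def make_project(n, shift, rng, defect_rate=0.2):
    """Toy project (assumption): two metrics per module, larger for defective modules."""
    X, y = [], []
    for _ in range(n):
        label = int(rng.random() < defect_rate)
        X.append([rng.gauss(shift + 2.0 * label, 1.0) for _ in range(2)])
        y.append(label)
    return X, y

def fit_nb(X, y):
    """Fit a Gaussian Naive Bayes model: {class: (prior, means, variances)}."""
    model = {}
    for c in (0, 1):
        cols = list(zip(*[x for x, label in zip(X, y) if label == c]))
        n_c = len(cols[0])
        means = [sum(col) / n_c for col in cols]
        varis = [sum((v - m) ** 2 for v in col) / n_c + 1e-6
                 for col, m in zip(cols, means)]
        model[c] = (n_c / len(X), means, varis)
    return model

def prob_defective(model, x):
    """Posterior probability that module x is defective (class 1)."""
    logp = {}
    for c, (prior, means, varis) in model.items():
        logp[c] = math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
            for v, m, s in zip(x, means, varis))
    top = max(logp.values())
    e0, e1 = math.exp(logp[0] - top), math.exp(logp[1] - top)
    return e1 / (e0 + e1)

def resample(X, y, ratio, rng):
    """Undersample clean modules so defective/clean is about `ratio`
    (capped by the number of clean modules available)."""
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    keep = rng.sample(neg, min(len(neg), max(1, round(len(pos) / ratio))))
    idx = pos + keep
    return [X[i] for i in idx], [y[i] for i in idx]

def f1(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def combine(probs, weights):
    """Weighted vote: probs[i][j] is base learner j's P(defective) for module i."""
    total = sum(weights) or 1.0
    return [int(sum(w * p for w, p in zip(weights, row)) / total >= 0.5)
            for row in probs]

def evolve_weights(probs, y, rng, pop_size=20, generations=30):
    """Genetic search for ensemble weights; fitness is F1 on labeled target modules."""
    n = len(probs[0])
    fitness = lambda w: f1(y, combine(probs, w))
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]               # elitism: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)              # one-point crossover
            child = a[:cut] + b[cut:]
            k = rng.randrange(n)                   # Gaussian mutation of one gene
            child[k] = min(1.0, max(0.0, child[k] + rng.gauss(0, 0.2)))
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

rng = random.Random(0)
X_src, y_src = make_project(300, shift=0.0, rng=rng)   # labeled source project
X_tgt, y_tgt = make_project(200, shift=1.0, rng=rng)   # target project with shifted metrics
X_lab, y_lab = X_tgt[:30], y_tgt[:30]                  # small labeled target sample
X_test, y_test = X_tgt[30:], y_tgt[30:]                # remaining target modules

ratios = [0.25, 0.5, 1.0, 2.0, 4.0]                    # assumed defective:clean ratios
models = [fit_nb(*resample(X_src, y_src, r, rng)) for r in ratios]

probs_lab = [[prob_defective(m, x) for m in models] for x in X_lab]
weights = evolve_weights(probs_lab, y_lab, rng)
probs_test = [[prob_defective(m, x) for m in models] for x in X_test]
score = f1(y_test, combine(probs_test, weights))
print("weights:", [round(w, 2) for w in weights])
print("F1 on held-out target modules:", round(score, 3))
```

Seeding the initial population with equal weights means that, on the labeled target sample, the evolved ensemble can never score lower than a plain unweighted vote. Performance on the remaining target modules is not guaranteed by this and depends on how representative the labeled sample is.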


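Citation:

HE Ji-Yuan, MENG Zhao-Peng, CHEN Xiang, WANG Zan, FAN Xiang-Yu. Semi-supervised ensemble learning approach for cross-project defect prediction. Ruan Jian Xue Bao/Journal of Software, 2017,28(6):1455-1473.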


History

Received: July 28, 2016
Revised: October 11, 2016
Online: February 21, 2017
