Abstract: By mining software repositories, software defect prediction constructs models that identify potentially defective modules of a project under test in advance, so that test resources can be allocated more effectively. Under effort-aware performance measures, the comparison between supervised and unsupervised methods has recently become a hot topic. In a recent study on file-level defect prediction, Yan et al. conducted empirical studies with the unsupervised and supervised methods considered by Yang et al. and concluded that some unsupervised methods can outperform the supervised methods. Empirical studies were conducted on 10 projects from the open-source community. The final results show that, under the within-project defect prediction scenario, the MULTI method improves the ACC performance measure by 105.81% and 123.84% on average over the best unsupervised method and the best supervised method, respectively, and improves the POPT performance measure by 35.61% and 38.70% on average over the same two baselines. Under the cross-project defect prediction scenario, the MULTI method improves ACC by 22.42% and 34.95% on average over the best unsupervised method and the best supervised method, respectively, and improves POPT by 11.45% and 17.92% on average. On the PMI and IFA performance measures proposed by Huang et al., the MULTI method exhibits a trade-off, but it still outperforms the two best unsupervised methods on the ACC and POPT performance measures. In addition, the MULTI method is compared with the recently proposed OneWay and CBS methods.
The results show that the MULTI method performs significantly better than these two methods. The MULTI method is also superior on the F1 performance measure. Finally, an analysis of the time cost of model construction shows that the overhead of the MULTI method is acceptable.