Abstract: In recent years, a large number of software defect prediction models have been proposed. A newly proposed model is usually compared with previous models to evaluate its effectiveness. However, there is no consensus on how such comparisons should be conducted. Different studies often adopt different experimental settings, which may lead to misleading conclusions and, consequently, to missed opportunities to improve the effectiveness of defect prediction. This study systematically reviews the comparative experiments on software defect prediction models conducted by researchers worldwide in recent years. First, the comparison of defect prediction models is introduced. Then, research progress is summarized from five perspectives on these comparisons: defect datasets, dataset splits, baseline models, performance indicators, and classification thresholds. Finally, the opportunities and challenges in comparative experiments on defect prediction models are summarized, and future research directions are outlined.