In recent years, a large number of software defect prediction models have been proposed. A newly proposed model is typically compared with existing models to evaluate its effectiveness. However, there is no consensus on how such comparisons should be conducted. Different studies often adopt different experimental settings. This may lead to misleading conclusions about the relative performance of prediction models and, consequently, to missed opportunities for improving the effectiveness of defect prediction. This paper systematically reviews the comparative experiments on software defect prediction models that domestic and international researchers have conducted in recent years. First, we introduce how defect prediction models are compared. Then, we summarize the research progress on five aspects of these comparisons: defect datasets, dataset splitting, baseline models, performance indicators, and classification thresholds. Finally, we discuss the opportunities and challenges in comparative experiments on defect prediction models and outline future research directions.