Abstract:Code commit is one of the most important software evolution data, and it is widely used in the software review and code comprehension. A commit involving multiple modified classes and code makes the review of code changes difficult. By analyzing a large amount of commit data, this study discovers that identifying the core modified classes in a commit can speed up commit review for developers. Inspired by the effectiveness of machine learning techniques in classification, the paper models the core class identification as a binary classification problem (i.e., core and non-core) and proposes discriminative features from a large number of commits to characterize the core modified classes. The experiments results show that the proposed approach achieves 87% accuracy, and using core class in commit review provides significant improvement than the ones without core class.