Abstract: Code review is a widely adopted practice in modern software development and is crucial for ensuring software quality and strengthening engineering capability. Code review comments (CRCs) are among the most important outputs of code reviews. CRCs not only record reviewers' perceptions of code quality but also serve as references for authors to fix defects and improve quality. Although many software organizations have developed guidelines for performing code reviews, there are still few effective methods for evaluating the quality of CRCs. To provide an explainable and automated quality evaluation of CRCs, this study conducts a series of empirical studies, including literature reviews and case analyses. Based on these results, the study proposes a multi-label learning-based approach for evaluating the quality of CRCs. Experiments are carried out on a large enterprise-specific dataset comprising 17,000 CRCs from 34 commercial projects. The results indicate that the proposed approach can effectively evaluate the quality attributes and grades of CRCs. The study also shares modeling experience, such as CRC labeling and verification, to help software organizations struggling with code reviews better implement the proposed approach.
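
To illustrate the general idea of multi-label learning over CRC text, the following is a minimal sketch, not the authors' implementation: it assumes scikit-learn, TF-IDF features, a one-vs-rest logistic regression, and hypothetical quality-attribute labels and toy comments chosen purely for demonstration.

```python
# Minimal sketch (assumptions noted above): multi-label classification of
# code review comments (CRCs) into hypothetical quality-attribute labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data: each CRC is tagged with zero or more quality attributes (labels are illustrative).
crcs = [
    "Please rename this variable to reflect its purpose.",
    "This loop may fail when the list is empty; add a guard.",
    "LGTM.",
]
labels = [["readability"], ["defect", "suggestion"], []]

# Encode the label sets as a binary indicator matrix for multi-label learning.
mlb = MultiLabelBinarizer()
y = mlb.fit_transform(labels)

# TF-IDF features feed one binary classifier per label (one-vs-rest).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(crcs, y)

# Predict quality-attribute labels for a new comment.
pred = model.predict(["Consider extracting this block into a helper function."])
print(mlb.inverse_transform(pred))
```

In practice, the paper's approach would differ in features, labels, and model; the sketch only shows how predicting several quality attributes per CRC can be framed as a multi-label problem.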