Abstract: Key classes are a crucial starting point for understanding complex software, and identifying them helps optimize documentation and compress reverse-engineered class diagrams. Although many effective key class identification methods have been proposed, three major limitations remain: 1) software networks, which are graphs representing software elements and their dependencies, often include elements that are never or rarely executed at runtime; 2) networks constructed through dynamic analysis are frequently incomplete, potentially omitting truly key classes; and 3) most existing approaches consider only the effect of direct coupling between classes, ignoring both the influence of indirect (non-contact) coupling and the diversity of the degree distribution among neighboring nodes. To address these issues, a key class identification approach is proposed that integrates dynamic analysis with a gravitational formula. First, a class coupling network (CCN) is constructed using static analysis to represent classes and their coupling relationships. Second, a gravitational entropy (GEN) metric is introduced to quantify class importance by jointly considering direct and indirect couplings in the CCN and the degree-distribution diversity of neighboring nodes. Third, classes are sorted in descending order of their GEN values to obtain a preliminary ranking. Finally, dynamic analysis is performed to capture actual runtime interactions between classes, which are used to refine the preliminary ranking, and a threshold is applied to filter out non-key classes, producing a final set of candidate key classes. Experimental results on eight open-source Java projects demonstrate that the proposed method significantly outperforms eleven baseline approaches when considering no more than the top 15% (or top 25) of nodes. The integration of dynamic analysis notably improves the performance of the proposed method. Moreover, the choice of weighting scheme for coupling types has minimal impact on performance, and the overall computational efficiency is acceptable.
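The four steps above can be illustrated with a minimal sketch. It is not the GEN definition used in this work, which the abstract does not state. It assumes an unweighted, undirected CCN stored as an adjacency dictionary, a standard gravity-model term (degree product over squared hop distance, truncated at a hypothetical `radius`), a Shannon entropy over neighbor degrees, and a hypothetical combination rule `(1 + entropy) * gravity`. The refinement step is likewise an assumption: it keeps only classes observed at runtime and then applies a top-k threshold. All function and parameter names are illustrative.

```python
from collections import deque
from math import log


def bfs_distances(graph, source, max_dist):
    """Hop distances from source to every node within max_dist hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_dist:
            continue  # do not expand beyond the truncation radius
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def neighbor_degree_entropy(graph, node):
    """Shannon entropy of the degree shares of a node's direct neighbors."""
    degrees = [len(graph[nb]) for nb in graph[node]]
    total = sum(degrees)
    if total == 0:
        return 0.0
    return -sum((d / total) * log(d / total) for d in degrees if d > 0)


def gravity_entropy_score(graph, node, radius=2):
    """Assumed score: (1 + entropy) * sum of k_i * k_j / d_ij^2 within radius."""
    k_i = len(graph[node])
    dist = bfs_distances(graph, node, radius)
    gravity = sum(
        k_i * len(graph[other]) / d ** 2 for other, d in dist.items() if d > 0
    )
    return (1.0 + neighbor_degree_entropy(graph, node)) * gravity


def rank_classes(graph, radius=2):
    """Preliminary ranking: descending score, ties broken by class name."""
    scores = {n: gravity_entropy_score(graph, n, radius) for n in graph}
    return sorted(graph, key=lambda n: (-scores[n], n))


def refine_with_runtime(ranking, runtime_classes, top_k):
    """Assumed refinement: drop classes never seen at runtime, keep the top_k."""
    return [c for c in ranking if c in runtime_classes][:top_k]


# Toy CCN: class A is coupled to B, C, D; D is also coupled to E.
ccn = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
preliminary = rank_classes(ccn)
candidates = refine_with_runtime(preliminary, {"A", "B", "E"}, top_k=2)
```

In this toy network, `A` ranks first because it has the highest degree and the most diverse neighborhood, and `D` drops out of the final candidates only because the assumed runtime trace never touches it. A weighted variant would replace the plain degree with a sum of coupling-type weights.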