Abstract: Knowing how measurement data are distributed is important for understanding and using object-oriented software metrics. However, apart from the CK metric LCOM, the distribution of object-oriented software cohesion metrics has not been studied. Unfortunately, existing studies show that LCOM is not a good metric for the class cohesion of object-oriented software, so it is necessary to investigate the distribution of other cohesion metrics. Based on 112 open-source Java software systems, this paper empirically investigates the distribution of measurement data for 17 cohesion metrics, including lack-of-cohesion metrics (e.g. LCOM1, LCOM2, LCOM3, LCOM4, LCOM5), connectivity-based cohesion metrics (e.g. TCC, LCC, DCI, DCD), and similarity-based cohesion metrics (e.g. CC, LSCC, SCOM, SCC). The results show that the data of non-normalized cohesion metrics can be fitted to both a log-normal distribution and a power-law distribution. The data of similarity-based cohesion metrics can be fitted to a log-normal distribution only after excluding special classes that have at most one method and no fields. For connectivity-based cohesion metrics, the data of their corresponding non-normalized cohesion metrics follow both a log-normal distribution and a power-law distribution. This study provides important information for researchers who want to understand and use cohesion metrics, in particular for identifying thresholds for these metrics using approaches from the literature.
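As an illustration of what fitting metric data to these two distributions involves, the following is a minimal Python sketch using standard maximum-likelihood estimators. It is not the fitting procedure used in this study. The function names `fit_lognormal` and `fit_power_law_alpha`, the cutoff `x_min`, and the synthetic samples are hypothetical and introduced here only for illustration. The sketch also assumes that metric values are treated as continuous and that non-positive values are dropped before taking logarithms.

```python
import math
import random


def fit_lognormal(values):
    """Maximum-likelihood estimate of (mu, sigma) for a log-normal fit.

    Assumption: non-positive values are discarded, because the
    logarithm is undefined for them.
    """
    logs = [math.log(v) for v in values if v > 0]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / n)
    return mu, sigma


def fit_power_law_alpha(values, x_min):
    """Maximum-likelihood exponent of a continuous power law p(x) ~ x^(-alpha).

    Only the tail x >= x_min is used. Assumption: x_min is chosen by the
    caller, and at least one tail value is strictly greater than x_min.
    """
    tail = [v for v in values if v >= x_min]
    log_sum = sum(math.log(v / x_min) for v in tail)
    return 1.0 + len(tail) / log_sum


# Synthetic data standing in for per-class metric values (not real measurements).
rng = random.Random(0)
lognormal_sample = [rng.lognormvariate(2.0, 1.0) for _ in range(5000)]
pareto_sample = [rng.paretovariate(1.5) for _ in range(5000)]

mu_hat, sigma_hat = fit_lognormal(lognormal_sample)
alpha_hat = fit_power_law_alpha(pareto_sample, x_min=1.0)
print("log-normal fit: mu=%.3f sigma=%.3f" % (mu_hat, sigma_hat))
print("power-law fit: alpha=%.3f" % alpha_hat)
```

In practice, estimated parameters alone do not show that a distribution fits. A goodness-of-fit test would still be needed to decide whether a given metric's data actually follow either distribution.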