Impact of Mislabeled Changes by SZZ on Performance and Interpretation of Just-in-time Defect Prediction for Mobile APP
Author:
Affiliation:

Clc Number:

TP311

  • Article
  • | |
  • Metrics
  • |
  • Reference
  • |
  • Related
  • |
  • Cited by
  • | |
  • Comments
    Abstract:

    In recent years, as an algorithm for identifying bug-introducing changes, SZZ has been widely employed in just-in-time software defect prediction. Previous studies show that the SZZ algorithm may mislabel data during data annotation, which could influence the dataset quality and consequently the performance of the defect prediction model. Therefore, researchers have made improvements to the SZZ algorithm and proposed multiple variants of SZZ. However, there is no empirical study to explore the effect of data annotation quality by SZZ on the performance and interpretability of just-in-time defect prediction for mobile APP. To investigate the influence of mislabeled changes by SZZ on just-in-time defect prediction for mobile APP, this study conducts an extensive and in-depth empirical comparison of four SZZ algorithms. Firstly, 17 large-scale mobile APP projects are selected from the GitHub repository, and software metrics are extracted by adopting the PyDriller tool. Then, B-SZZ (original SZZ), AG-SZZ, MA-SZZ, and RA-SZZ are employed for data annotation. Then, the just-in-time defect prediction models are built with random forest, naive Bayes, and logistic regression classifiers based on the time-series data partitioning. Finally, the performance of the models is evaluated by traditional measures of AUC, MCC, and G-mean, and effort-aware measures of F-measure@20% and IFA, and a statistical significance test and interpretability analysis are conducted on the results by employing SKESD and SHAP respectively. By comparing the annotation performance of the four SZZ algorithms, the results are as follows. (1) The data annotation quality conforms to the progressive relationship among SZZ variants. (2) The mislabeled changes by B-SZZ, AG-SZZ, and MA-SZZ can cause performance reduction of AUC and MCC of different levels, but cannot lead to performance reduction of G-mean. (3) B-SZZ is likely to cause a performance reduction of F-measure@20%, while B-SZZ, AG-SZZ, and MA-SZZ are unlikely to increase effort during code inspection. (4) In terms of model interpretation, different SZZ algorithms will influence the three metrics with the largest contribution during the prediction, and the la metric has a significant influence on the prediction results.

    Reference
    Related
    Cited by
Get Citation

李志强,马睿,张洪宇,荆晓远,任杰,刘金会. SZZ误标变更对移动APP即时缺陷预测性能和解释的影响.软件学报,,():1-32

Copy
Share
Article Metrics
  • Abstract:
  • PDF:
  • HTML:
  • Cited by:
History
  • Received:September 28,2023
  • Revised:December 26,2023
  • Online: February 19,2025
You are the firstVisitors
Copyright: Institute of Software, Chinese Academy of Sciences Beijing ICP No. 05046678-4
Address:4# South Fourth Street, Zhong Guan Cun, Beijing 100190,Postal Code:100190
Phone:010-62562563 Fax:010-62562533 Email:jos@iscas.ac.cn
Technical Support:Beijing Qinyun Technology Development Co., Ltd.

Beijing Public Network Security No. 11040202500063