Empirical Study of Code Smell Detection Based on Active Learning
    Abstract:

    Machine learning and deep learning approaches to code smell detection rely heavily on large annotated datasets. Such datasets are scarce in the code smell field, while unannotated data is plentiful, which makes active learning a natural fit for the task. Previous research has shown that, in software engineering, active learning can yield better-performing models at lower annotation and training cost. However, its specific impact on the performance of code smell detection models remains unclear, and applying active learning strategies that work well in other domains without adaptation may have adverse effects. This paper evaluates the impact of active learning on the performance of code smell detection models. To this end, an extensive analysis was conducted on the code smell dataset MLCQ, covering 11 implementations of 5 query strategies, 8 classifiers, and 10 query ratios. The results indicate: (1) Among the 11 query strategy implementations studied, uncertainty-based and committee-based strategies performed better than the others, with margin querying (uncertainty-based) and vote entropy querying (committee-based) being particularly notable. (2) Among the 8 classifiers explored, Random Forest exhibited the best overall performance. (3) Model performance improved significantly as the query ratio increased from 0% to 25%; as the ratio increased further from 25% to 50%, the improvement slowed and performance could even decline.
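    The two query strategies singled out in the abstract can be illustrated with a minimal sketch. It uses the common textbook definitions of margin and vote entropy, not the paper's own implementations, which the abstract does not describe. The function names, the `query_ratio` handling, and the "smelly"/"clean" labels are assumptions made for illustration only.

```python
import math
from collections import Counter


def margin_scores(probabilities):
    """Gap between the two highest class probabilities of each sample.

    A small margin means the classifier is unsure about the sample.
    """
    scores = []
    for row in probabilities:
        top_two = sorted(row, reverse=True)[:2]
        scores.append(top_two[0] - top_two[1])
    return scores


def vote_entropy_scores(committee_votes):
    """Entropy of the labels voted by a committee for each sample.

    committee_votes[i] holds one predicted label per committee member.
    Higher entropy means more disagreement among the members.
    """
    scores = []
    for votes in committee_votes:
        entropy = 0.0
        for count in Counter(votes).values():
            share = count / len(votes)
            entropy -= share * math.log(share)
        scores.append(entropy)
    return scores


def select_queries(scores, query_ratio, prefer_low):
    """Return indices of the samples to annotate next.

    query_ratio is treated here as the fraction of the unlabeled pool
    to query (an assumption; the paper may define it differently).
    """
    budget = int(len(scores) * query_ratio)
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=not prefer_low)
    return order[:budget]


# Hypothetical class probabilities for four unlabeled code samples.
probabilities = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.5, 0.5]]
margin_picks = select_queries(margin_scores(probabilities), 0.5, prefer_low=True)

# Hypothetical votes from a four-member committee for three samples.
votes = [
    ["smelly", "smelly", "smelly", "smelly"],
    ["smelly", "clean", "smelly", "clean"],
    ["smelly", "smelly", "smelly", "clean"],
]
entropy_picks = select_queries(vote_entropy_scores(votes), 0.34, prefer_low=False)
```

    Margin querying prefers the samples with the smallest margin, while vote entropy querying prefers the samples on which the committee disagrees most, so the two calls pass opposite values for `prefer_low`.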

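Citation

Chen Haoxuan, Liu Lei, Huang Ruoxuan, Zhang Yizhuo, Hu Wenhua, Ma Chuanxiang. Empirical Study of Code Smell Detection Based on Active Learning. Journal of Software (软件学报), 2025, 36(7):0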

History
  • Received: August 24, 2024
  • Revised: October 15, 2024
  • Online: December 10, 2024