Software Traceability Recovery Framework Based on Active Learning and Semi-supervised Learning

doi:10.13328/j.cnki.jos.007178


Author:
DONG Li-Ming, ZHANG He, MENG Qing-Long, KUANG Hong-Yu

Affiliation:
Software Institute, Nanjing University, Nanjing 210093, China; State Key Laboratory for Novel Software Technology (Nanjing University), Nanjing 210093, China

CLC Number: TP311


Abstract:

Software traceability is considered critical to trustworthy software engineering, as it ensures software reliability by tracking the software development process. Despite significant progress in automatic software traceability recovery techniques in recent years, their performance in real-world commercial software projects falls short of expectations. This study investigates the application of learning-based software traceability recovery classifiers in commercial software projects and identifies three critical challenges in industrial settings that cause traceability models to underperform: low-quality raw data, data sparsity, and class imbalance. In response, STRACE(AL+SSL) is proposed, a software traceability recovery framework that integrates active learning and semi-supervised learning. By strategically selecting valuable samples for annotation and generating high-quality pseudo-labeled samples, STRACE(AL+SSL) effectively harnesses unlabeled data to address these data-related challenges. Multiple comparative experiments are conducted on nearly one million issue-commit trace pair samples from 10 enterprise projects, and the results validate the effectiveness of the proposed framework for real-world software traceability recovery tasks. The ablation results show that the unlabeled samples selected by active learning in STRACE(AL+SSL) play a crucial role in the traceability recovery task. The optimal combination of sample selection strategies in STRACE(AL+SSL) is also identified: CBST-Adjust as the semi-supervised sample rebalancing strategy and SMI_Flqmi as a cost-effective and efficient active learning strategy.
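The interplay between the two learning modes described in the abstract can be illustrated with a minimal sketch. It is not the STRACE(AL+SSL) implementation: plain uncertainty sampling stands in for SMI_Flqmi, fixed confidence thresholds stand in for CBST-Adjust, and all function names, thresholds, and probabilities below are hypothetical values chosen for illustration only.

```python
from typing import List, Tuple


def select_for_annotation(probs: List[float], budget: int) -> List[int]:
    """Active learning step (assumed here: uncertainty sampling).

    Returns the indices of the `budget` unlabeled issue-commit pairs whose
    predicted link probability is closest to 0.5, i.e. the pairs the
    classifier is least sure about and that a human should label.
    """
    ranked = sorted(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))
    return ranked[:budget]


def pseudo_label(
    probs: List[float],
    skip: List[int],
    pos_threshold: float = 0.9,  # hypothetical threshold for the "link" class
    neg_threshold: float = 0.1,  # hypothetical threshold for the "no link" class
) -> List[Tuple[int, int]]:
    """Semi-supervised step (assumed here: threshold-based self-training).

    Assigns a pseudo-label to every remaining unlabeled pair on which the
    classifier is confident, returning (index, label) tuples. Pairs already
    sent to human annotators (`skip`) are left out.
    """
    skipped = set(skip)
    labeled = []
    for i, p in enumerate(probs):
        if i in skipped:
            continue
        if p >= pos_threshold:
            labeled.append((i, 1))
        elif p <= neg_threshold:
            labeled.append((i, 0))
    return labeled


# Hypothetical predicted link probabilities for six unlabeled pairs.
probs = [0.97, 0.52, 0.03, 0.45, 0.88, 0.20]
to_annotate = select_for_annotation(probs, budget=2)
pseudo = pseudo_label(probs, skip=to_annotate)
print(to_annotate)  # indices routed to human annotators
print(pseudo)       # (index, pseudo-label) pairs added to the training set
```

In this sketch the uncertain pairs go to annotators while the confident ones are labeled automatically, so both groups of unlabeled data contribute to the next training round. Using separate thresholds per class is one simple way to keep the pseudo-labeled set from being dominated by the majority class.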

Key words: software traceability; active learning; semi-supervised learning

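Citation:

DONG Li-Ming, ZHANG He, MENG Qing-Long, KUANG Hong-Yu. Software Traceability Recovery Framework Based on Active Learning and Semi-supervised Learning. Journal of Software, ,():1-25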


History

Received: June 01, 2023
Revised: August 13, 2023
Online: September 04, 2024

