Heterogeneous Defect Prediction Based on Simultaneous Semantic Alignment

doi:10.13328/j.cnki.jos.006495


Journal of Software, Volume 34, Issue 6, 2023, pp. 2669-2689

Authors:
LI Wei-Wei, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
CHEN Xiang, School of Information Science and Technology, Nantong University, Nantong 226019, China
ZHANG Heng-Wei, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
HUANG Zhi-Qiu, College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
JIA Xiu-Yi, School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
CLC Number: TP311


Abstract:

Heterogeneous defect prediction (HDP) addresses the case in which the source project and the target project use different features: it uses heterogeneous feature data from the source project to predict the defect proneness of software modules in the target project. HDP has made progress, but its overall performance remains unsatisfactory. Most previous HDP methods learn a domain-invariant feature subspace to reduce the difference between domains. However, the source and target domains are usually highly heterogeneous, so the resulting domain alignment is unsatisfactory. The reason is that these methods ignore the latent knowledge that a classifier should produce similar classification probability distributions for the same category in the two domains, and so fail to mine the intrinsic semantic information contained in the data. In addition, collecting training data for newly launched or legacy projects relies on expert knowledge and is time-consuming, laborious, and error-prone; this work therefore explores heterogeneous defect prediction based on a small number of labeled modules in the target project. On this basis, a heterogeneous defect prediction method based on simultaneous semantic alignment (SHSSAN) is proposed. On the one hand, it exploits the implicit knowledge learned from the labeled source project to transfer the correlations between categories, achieving implicit semantic information transfer. On the other hand, to learn the semantic representation of unlabeled target data, it performs centroid matching using target pseudo-labels, achieving explicit semantic alignment. SHSSAN also addresses the class imbalance problem and the linearly inseparable data problem, and makes full use of the label information in the target project.
Experiments on public heterogeneous data sets containing 30 different projects show that, compared with the state-of-the-art CTKCCA, CLSUP, MSMDA, KSETE, and CDAA methods, SHSSAN improves F-measure by 6.96%, 19.68%, 19.43%, 13.55%, and 9.32%, and AUC by 2.02%, 3.62%, 2.96%, 3.48%, and 2.47%, respectively.

Keywords: heterogeneous defect prediction (HDP); semantic alignment; few sample data; class imbalance; linearly inseparable

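Citation

LI Wei-Wei, CHEN Xiang, ZHANG Heng-Wei, HUANG Zhi-Qiu, JIA Xiu-Yi. Heterogeneous Defect Prediction Based on Simultaneous Semantic Alignment. Journal of Software, 2023, 34(6): 2669-2689.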

History

Received: April 12, 2021
Revised: July 18, 2021
Online: October 28, 2022
Published: June 6, 2023

Copyright: Institute of Software, Chinese Academy of Sciences