Heterogeneous Defect Prediction Based on Simultaneous Semantic Alignment

doi:10.13328/j.cnki.jos.006495


Journal of Software (Ruan Jian Xue Bao), 2023, Vol. 34, No. 6, pp. 2669-2689

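About the authors: LI Wei-Wei (李伟湋, 1981-), female, Ph.D., associate researcher, CCF professional member; research interests: machine learning and software reliability. CHEN Xiang (陈翔, 1980-), male, Ph.D., associate professor, CCF senior member; research interests: software defect prediction, software fault localization, regression testing, and combinatorial testing. ZHANG Heng-Wei (张恒伟, 1994-), male, master's student; research interests: machine learning and software defect prediction. HUANG Zhi-Qiu (黄志球, 1965-), male, Ph.D., professor, CCF distinguished member; research interests: software engineering, software safety, and formal methods. JIA Xiu-Yi (贾修一, 1983-), male, Ph.D., associate professor, CCF senior member; research interests: machine learning, granular computing, and data mining.
Corresponding author: JIA Xiu-Yi, jiaxy@njust.edu.cn
CLC number: TP311
Funding: National Key Research and Development Program of China (2018YFB1003900); National Natural Science Foundation of China (61906090, 62176123); Fundamental Research Funds for the Central Universities (30920021131)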

Heterogeneous Defect Prediction Based on Simultaneous Semantic Alignment

Author:

LI Wei-Wei
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China; College of Astronautics, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

CHEN Xiang
School of Information Science and Technology, Nantong University, Nantong 226019, China

ZHANG Heng-Wei
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

HUANG Zhi-Qiu
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China

JIA Xiu-Yi
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China


Abstract:

Heterogeneous defect prediction (HDP) predicts defects across projects whose modules are described by different feature sets: it uses heterogeneous feature data from a source project to predict the defect proneness of software modules in a target project. HDP has made progress, but its overall performance remains unsatisfactory. Most previous HDP methods handle heterogeneous features by learning a domain-invariant feature subspace that reduces the difference between domains. However, the source and target domains are usually highly heterogeneous, so domain alignment works poorly. The reason is that these methods ignore a piece of latent knowledge, namely that a classifier should produce similar classification probability distributions for the same category in both domains, and therefore fail to mine the intrinsic semantic information contained in the data. In addition, collecting training data in newly launched or legacy projects relies on expert knowledge and is time-consuming, laborious, and error-prone, so this work explores heterogeneous defect prediction based on a small number of labeled modules in the target project. To this end, a heterogeneous defect prediction method based on simultaneous semantic alignment (SHSSAN) is proposed. On the one hand, it exploits the implicit knowledge learned from the labeled source projects to transfer correlations between categories, achieving implicit semantic information transfer. On the other hand, to learn the semantic representation of unlabeled target data, it matches class centroids using target pseudo-labels, achieving explicit semantic alignment. SHSSAN also addresses the class imbalance and linear inseparability that are common in heterogeneous defect datasets, and makes full use of the label information available in the target project. Experiments on public heterogeneous datasets covering 30 different projects show that, compared with the well-performing CTKCCA, CLSUP, MSMDA, KSETE, and CDAA methods, SHSSAN improves F-measure by 6.96%, 19.68%, 19.43%, 13.55%, and 9.32%, and AUC by 2.02%, 3.62%, 2.96%, 3.48%, and 2.47%, respectively.
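The explicit and implicit alignment ideas described in the abstract can be illustrated with a minimal sketch. This is not the SHSSAN implementation: it assumes the modules of both projects have already been projected into a shared feature space, uses the squared Euclidean distance between class means as a stand-in for each alignment term, and every function name, the `temperature` parameter, and the toy data are hypothetical.

```python
from math import exp


def class_means(vectors, labels):
    """Return {label: mean vector} over the vectors carrying that label."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {k: [s / counts[k] for s in sums[k]] for k in sums}


def class_mean_gap(src_vecs, src_labels, tgt_vecs, tgt_labels):
    """Sum of squared distances between same-class means of two domains."""
    src = class_means(src_vecs, src_labels)
    tgt = class_means(tgt_vecs, tgt_labels)
    shared = sorted(set(src) & set(tgt))  # classes seen in both domains
    return sum(
        sum((a - b) ** 2 for a, b in zip(src[k], tgt[k])) for k in shared
    )


def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives softer outputs."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)
    exps = [exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def explicit_alignment(src_feats, src_labels, tgt_feats, tgt_pseudo_labels):
    """Centroid matching in feature space, using pseudo-labels on the target."""
    return class_mean_gap(src_feats, src_labels, tgt_feats, tgt_pseudo_labels)


def implicit_alignment(src_logits, src_labels, tgt_logits, tgt_labels,
                       temperature=2.0):
    """Match the average class-probability vector per class across domains."""
    src_probs = [softmax(z, temperature) for z in src_logits]
    tgt_probs = [softmax(z, temperature) for z in tgt_logits]
    return class_mean_gap(src_probs, src_labels, tgt_probs, tgt_labels)


# Toy data (made up): label 1 = defective module, label 0 = clean module.
src_feats = [[0.0, 0.0], [2.0, 0.0], [4.0, 4.0], [6.0, 4.0]]
src_labels = [0, 0, 1, 1]
tgt_feats = [[1.0, 1.0], [1.0, 1.0], [5.0, 5.0]]
tgt_pseudo = [0, 0, 1]

explicit_gap = explicit_alignment(src_feats, src_labels, tgt_feats, tgt_pseudo)
print(explicit_gap)  # 2.0

# Identical classifier outputs in both domains give a zero implicit gap.
src_logits = [[2.0, 0.0], [0.0, 2.0]]
implicit_gap = implicit_alignment(src_logits, [0, 1], src_logits, [0, 1])
print(implicit_gap)  # 0.0
```

In a training loop, terms of this kind would typically be added to the source classification loss, so that minimizing the total pulls same-class source and target modules together in both feature space and prediction space.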

Key words: heterogeneous defect prediction (HDP); semantic alignment; few-sample data; class imbalance; linearly inseparable
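For reference, the two measures reported in the abstract can be computed as in the plain-Python sketch below. It uses the standard definitions (F-measure as the harmonic mean of precision and recall on the defective class, AUC as the probability that a defective module is scored above a clean one, with ties counted as one half). The 0.5 decision threshold and the toy scores are assumptions for illustration, not values from the paper.

```python
def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (defective) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Fraction of (defective, clean) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example (made up): 2 defective modules, 3 clean ones.
y_true = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1_value = f_measure(y_true, y_pred)
auc_value = auc(y_true, scores)
print(f1_value)   # 0.5
print(auc_value)  # 5 of 6 pairs ranked correctly
```

F-measure depends on the chosen threshold, while AUC is threshold-free, which is one reason defect prediction studies on imbalanced data commonly report both.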


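Cite this article

LI Wei-Wei, CHEN Xiang, ZHANG Heng-Wei, HUANG Zhi-Qiu, JIA Xiu-Yi. Heterogeneous Defect Prediction Based on Simultaneous Semantic Alignment. Journal of Software (Ruan Jian Xue Bao), 2023, 34(6): 2669-2689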

Article metrics

Views: 1153
Downloads: 3045
HTML views: 1605
Citations: 0

History

Received: 2021-04-12
Last revised: 2021-07-18
Accepted:
Published online: 2022-10-28
Publication date: 2023-06-06
