Cross-project Prediction Method of Security Bug Reports Based on Knowledge Graph

doi:10.13328/j.cnki.jos.006812


2024, Vol. 35, No. 3, pp. 1257-1279

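Authors:

ZHENG Wei
School of Software, Northwestern Polytechnical University, Xi'an 710072, China; National Engineering Laboratory for Integrated Aero-Space-Ground-Ocean Big Data Application Technology (Northwestern Polytechnical University), Xi'an 710072, China; Key Laboratory of Big Data Storage and Management (Northwestern Polytechnical University), Ministry of Industry and Information Technology, Xi'an 710072, China

LIU Cheng-Yuan
School of Software, Northwestern Polytechnical University, Xi'an 710072, China

WU Xiao-Xue
College of Information Engineering, Yangzhou University, Yangzhou 225127, China

CHEN Xiang
School of Information Science and Technology, Nantong University, Nantong 226019, China; State Key Laboratory of Information Security (Institute of Information Engineering, Chinese Academy of Sciences), Beijing 100093, China

CHENG Jing-Yuan
School of Software, Northwestern Polytechnical University, Xi'an 710072, China

SUN Xiao-Bing
College of Information Engineering, Yangzhou University, Yangzhou 225127, China

SUN Rui-Yang
School of Software, Northwestern Polytechnical University, Xi'an 710072, China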

                    
CLC number: TP311
Funding: National Natural Science Foundation of China (62202414, 62141208); National Key Research and Development Program of China (2020YFC0833105Z1)




Abstract:

Security bug reports (SBRs) describe security-critical vulnerabilities in software products. Because predicting SBRs helps eliminate the risk of security attacks on software products, SBR prediction has attracted increasing attention from researchers. In real development scenarios, however, the project that needs prediction may come from a new company or be newly started, so too few labeled SBRs are available to build a prediction model in practice. A straightforward solution is to use a transfer model, that is, to build the prediction model from data already labeled in other projects. Inspired by two recent studies in this field, this study follows the idea of security keyword filtering and proposes KG-SBRP (knowledge graph of security bug report prediction), a cross-project SBR prediction method that integrates a knowledge graph. The text fields of SBRs are combined with common weakness enumeration (CWE) and CVE Details (common vulnerabilities and exposures) to build triple rule entities, from which a knowledge graph of security vulnerabilities is constructed. SBRs are then identified by combining the entities in the graph with their relations. The data are split into training and test sets for model fitting and performance evaluation. An empirical study on seven SBR datasets of different sizes shows that, compared with the mainstream methods FARSEC and Keyword matrix, the proposed method improves the F1-score by 11% on average in the cross-project SBR prediction scenario and by 30% on average in the within-project scenario.

Key words: software security; security bug report prediction; cross-project; knowledge graph; domain knowledge
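
To make the pipeline summarized in the abstract more concrete, the following is a minimal sketch and not the authors' KG-SBRP implementation. It assumes a toy knowledge graph made of (head, relation, tail) triples, a simple matching rule (a report is flagged when it mentions both ends of some triple), and a small set of labeled reports standing in for a target project. The triples, the reports, the matching rule, and all function names (`build_graph`, `predict_sbr`, `f1_score`) are hypothetical and chosen only for illustration; the abstract does not specify these details.

```python
import re

# Hypothetical triples (head entity, relation, tail entity). In the paper such
# triples are derived from SBR text fields, CWE, and CVE Details; the values
# below are invented for illustration only.
TRIPLES = [
    ("buffer", "causes", "overflow"),
    ("password", "exposed_by", "leak"),
    ("sql", "enables", "injection"),
]


def build_graph(triples):
    """Index triples as an adjacency map: head entity -> set of tail entities."""
    graph = {}
    for head, _relation, tail in triples:
        graph.setdefault(head, set()).add(tail)
    return graph


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def predict_sbr(report_text, graph):
    """Assumed rule: flag a report that mentions both ends of at least one triple."""
    tokens = tokenize(report_text)
    return any(head in tokens and bool(tokens & tails)
               for head, tails in graph.items())


def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive (SBR) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labeled reports from a "target" project (True = security bug report).
TARGET_REPORTS = [
    ("Buffer overflow when parsing long header", True),
    ("SQL injection in login form", True),
    ("Button label is misaligned on settings page", False),
    ("Password reset email has a typo", False),
    ("Credentials sent in clear text", True),
]

if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    y_true = [label for _, label in TARGET_REPORTS]
    y_pred = [predict_sbr(text, graph) for text, _ in TARGET_REPORTS]
    print("F1-score:", round(f1_score(y_true, y_pred), 3))
```

In this toy setting the graph is built independently of the target reports, which mirrors the cross-project idea of reusing knowledge gathered elsewhere. The last report is missed because none of the assumed triples covers it, which illustrates how graph coverage limits recall under a rule of this kind.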


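Cite this article

Zheng W, Liu CY, Wu XX, Chen X, Cheng JY, Sun XB, Sun RY. Cross-project prediction method of security bug reports based on knowledge graph. Ruan Jian Xue Bao/Journal of Software, 2024, 35(3):1257-1279.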


History

Received: 2022-01-06
Revised: 2022-06-26
Published online: 2023-07-05
Published: 2024-03-06
