Cross-project Prediction Method of Security Bug Reports Based on Knowledge Graph

Chinese Library Classification (CLC) number: TP311

Funding: National Natural Science Foundation of China (62202414, 62141208); National Key Research and Development Program of China (2020YFC0833105Z1)
    Abstract:

    Security bug reports (SBRs) describe security-critical vulnerabilities in software products. To eliminate the risk of security attacks on software products, SBR prediction has attracted increasing attention from researchers. In real-world software development, however, the project that needs SBR prediction may belong to a new company or be newly started, and so lacks enough labeled SBRs to build a prediction model in practice. A simple solution is to use a transfer model, i.e., to build the prediction model from data that have already been labeled in other projects. Inspired by two recent studies in this field and following the idea of security keyword filtering, this study proposes KG-SBRP (knowledge graph of security bug report prediction), a cross-project SBR prediction method that integrates a knowledge graph. The text information fields of SBRs are combined with CWE (common weakness enumeration) and CVE (common vulnerabilities and exposures) Details to construct triple rule entities. These triples are used to build a knowledge graph of security vulnerabilities, in which SBRs are identified by combining entities and their relations. The data are then divided into training and test sets for model fitting and performance evaluation. An empirical study of the constructed model on seven SBR datasets of different sizes shows that, compared with the current mainstream methods FARSEC and Keyword matrix, the proposed method improves the F1-score by 11% on average in cross-project SBR prediction. In addition, it improves the F1-score by 30% on average in within-project SBR prediction.
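The abstract describes the pipeline only at a high level: rule triples built from SBR text fields, CWE, and CVE Details; a knowledge graph over those triples; SBR identification from entities and relations; and F1-score evaluation. The Python sketch below illustrates that general idea under explicit assumptions. The triples, the relation names (`indicates`, `related_to`), the two-hop reachability rule, the helper names (`build_graph`, `reaches_weakness`, `is_security_report`, `f1`), and the sample reports are all hypothetical and are not taken from KG-SBRP. It also omits the model-fitting step on a training set and shows only graph-based matching and evaluation.

```python
import re
from collections import defaultdict

# Hypothetical (head, relation, tail) rule triples in the spirit of CWE and
# CVE Details; illustrative only, NOT the rule entities used by KG-SBRP.
TRIPLES = [
    ("overflow", "indicates", "CWE-787"),
    ("out-of-bounds", "indicates", "CWE-787"),
    ("injection", "indicates", "CWE-89"),
    ("sql", "related_to", "injection"),
    ("xss", "indicates", "CWE-79"),
    ("password", "related_to", "credential"),
    ("credential", "indicates", "CWE-522"),
]


def build_graph(triples):
    """Build an adjacency map: head entity -> set of (relation, tail) edges."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        graph[head].add((relation, tail))
    return graph


def reaches_weakness(graph, entity, max_hops=2):
    """Return True if a CWE node is reachable from `entity` within max_hops edges."""
    frontier, seen = {entity}, {entity}
    for _ in range(max_hops):
        next_frontier = set()
        for node in frontier:
            # .get() does not insert missing keys into the defaultdict.
            for _relation, tail in graph.get(node, ()):
                if tail.startswith("CWE-"):
                    return True
                if tail not in seen:
                    seen.add(tail)
                    next_frontier.add(tail)
        frontier = next_frontier
    return False


def is_security_report(graph, text):
    """Flag a report if any of its tokens links, via the graph, to a CWE weakness."""
    tokens = set(re.findall(r"[a-z0-9-]+", text.lower()))
    return any(reaches_weakness(graph, tok) for tok in tokens if tok in graph)


def f1(y_true, y_pred):
    """F1-score of the positive (security) class."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    # Hypothetical target-project reports with ground-truth labels.
    reports = [
        ("SQL query built from user input allows injection", True),
        ("Heap buffer overflow when parsing large header", True),
        ("Button label misaligned on settings page", False),
        ("Password stored in plain text in config file", True),
        ("Typo in README installation section", False),
        ("Stack overflow link in docs is broken", False),
    ]
    y_true = [label for _, label in reports]
    y_pred = [is_security_report(graph, text) for text, _ in reports]
    print(y_pred)  # [True, True, False, True, False, True]
    print(round(f1(y_true, y_pred), 3))  # 0.857
```

Because the graph is built from public weakness data rather than from the target project's labels, a matcher like this needs no labeled SBRs from the target project, which is the property a cross-project setting requires. The last sample report is a deliberate false positive ("Stack overflow" matched as a weakness keyword), showing why bare keyword filtering tends to lose precision and why relation context in the graph is worth exploiting.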

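Cite this article

Zheng W, Liu CY, Wu XX, Chen X, Cheng JY, Sun XB, Sun RY. Cross-project prediction method of security bug reports based on knowledge graph. Ruan Jian Xue Bao/Journal of Software, 2024, 35(3):1257-1279.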

History
  • Received: 2022-01-06
  • Revised: 2022-06-26
  • Published online: 2023-07-05
  • Published: 2024-03-06
Copyright: Institute of Software, Chinese Academy of Sciences