Survey on Vulnerability Awareness of Open Source Software
Author biographies:

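Zhan Qi (2001-), male, Ph.D. student, CCF student member; main research interest: intelligent software engineering. Pan Shengyi (1999-), male, Ph.D. student, CCF student member; main research interests: software security and intelligent software engineering. Hu Xing (1993-), female, Ph.D., associate professor, CCF professional member; main research interests: intelligent software engineering and open source software supply chain security. Bao Lingfeng (1988-), male, Ph.D., associate professor, CCF professional member; main research interests: software engineering and blockchain. Xia Xin (1986-), male, Ph.D., CCF professional member; main research interests: intelligent software engineering, mining software repositories, and empirical software engineering.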

Corresponding author:

Hu Xing, E-mail: xinghu@zju.edu.cn

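Funding:

National Key Research and Development Program of China (2021YFB2701102); National Natural Science Foundation of China (62141222, U20A20173); Fundamental Research Funds for the Central Universities (226-2022-00064)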


    Abstract:

    As modern software grows in scale, software vulnerabilities pose a serious threat to the security and reliability of computer systems and software, and cause heavy losses to production and daily life. In recent years, with the widespread adoption of open source software (OSS), its security issues have drawn wide attention. Vulnerability awareness techniques help OSS users perceive vulnerabilities before they are publicly disclosed, so that defenses can be put in place in time. Unlike vulnerability detection for traditional software, vulnerability awareness for OSS is made considerably harder by the transparency and collaborative nature of open source development. Researchers and practitioners have therefore proposed a variety of techniques that perceive potential vulnerabilities and risks from both the code and the open source community, aiming to find OSS vulnerabilities as early as possible and reduce the losses they cause. To promote the development of OSS vulnerability awareness techniques, this study systematically reviews, summarizes, and comments on existing research. It selects 45 high-quality papers on OSS vulnerability awareness and classifies them into three categories: code-based, open source community discussion-based, and software patch-based vulnerability awareness techniques. Notably, drawing on the most recent studies, it proposes for the first time a classification of awareness techniques based on the OSS vulnerability life cycle, which supplements and refines the existing taxonomy. Finally, the study discusses the open challenges in this field and outlines directions for future research.

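Cite this article

Zhan Q, Pan SY, Hu X, Bao LF, Xia X. Survey on vulnerability awareness of open source software. Ruan Jian Xue Bao/Journal of Software, 2024, 35(1): 19-37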

History
  • Received: 2022-10-21
  • Last revised: 2023-01-13
  • Published online: 2023-08-23
  • Publication date: 2024-01-06