Intelligent Perception for Vulnerability Threats in Open-source Software Supply Chain

CLC number: TP311

Funding: Strategic Priority Research Program of the Chinese Academy of Sciences (XDA0320401); Young Scientists Fund of the National Natural Science Foundation of China (62202457)
    Abstract:

    The prosperity of open-source software has driven rapid growth in the software industry and has given rise to a supply-chain development model built on open-source software. The open-source software supply chain is essentially a complex topological network composed of the key elements of the open-source ecosystem and the relationships among them, and advantages such as its globalized products help raise development efficiency across the software industry. However, the supply chain is also characterized by complex dependencies, wide propagation, and an enlarged exposed attack surface, which introduce new security risks. Existing security management based on vulnerabilities and threat intelligence can provide early warning and proactive defense, but vulnerability threat information is often obtained late and lacks details on attack techniques and mitigation measures, which severely reduces the efficiency of vulnerability handling. To address these problems, this study designs and implements an intelligent vulnerability threat perception method for the open-source software supply chain. The method has two parts: 1) Construction of a cyber threat intelligence (CTI) knowledge graph. The techniques used during construction enable real-time analysis and processing of security intelligence; in particular, the SecERNIE model is proposed to alleviate the problem of mining vulnerability-threat correlations, and a software package naming matrix is proposed to alleviate the problem of open-source software aliases. 2) Vulnerability risk information push. Software package filtering rules are built on the software package naming matrix to filter and push vulnerabilities in open-source systems in real time. Experiments verify the effectiveness and usability of the proposed method. The results show that, compared with traditional vulnerability platforms such as NVD, the method advances the average perception time by up to 90.03 days and improves operating system software coverage by 74.37%, and that the SecERNIE model maps relationships between 63492 CVE vulnerabilities and attack technique entities. For the openEuler operating system in particular, the traceable system software coverage reaches 92.76%, and 6239 security vulnerabilities have been perceived in total. The method also finds 891 vulnerability-attack correlations in openEuler and retrieves the corresponding solutions, which provide a reference for vulnerability handling. Two typical attack scenarios are verified in a real attack environment, demonstrating the effectiveness of the proposed method in vulnerability threat perception.
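
The abstract describes the second part of the method only at a high level: filtering rules derived from a software package naming matrix decide which vulnerabilities are pushed for a given system. The sketch below is one possible reading of that idea, not the authors' implementation. The matrix layout (one row per canonical package, one column per naming source), the function names `normalize`, `build_alias_index` and `filter_advisories`, the sample alias rows, and the placeholder advisory IDs are all assumptions made for illustration.

```python
# Hypothetical naming matrix: each row is a canonical package, each column a
# naming source (upstream project, RPM, Debian, CPE vendor:product).
# The rows are illustrative examples, not data from the paper.
NAMING_MATRIX = {
    "openssl": {
        "upstream": "openssl",
        "rpm": "openssl-libs",
        "deb": "libssl3",
        "cpe": "openssl:openssl",
    },
    "zlib": {
        "upstream": "zlib",
        "rpm": "zlib",
        "deb": "zlib1g",
        "cpe": "zlib:zlib",
    },
}


def normalize(name: str) -> str:
    """Lower-case a package name and unify separators before lookup."""
    return name.strip().lower().replace("_", "-")


def build_alias_index(matrix: dict) -> dict:
    """Flatten the matrix into an alias -> canonical-name lookup table."""
    index = {}
    for canonical, row in matrix.items():
        index[normalize(canonical)] = canonical
        for alias in row.values():
            index[normalize(alias)] = canonical
    return index


def filter_advisories(advisories: list, installed: list, index: dict) -> list:
    """Keep advisories whose package resolves to an installed canonical package."""
    installed_canonical = {index.get(normalize(name)) for name in installed}
    installed_canonical.discard(None)  # drop installed names the matrix does not know
    hits = []
    for advisory in advisories:
        canonical = index.get(normalize(advisory["package"]))
        if canonical in installed_canonical:
            hits.append((advisory["id"], canonical))
    return hits


# Placeholder advisories; the IDs are not real vulnerability identifiers.
advisories = [
    {"id": "EXAMPLE-0001", "package": "libssl3"},
    {"id": "EXAMPLE-0002", "package": "zlib1g"},
    {"id": "EXAMPLE-0003", "package": "unrelated-pkg"},
]
installed = ["openssl-libs"]

index = build_alias_index(NAMING_MATRIX)
hits = filter_advisories(advisories, installed, index)
print(hits)
```

In this sketch the advisory reported under the Debian name `libssl3` is matched to a system that only lists the RPM name `openssl-libs`, because both resolve to the same canonical row. A production filter would also need version ranges, which the abstract does not describe and which are left out here.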

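Cite this article

Wang LM, Wu JZ, Wu YJ, Rui ZQ, Luo TY, Qu S, Yang MT. Intelligent perception for vulnerability threats in open-source software supply chain. Ruan Jian Xue Bao/Journal of Software, 2025, 36(2): 511–536.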

History
  • Received: 2022-11-29
  • Last revised: 2023-05-10
  • Published online: 2024-11-01