Software Supply Chain Analysis Techniques for Java Ecosystem

    Abstract:

    With the prosperity of open-source software, almost all software companies use reusable open-source components as basic building blocks for their software products, thus forming the software supply chain. The software supply chain improves development efficiency and reduces labor costs for software companies. However, it may also introduce new security problems. In particular, if one software component has high-risk vulnerabilities, the software supply chain inevitably spreads these vulnerabilities to all components that depend on it, thus amplifying their impact. For example, through the software supply chain, the Log4j2 vulnerability caused a catastrophic security issue for the whole Java ecosystem. Unfortunately, current research on the Java software supply chain mainly focuses on a single component or a group of components and does not study impact at the ecosystem scale. Therefore, this paper presents the essential software supply chain analysis techniques for studying the impact of components and vulnerabilities on the Java ecosystem. More specifically, a formal definition of component dependencies in the software supply chain is first given. Next, new techniques are proposed and an analysis tool is built to analyze all component dependencies in the Java ecosystem, covering over 8.8 million component versions and 65 million dependencies. Finally, Log4j2, a logging library affected by the vulnerability, is used as an example to evaluate the vulnerability's impact on the whole Java ecosystem. The results show that the vulnerability affects 15.12% of the components in the ecosystem (71,082) and 16.87% of the component versions (1,488,971), and that the vulnerability-fix rate is only 29.13%.
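The propagation described in the abstract, where a vulnerable component affects everything that depends on it directly or transitively, can be illustrated with a reverse traversal of a dependency graph. The following is a minimal sketch and not the authors' tool: the function name `affected_versions`, the `name:version` identifiers, and the toy edge list are all hypothetical, and real Maven resolution details (scopes, version ranges, dependency mediation, exclusions) are ignored.

```python
from collections import defaultdict, deque


def affected_versions(dependencies, vulnerable):
    """Return every component version that directly or transitively
    depends on at least one vulnerable component version.

    dependencies: iterable of (user, used) pairs, "user depends on used".
    vulnerable:   set of component versions known to be vulnerable.
    """
    # Invert the edges so we can walk from a component to its dependents.
    dependents = defaultdict(set)
    for user, used in dependencies:
        dependents[used].add(user)

    affected = set()
    queue = deque(vulnerable)
    while queue:
        current = queue.popleft()
        for user in dependents[current]:
            # Visit each dependent once; the vulnerable versions
            # themselves are not counted as "affected dependents".
            if user not in affected and user not in vulnerable:
                affected.add(user)
                queue.append(user)
    return affected


# Hypothetical toy graph (assumed data, for illustration only).
deps = [
    ("app:1.0", "web-framework:2.3"),
    ("web-framework:2.3", "log4j-core:2.14.1"),
    ("tool:0.9", "log4j-core:2.17.1"),
    ("batch:1.1", "app:1.0"),
]

hit = affected_versions(deps, {"log4j-core:2.14.1"})
print(sorted(hit))  # ['app:1.0', 'batch:1.1', 'web-framework:2.3']
```

In this toy graph, `tool:0.9` is untouched because it depends only on a version that is not in the vulnerable set. An ecosystem-scale percentage such as those reported above would then be the size of the affected set divided by the total number of component versions, under whatever counting rules the paper defines.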

Citation:

Mao Tianyu, Wang Xingyu, Chang Rui, Shen Wenbo, Ren Kui. Software supply chain analysis techniques for Java ecosystem. Journal of Software, 2023, 34(6): 2628-2640.

History
  • Received: September 5, 2022
  • Revised: December 14, 2022
  • Online: January 13, 2023
  • Published: June 6, 2023
Copyright: Institute of Software, Chinese Academy of Sciences