Detection of Resource Leaks in Java Programs: Effectiveness Analysis of Traditional Models and Language Models
Author: Liu Tianyang, Ye Jiawei, Ji Weixing, Liu Hui
Affiliation:

CLC Number: TP311

    Abstract:

    Resource leaks are defects caused by failing to close limited system resources promptly and correctly. They occur in programs written in many languages and are often hard to notice. Traditional defect detection methods typically predict resource leaks using rules and heuristic search. In recent years, deep learning-based defect detection methods have captured semantic information in code through various code representations, using techniques such as recurrent neural networks and graph neural networks. Recent studies show that language models perform well on code understanding and generation tasks. However, the strengths and limitations of large language models (LLMs) on the specific task of resource leak detection have not been fully evaluated. This paper studies the effectiveness of detection methods based on traditional models, small models, and LLMs for resource leak detection, and explores several improvement methods, including few-shot learning, fine-tuning, and the combination of static analysis with LLMs. Specifically, using the JLeaks and DroidLeaks datasets as experimental subjects, it analyzes model performance along multiple dimensions, including the root causes of resource leaks, resource types, and code complexity. The experimental results show that fine-tuning significantly improves the performance of LLMs on resource leak detection. However, most models still need improvement in identifying resource leaks caused by third-party libraries. In addition, code complexity has a greater influence on detection methods based on traditional models.
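    To make the defect class concrete, the following is a minimal Java sketch of a resource leak and one common repair. It is an illustration only and is not taken from the paper or from the JLeaks or DroidLeaks datasets; the class `ResourceLeakSketch`, the helper `TrackedReader`, and both method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ResourceLeakSketch {

    /** Hypothetical helper: a reader that records whether close() was called. */
    static class TrackedReader extends StringReader {
        boolean closed = false;

        TrackedReader(String text) {
            super(text);
        }

        @Override
        public void close() {
            closed = true;
            super.close();
        }
    }

    /** Leaky version: the BufferedReader (and the source it wraps) is never closed. */
    static String firstLineLeaky(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        return reader.readLine();
    }

    /** Repaired version: try-with-resources closes the reader on every exit path. */
    static String firstLineSafe(Reader source) throws IOException {
        try (BufferedReader reader = new BufferedReader(source)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        TrackedReader leaked = new TrackedReader("hello\nworld");
        firstLineLeaky(leaked);
        System.out.println("leaky closed: " + leaked.closed);

        TrackedReader released = new TrackedReader("hello\nworld");
        firstLineSafe(released);
        System.out.println("safe closed: " + released.closed);
    }
}
```

    Both methods return the same value, so the leak is invisible in the result and only shows up in whether the underlying resource was released. With a real file, socket, or database handle in place of the in-memory reader, the leaky version would hold the handle until garbage collection or process exit.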

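Citation

Liu TY, Ye JW, Ji WX, Liu H. Detection of resource leaks in Java programs: Effectiveness analysis of traditional models and language models. Ruan Jian Xue Bao/Journal of Software, 2025, 36(6): 2432–2452 (in Chinese).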

History
  • Received: August 26, 2024
  • Revised: October 14, 2024
  • Online: December 10, 2024