• Volume 36, Issue 6, 2025 Table of Contents
    • Survey on Fuzzing Based on Large Language Model

      2025, 36(6):0-0. DOI: 10.13328/j.cnki.jos.007323

      Abstract: Fuzzing is an automated software testing method that detects potential security vulnerabilities, software defects, and abnormal behavior by feeding large amounts of automatically generated test data into the target software system. However, traditional fuzzing techniques suffer from limited automation, low testing efficiency, and low code coverage, and struggle to cope with modern large-scale software systems. In recent years, the rapid development of large language models (LLMs) has not only brought significant breakthroughs to natural language processing but also offered new automation solutions for fuzzing. To improve the effectiveness of fuzzing, existing work has therefore proposed a number of fuzzing methods that incorporate large language models, covering modules such as test input generation, defect detection, and post-fuzzing processing. However, this body of work still lacks a systematic survey of fuzzing techniques based on large language models. To fill this gap, this article comprehensively analyzes and summarizes the current state of research on LLM-based fuzzing. The main contents include: (1) summarizing the overall fuzzing process and the LLM-related techniques commonly used in fuzzing research; (2) discussing the limitations of deep learning-based fuzzing methods before the LLM era; (3) analyzing how large language models are applied in different stages of fuzzing; (4) exploring the main challenges and possible future directions of LLM technology in fuzzing.
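      To make the "LLM as test input generator" idea above concrete, the sketch below shows one way a completion model could be dropped into a mutation-style fuzzing loop; the prompt, the `llm_complete` placeholder, and the crash check are illustrative assumptions, not a method from any surveyed paper.

```python
# Illustrative sketch only: an LLM used as the input generator inside a
# simple mutation-style fuzzing loop.
import random
import subprocess

def llm_complete(prompt: str) -> str:
    """Placeholder for any chat/completion model call (assumption)."""
    raise NotImplementedError

def mutate_with_llm(seed: str) -> str:
    prompt = ("You are a fuzzer. Based on the seed below, produce one new "
              "input that stays syntactically valid but explores unusual "
              "corner cases. Output only the input.\n" + seed)
    return llm_complete(prompt)

def fuzz_once(target_cmd: list[str], corpus: list[str]) -> None:
    candidate = mutate_with_llm(random.choice(corpus))
    proc = subprocess.run(target_cmd, input=candidate, text=True,
                          capture_output=True, timeout=5)
    if proc.returncode < 0:          # terminated by a signal -> likely crash
        print("potential defect triggered by:", candidate[:80])
    else:
        corpus.append(candidate)     # keep the input as a new seed
```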

    • Insights and Analysis of Open Source License Violation Risks in Large Language Model-Generated Code

      2025, 36(6):0-0. DOI: 10.13328/j.cnki.jos.007324

      Abstract: The rapid development of large language models (LLMs) has significantly impacted the field of software engineering. Pre-trained on extensive open-source code datasets, these LLMs can efficiently perform tasks such as code generation and completion. However, the presence of large amounts of licensed code within these datasets exposes the LLMs to license violation risks. This paper focuses on the risk of license violations between code generated by LLMs and open-source repositories. Based on code clone detection technology, we developed a framework that traces the source of code generated by LLMs and identifies copyright infringement issues. Using this framework, we traced and checked open-source license compatibility in the open-source community for 135,000 Python code samples generated by 9 mainstream code LLMs. Through a practical investigation of three research questions: "To what extent is the code generated by large models cloned from open-source software repositories?", "Is there a risk of open-source license violations in the code generated by large models?", and "Is there a risk of open-source license violations in large model-generated code included in real open-source software?", we explore the impact of large-model code generation on the open-source software ecosystem. The experimental results show that among the 43,130 and 65,900 Python code samples longer than six lines generated by the nine LLMs from functional descriptions and method signatures, respectively, 68.5% and 60.9% had code clones in open-source code. CodeParrot and CodeGen had the highest code clone rates, while GPT-3.5-Turbo had the lowest. In addition, 92.7% of the code generated from function descriptions lacked a license declaration; compared against the licenses of the cloned open-source code, 81.8% of it carried potential license violation risks. Furthermore, among 229 LLM-generated code samples collected from GitHub, 136 were cloned from open-source code, with 38 classified as Type-1 or Type-2 clones and 30 carrying potential license violation risks. We reported these issues to the developers and have so far received feedback from eight of them.
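      The following is a minimal illustration of the kind of check such a framework might perform, assuming token-level Jaccard similarity as the clone signal and a caller-supplied set of permitted licenses; it is not the paper's detection framework.

```python
# Illustrative sketch only: flag a generated snippet when it is highly similar
# to a repository snippet whose license is outside an allow-list. The Jaccard
# similarity signal and the 0.8 threshold are assumptions.
import io
import tokenize

def code_tokens(src: str) -> list[str]:
    kinds = (tokenize.NAME, tokenize.OP, tokenize.NUMBER, tokenize.STRING)
    return [t.string for t in tokenize.generate_tokens(io.StringIO(src).readline)
            if t.type in kinds]

def jaccard(a: list[str], b: list[str]) -> float:
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def violation_risk(generated: str, repo_snippet: str, repo_license: str,
                   permitted_licenses: set[str], threshold: float = 0.8) -> bool:
    similar = jaccard(code_tokens(generated), code_tokens(repo_snippet)) >= threshold
    return similar and repo_license not in permitted_licenses
```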

    • Exploration and Improvement of Capabilities of LLMs in Code Refinement Task

      2025, 36(6):0-0. DOI: 10.13328/j.cnki.jos.007325

      Abstract: As a crucial subtask of automated code review, code refinement plays a significant role in improving efficiency and code quality. With large language models (LLMs) demonstrating superior performance over small pre-trained models in software engineering, this study explores the performance of both types of models on the automated code refinement task to assess the overall advantages of LLMs. Traditional code quality metrics (e.g., BLEU, CodeBLEU, Edit Progress) are used to evaluate four mainstream LLMs and four representative small pre-trained models on automated code review. The findings indicate that LLMs underperform small pre-trained models on the code refinement before review (CRB) subtask. Given the limitations of existing code quality metrics in explaining this phenomenon, this study proposes Unidiff-based code refinement metrics to quantify the changes made during refinement. These new metrics explain the observed disadvantage and reveal the models' tendencies when making changes: (1) the CRB task is highly challenging, and models exhibit extremely low accuracy in making correct changes; compared with small pre-trained models, LLMs behave more "aggressively", tending to make more code changes and thus performing worse; (2) compared with small pre-trained models, LLMs are inclined to perform more ADD and MODIFY change operations, with ADD operations typically inserting more lines on average, further demonstrating their "aggressive" nature. To mitigate the disadvantages of LLMs on the CRB task, this study introduces LLM-Voter, a method based on large language models and ensemble learning. It includes two sub-schemes, inference-based and confidence-based, which integrate the strengths of different base models to improve code quality. Furthermore, a refinement determination mechanism is introduced to improve the stability and reliability of the model's decisions. Experimental results demonstrate that the confidence-based LLM-Voter significantly increases the exact match (EM) score while achieving refinement quality superior to all base models, thereby effectively alleviating the disadvantages of LLMs.
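      As a rough illustration of what Unidiff-style change-operation metrics measure, the sketch below classifies line-level edits between the original and refined code as ADD, DELETE, or MODIFY using diff opcodes; the exact metric definitions belong to the paper and may differ.

```python
# Rough illustration of change-operation counting in the spirit of the
# Unidiff-based metrics described above (definitions are assumptions).
import difflib
from collections import Counter

def change_operations(original: str, refined: str) -> Counter:
    ops = Counter()
    matcher = difflib.SequenceMatcher(None, original.splitlines(),
                                      refined.splitlines())
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "insert":
            ops["ADD"] += 1
            ops["ADDED_LINES"] += j2 - j1
        elif tag == "delete":
            ops["DELETE"] += 1
        elif tag == "replace":
            ops["MODIFY"] += 1
    return ops

# A model that rewrites many lines yields high ADD/MODIFY counts -- the
# "aggressive" behavior the study observes for LLMs.
```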

    • Multi-Agent Collaboration Code Reviewer Recommendation Based on Large Language Models

      2025, 36(6):0-0. DOI: 10.13328/j.cnki.jos.007326

      Abstract: The pull request (PR)-based software development mechanism is a crucial practice in open-source software development. Effective code reviewers play a vital role in helping contributors identify potential errors in PRs through code reviews, thereby providing quality assurance for the continuous development and integration process. However, the complexity of code changes and the inherent diversity of review behaviors significantly increase the difficulty of recommending appropriate reviewers. Existing methods primarily focus on extracting semantic information from PRs or constructing reviewer profiles from review history, and then recommend reviewers through various static strategy combinations. These approaches are limited by the richness of model training corpora and the complexity of interaction types, resulting in suboptimal recommendation performance. In response to these limitations, this paper proposes a novel code reviewer recommendation method based on inter-agent collaboration. The method leverages advanced large language models to accurately capture the rich textual semantics of PRs and reviewers, and the planning, collaboration, and decision-making capabilities of AI agents enable the integration of diverse types of interaction information, providing high flexibility and adaptability. Experimental analysis on real-world datasets demonstrates that the proposed method outperforms baseline reviewer recommendation approaches, with performance improvements ranging from 4.45% to 26.04%. Additionally, case studies highlight the strong interpretability of the proposed method, further validating its effectiveness and reliability in practical applications.
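      A highly simplified sketch of the general multi-agent pattern described above is given below; the agent roles, prompts, and the `llm` placeholder are assumptions for illustration and do not reproduce the paper's design.

```python
# Simplified multi-agent sketch: one agent summarizes the PR, one profiles each
# candidate reviewer from their history, and a decision agent produces a ranking.
def llm(prompt: str) -> str:
    """Placeholder for any LLM chat/completion call (assumption)."""
    raise NotImplementedError

def recommend_reviewers(pr_text: str, histories: dict[str, str], top_k: int = 3) -> list[str]:
    pr_summary = llm("Summarize the intent and touched components of this PR:\n" + pr_text)
    profiles = {name: llm("Summarize this reviewer's expertise from their review history:\n" + h)
                for name, h in histories.items()}
    ranking = llm("Given the PR summary and reviewer profiles, list the best "
                  f"{top_k} reviewer logins, one per line.\n"
                  f"PR summary: {pr_summary}\nProfiles: {profiles}")
    return [line.strip() for line in ranking.splitlines() if line.strip()][:top_k]
```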

    • Detection of Resource Leaks in Java Programs: Effectiveness Analysis of Traditional Models and Language Models

      2025, 36(6):0-0. DOI: 10.13328/j.cnki.jos.007327

      Abstract: Resource leaks are software defects caused by improperly closing limited system resources; they are prevalent in software written in various languages and are often hard to detect. Traditional defect detection approaches rely on rules and heuristics. In recent years, deep learning approaches have used various code representations and techniques, such as RNNs and GNNs, to capture code semantics. Language models (LMs) have shown significant advances in code understanding and generation in recent research, yet their effectiveness in resource leak detection remains underexplored. This study evaluates traditional model-based, small language model-based (SLM-based), and large language model-based (LLM-based) methods for resource leak detection, and investigates enhancements through few-shot learning, fine-tuning, and the integration of static analysis with LLMs. Using the JLeaks and DroidLeaks datasets, we evaluate model performance from multiple perspectives, including the root causes of leaks, resource types, and code complexity. Our findings indicate that fine-tuning can significantly improve the performance of LMs. However, most models need further improvement in detecting resource leaks involving third-party libraries. Furthermore, code complexity has a greater impact on traditional model-based detection methods.
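      For intuition about what such detectors look for, the toy heuristic below flags a Java method body that constructs a closeable resource outside a try-with-resources block and never calls close() on it; it is far simpler than any of the traditional or language-model-based detectors evaluated in the study.

```python
# Toy heuristic for intuition only: a crude textual check for a leaked
# closeable resource in a Java method body.
import re

RESOURCE_TYPES = ("FileInputStream", "FileOutputStream", "BufferedReader", "Socket")

def may_leak(java_method_body: str) -> bool:
    pattern = r"(\w+)\s*=\s*new\s+(?:%s)\b" % "|".join(RESOURCE_TYPES)
    opened_vars = re.findall(pattern, java_method_body)
    if not opened_vars:
        return False
    in_try_with_resources = re.search(r"try\s*\(", java_method_body) is not None
    closed_vars = set(re.findall(r"(\w+)\s*\.close\s*\(", java_method_body))
    return not in_try_with_resources and any(v not in closed_vars for v in opened_vars)
```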

    • Survey of Intelligent Chip Design Program Testing

      2025, 36(6):0-0. DOI: 10.13328/j.cnki.jos.007328

      Abstract: In today's intelligent era, chips, as the core components of smart electronic devices, play a crucial role in fields such as artificial intelligence, the Internet of Things, and 5G communication, so ensuring their correctness, security, and reliability is of utmost importance. In the chip development process, developers first use hardware description languages to implement the chip design in software form (i.e., as chip design programs), followed by physical design and finally tape-out (i.e., production and manufacturing). As the foundation of chip design and manufacturing, the quality of the chip design program directly affects the quality of the chip, so testing chip design programs has significant research value. Early testing methods relied primarily on test cases designed manually by developers, which required substantial effort and time. As chip design programs have grown more complex, various simulation-based automated testing methods have been proposed, improving the efficiency and effectiveness of chip design program testing. In recent years, more and more researchers have been applying intelligent methods such as machine learning, deep learning, and large language models (LLMs) to chip design program testing. This paper surveys 88 academic papers on intelligent testing of chip design programs, summarizing and categorizing existing work from three perspectives: test input generation, test oracle construction, and test execution optimization. It focuses on the evolution of chip design program testing methods from the machine learning stage to the deep learning stage and then to the large language model stage, exploring the potential of methods from each stage to improve testing efficiency and coverage and to reduce testing costs. It also introduces research datasets and tools in this field and envisions future development directions and challenges.
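      As a small illustration of the LLM-based test input generation direction mentioned above, the sketch below asks a model to draft testbench stimuli for a Verilog module; the prompt and the `llm` placeholder are assumptions, not a technique taken from the surveyed papers.

```python
# Illustrative sketch only: using an LLM to draft corner-case testbench
# stimuli for a hardware module described in Verilog.
def llm(prompt: str) -> str:
    """Placeholder for any LLM completion call (assumption)."""
    raise NotImplementedError

def draft_testbench(verilog_module_src: str) -> str:
    prompt = ("Given the Verilog module below, write a testbench that drives its "
              "inputs with corner-case stimuli (reset toggles, max/min values, "
              "back-to-back transactions) and checks basic output sanity.\n"
              + verilog_module_src)
    return llm(prompt)
```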

    • Large Language Model-Based Decomposition of Long Methods

      2025, 36(6):0-0. DOI: 10.13328/j.cnki.jos.007329

      Abstract: Long methods, like other categories of code smells, prevent software applications from reaching their full readability, reusability, and maintainability. Consequently, automated detection and decomposition of long methods have been extensively studied. Although such approaches significantly facilitate decomposition, their solutions are often substantially different from the optimal ones. To this end, this paper investigates the automatable portion of a publicly available dataset of real-world long methods. Based on the findings of this investigation, we propose Lsplitter, a method that uses large language models to automatically decompose long methods. For a given long method, Lsplitter employs heuristic rules and large language models to decompose it into a series of shorter methods. However, large language models often produce decompositions containing highly similar methods. To address this, Lsplitter uses a location-based algorithm to merge physically contiguous and highly similar methods into a longer method, and finally ranks the candidate results. We conducted experiments on 2,849 long methods from real-world Java projects. The experimental results show that Lsplitter improves the hit rate by 142% compared with traditional methods combined with a modularity matrix, and by 7.6% compared with methods based purely on large language models.
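      The sketch below illustrates the general idea of merging physically contiguous, highly similar fragments; the similarity measure and threshold are illustrative assumptions rather than Lsplitter's actual algorithm.

```python
# Sketch of the merging idea only: merge fragments that are contiguous in the
# original method and textually very similar, so the split does not end up
# with near-duplicate short methods.
import difflib

def similarity(a: str, b: str) -> float:
    return difflib.SequenceMatcher(None, a, b).ratio()

def merge_adjacent_similar(fragments: list[str], threshold: float = 0.85) -> list[str]:
    if not fragments:
        return []
    merged = [fragments[0]]
    for frag in fragments[1:]:
        if similarity(merged[-1], frag) >= threshold:
            merged[-1] = merged[-1] + "\n" + frag   # contiguous and similar -> keep together
        else:
            merged.append(frag)
    return merged
```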

    • LLM-powered Datalog Code Translation and Incremental Program Analysis Framework

      2025, 36(6):0-0. DOI: 10.13328/j.cnki.jos.007330

      Abstract: Datalog, a declarative logic programming language, has been widely adopted across diverse domains and has seen a surge of interest from both academia and industry in recent years. This renewed attention has led to the design and development of various Datalog engines, each with its own dialect. However, a prevalent challenge is that code written in one Datalog dialect typically cannot be executed on the engine of another dialect, which necessitates translating existing Datalog codebases when transitioning to a new engine. Traditional approaches to Datalog code translation, such as manual code rewriting and hand-crafted translation rules, are often time-consuming, repetitive, inflexible, and hard to scale. This paper proposes an LLM-powered Datalog code translation technique that leverages the code understanding and generation capabilities of LLMs. Through a divide-and-conquer strategy, prompt engineering based on few-shot and chain-of-thought (CoT) prompts, and an iterative check-feedback-repair error-correction mechanism, it achieves high-precision code translation between different Datalog dialects and reduces the repetitive effort developers spend on writing translation rules. Building on this translation technique, a general declarative incremental program analysis framework based on Datalog is designed and implemented. The proposed translation technique was evaluated on different Datalog dialects, and the results verify its effectiveness. We also experimentally evaluated the incremental program analysis framework, verifying the speedup achieved by incremental program analysis built on the proposed code translation technique.
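      The sketch below illustrates a check-feedback-repair loop in the spirit described above; the prompts, engine invocation, and retry budget are assumptions, not the paper's implementation.

```python
# Sketch of a check-feedback-repair loop: translate a rule, try it on the
# target engine, and feed any error message back to the model for repair.
import subprocess

def llm(prompt: str) -> str:
    """Placeholder for any LLM completion call (assumption)."""
    raise NotImplementedError

def run_engine(engine_cmd: list[str], program: str) -> tuple[bool, str]:
    proc = subprocess.run(engine_cmd, input=program, text=True, capture_output=True)
    return proc.returncode == 0, proc.stderr

def translate_rule(rule_src: str, src_dialect: str, dst_dialect: str,
                   engine_cmd: list[str], max_rounds: int = 3) -> str:
    candidate = llm(f"Translate this {src_dialect} Datalog rule into {dst_dialect}, "
                    f"preserving its semantics:\n{rule_src}")
    for _ in range(max_rounds):
        ok, err = run_engine(engine_cmd, candidate)    # check
        if ok:
            return candidate
        candidate = llm(                               # feedback + repair
            f"The {dst_dialect} engine rejected the rule with this error:\n{err}\n"
            f"Fix the rule:\n{candidate}")
    return candidate
```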

Contact Information
  • Journal of Software
  • Sponsored by: Institute of Software, CAS, China
  • Postal code: 100190
  • Phone: 010-62562563
  • Email: jos@iscas.ac.cn
  • Website: https://www.jos.org.cn
  • Serial numbers: ISSN 1000-9825; CN 11-2560/TP
  • Domestic price: CNY 70