• Volume 36, Issue 7, 2025 Table of Contents
    • Efficient Differential Privacy Random Forest Training Algorithm

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007332

      Abstract:Differential privacy, with its strong privacy protection guarantees, has been applied to random forest algorithms to address the privacy leakage problem. However, applying differential privacy directly to random forests severely reduces the classification accuracy of the model. To alleviate this tension between privacy protection and model accuracy, this paper proposes a novel differentially private random forest training algorithm, called eDPRF. Specifically, we design a decision tree construction method based on the permute-and-flip mechanism, which exploits the mechanism's efficient query output to design corresponding utility functions that select split features and labels precisely. We also design a privacy budget allocation strategy based on composition theorems, which improves the privacy budget utilization of nodes by drawing training subsets through sampling without replacement and by adjusting the budgets of internal nodes in a differentiated manner. Finally, privacy analysis and experimental results show that, under the same privacy budget, the proposed algorithm outperforms similar algorithms in classification accuracy.
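      The permute-and-flip selection step named above can be sketched as follows. This is a minimal sketch of the mechanism as it is commonly stated, not eDPRF's implementation: candidates are visited in a random order, and candidate c is accepted with probability exp(ε(q(c) − q*)/(2Δ)), where q* is the highest score and Δ is the sensitivity of the utility function. The function name `permute_and_flip`, the example utility scores, and the feature names are hypothetical; the paper's actual utility functions and budget allocation are not reproduced here.

```python
import math
import random


def permute_and_flip(candidates, utility, epsilon, sensitivity, rng=random):
    """Pick one candidate with the permute-and-flip mechanism.

    utility(c)  -> quality score of candidate c (higher is better)
    sensitivity -> max change of any score when one training record changes
    """
    order = list(candidates)
    scores = {c: utility(c) for c in order}
    best = max(scores.values())
    rng.shuffle(order)  # visit candidates in a uniformly random order
    for c in order:
        # Accept c with probability exp(eps * (q(c) - q*) / (2 * sensitivity)).
        # A top-scoring candidate has probability 1, so the loop always returns.
        p = math.exp(epsilon * (scores[c] - best) / (2 * sensitivity))
        if rng.random() < p:
            return c
    raise AssertionError("unreachable: a top-scoring candidate is always accepted")


# Hypothetical usage: choose a split feature by a made-up utility score.
gain = {"age": 12.0, "income": 9.5, "zip": 1.0}
print(permute_and_flip(gain, gain.get, epsilon=1.0, sensitivity=1.0))
```

      Because a top-scoring candidate is always accepted, the loop needs at most one pass over the candidates. Privacy accounting across nodes and trees (the composition-based budget allocation described above) is left out of this sketch.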

    • Semantic Aware Greybox Compiler Fuzzing

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007333

      Abstract:Fuzz testing techniques play a significant role in software quality assurance and software security testing. However, for systems such as compilers, whose inputs have complex semantics, existing fuzz testing tools often struggle because their mutation strategies lack semantic awareness, so many of the programs they generate fail compiler frontend checks. This paper proposes a semantic-aware greybox fuzz testing method that aims to improve the efficiency of fuzz testing tools for compiler testing. We design and implement a series of mutation operators that preserve the semantic validity of inputs while exploring contextual diversity, and we develop efficient selection strategies tailored to these operators. By integrating these strategies into a traditional greybox fuzz testing tool, we build the greybox fuzzer SemaAFL. Experimental results indicate that, with these mutation operators, SemaAFL achieves approximately 14.5% and 11.2% higher code coverage on the GCC and Clang compilers than AFL++ and similar tools such as GrayC. During a one-week experimental period, SemaAFL discovered and reported six previously unknown bugs in GCC and Clang.

    • Survey of Dynamic Testing Methods for Distributed Systems

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007334

      Abstract:Distributed systems underpin modern computing, enabling powerful, reliable, and flexible operations across domains such as cloud computing, big data, and IoT. However, their complexity often leads to code defects that threaten usability, robustness, and security, making testing and defect detection essential. Dynamic testing, which evaluates systems during runtime, plays a key role in uncovering defects and assessing functionality. This paper introduces a four-layer bug threat model for distributed systems, covering system configuration, user requests, node communication, and environmental faults. Based on this model, it analyzes the challenges of testing distributed systems and proposes a general framework for dynamic testing. The paper highlights critical techniques such as multidimensional test input generation, system-critical state awareness, and defect judgment criteria. Additionally, the paper reviews popular dynamic testing tools and evaluates their effectiveness in defect discovery and test coverage. The findings show that multidimensional input generation significantly enhances testing efficiency. Finally, the paper discusses emerging trends and future directions in dynamic testing of distributed systems, aiming to address their inherent challenges and improve testing outcomes.

    • Binary2Source Function Similarity Detection under Function Inlining

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007335

      Abstract:Binary2source function similarity detection is a fundamental task in software composition analysis. Existing binary2source matching works adopt a 1-to-1 matching mechanism, in which one binary function is matched against one source function. However, we found that, because of function inlining, the mapping can be 1-to-n (one binary function maps to multiple source functions). This mismatch causes existing binary2source matching methods to suffer a 30% performance loss under function inlining. To support binary2source function matching under function inlining, we propose O2NMatcher, a method that generates Source Function Sets as matching targets for binary functions with inlining. We evaluate O2NMatcher in several experiments, and the results show that O2NMatcher not only enhances current binary2source function similarity detection techniques but also identifies the inlined source functions, helping existing tools better handle the challenges posed by inlining.
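      The 1-to-n idea can be illustrated with a toy matcher. This is a hypothetical sketch, not O2NMatcher's actual algorithm: it assumes a source call graph and per-function feature sets (for example string literals and constants), forms candidate Source Function Sets from a function plus some of its direct callees that might have been inlined, and scores each set by Jaccard similarity against the binary function's features. All function names and feature choices here are assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def source_function_sets(call_graph, root, max_inlined=2):
    """Candidate sets: `root` plus up to `max_inlined` of its direct callees,
    each callee standing for a function that may have been inlined into it."""
    callees = sorted(set(call_graph.get(root, ())) - {root})  # skip self-recursion
    return [frozenset((root, *combo))
            for k in range(max_inlined + 1)
            for combo in combinations(callees, k)]


def best_source_match(binary_features, call_graph, source_features, max_inlined=2):
    """Return the (source function set, score) pair most similar to a binary function."""
    best, best_score = None, -1.0
    for root in call_graph:
        for fset in source_function_sets(call_graph, root, max_inlined):
            # Features of a set are the union of its members' features.
            merged = set().union(*(source_features.get(f, set()) for f in fset))
            score = jaccard(binary_features, merged)
            if score > best_score:
                best, best_score = fset, score
    return best, best_score


# Hypothetical source call graph and feature sets.
cg = {"parse": ["read_hdr", "checksum"], "read_hdr": [], "checksum": []}
feats = {"parse": {"'bad magic'"}, "read_hdr": {"0x7f454c46"}, "checksum": {"0xffff"}}
best, score = best_source_match({"'bad magic'", "0x7f454c46"}, cg, feats)
print(sorted(best), score)  # -> ['parse', 'read_hdr'] 1.0
```

      Matching against the single function `parse` alone would score only 0.5 here, which mirrors the loss the abstract attributes to 1-to-1 matching under inlining.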

    • Toward Understanding the Current Status and Evolution of Deep Learning Compiler Bugs

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007336

      Abstract:Deep learning (DL) compilers have been widely used in model optimization and deployment. Like traditional compilers, DL compilers contain bugs. Buggy DL compilers can cause compilation failures, produce incorrect compilation results, and even lead to catastrophic consequences. To investigate the characteristics of DL compiler bugs, prior work studied and analyzed 603 early DL compiler bugs. In recent years, DL compilers have been updated frequently, introducing many new features and deprecating some old ones. At the same time, several testing approaches for DL compilers have been proposed. It is unclear whether previous findings on DL compiler bugs still apply. In addition, the relationships among the symptoms, root causes, and locations of bugs have not been explored in depth, and the characteristics of the regression test cases that trigger bugs and of the patches that fix them have not been studied. To understand how the characteristics and distribution of DL compiler bugs have evolved over time, this paper collects 613 recently fixed bugs in three popular DL compilers and labels the root cause, symptom, and location of each bug. Based on the labeled results, we explore the distribution characteristics of the bugs from multiple angles and compare them with those reported in existing work. We also study the characteristics of the patches that fix the bugs and of the regression test cases that trigger them. In total, we summarize 12 major findings that characterize DL compiler bugs and their evolution, and we provide a series of feasible suggestions for detecting, localizing, and repairing DL compiler bugs. Finally, to evaluate the usefulness of our findings, we developed CfgFuzz, a proof-of-concept TVM testing tool based on optimization configurations. CfgFuzz performs combinatorial testing on compilation configurations and detected 8 TVM bugs, 7 of which have been confirmed or fixed by developers.
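      Combinatorial testing over compilation configurations can be illustrated with a small greedy pairwise (2-way) generator. This is a generic sketch of pairwise test generation, assuming a handful of discrete configuration options; the option names and values below are hypothetical and are not TVM's real configuration keys or CfgFuzz's actual algorithm.

```python
from itertools import combinations, product


def pairwise_suite(options):
    """Greedy 2-way (pairwise) covering suite for a small configuration space.

    options maps an option name to its list of (hashable) values. Every pair of
    values from two different options appears in at least one returned config.
    """
    names = list(options)

    def pairs_of(cfg):
        return {((a, cfg[a]), (b, cfg[b])) for a, b in combinations(names, 2)}

    uncovered = {((a, va), (b, vb))
                 for a, b in combinations(names, 2)
                 for va in options[a] for vb in options[b]}
    # Exhaustive candidate pool: fine for small spaces, too slow for large ones.
    pool = [dict(zip(names, vals)) for vals in product(*(options[n] for n in names))]
    suite = []
    while uncovered:
        # Pick the configuration that covers the most still-uncovered pairs.
        best = max(pool, key=lambda cfg: len(pairs_of(cfg) & uncovered))
        suite.append(best)
        uncovered -= pairs_of(best)
    return suite


# Hypothetical option names; not TVM's real configuration keys.
space = {"opt_level": [0, 1, 2, 3],
         "disabled_pass": ["none", "pass_a", "pass_b"],
         "target": ["cpu_a", "cpu_b"]}
for cfg in pairwise_suite(space):
    print(cfg)
```

      In a testing loop, each configuration could then be used to compile the same test models. With 4 × 3 × 2 = 24 exhaustive combinations, pairwise coverage needs at least 12 configurations here, because every configuration covers exactly one of the 12 `opt_level`/`disabled_pass` pairs.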

    • Empirical Study of Code Smell Detection on Active Learning

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007337

      Abstract:Detecting code smells with machine learning and deep learning approaches relies heavily on large annotated datasets. However, annotated datasets are scarce in the code smell field, while unannotated data is abundant. Active learning methods can therefore be applied to code smell detection. Previous research has shown that, in software engineering, active learning can yield better-performing models at lower annotation and training costs. Nonetheless, the specific impact of active learning on the performance of code smell detection models remains unclear, and applying active learning strategies that are effective in other domains to code smell detection without adaptation may have adverse effects. This paper evaluates the impact of active learning on the performance of code smell detection models. To this end, we conduct an extensive analysis on the code smell dataset MLCQ, covering 11 implementations of 5 query strategies, 8 classifiers, and 10 different query ratios, to explore their specific impacts on model performance. The results indicate the following. (1) Among the 11 query strategies in this study, uncertainty-based and committee-based strategies performed better than the others, with margin querying (uncertainty-based) and vote entropy querying (committee-based) being particularly notable. (2) Among the 8 classifiers explored, the Random Forest classifier exhibited the best overall performance. (3) Model performance improved significantly as the query ratio increased from 0% to 25%; however, as the query ratio increased further from 25% to 50%, the improvement slowed and performance could even decline.
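      The two best-performing query strategies can be sketched in a few lines. This is a minimal illustration of standard margin sampling and vote entropy, assuming each classifier outputs class probabilities and each committee member outputs a hard label; it is not the study's implementation, and reading the query ratio as the fraction of the unlabeled pool sent for annotation is an assumption.

```python
import math
from collections import Counter


def margin_query(probas, n):
    """Indices of the n pool samples whose top-2 class probabilities are closest."""
    def margin(p):
        top1, top2 = sorted(p, reverse=True)[:2]
        return top1 - top2
    return sorted(range(len(probas)), key=lambda i: margin(probas[i]))[:n]


def vote_entropy(votes):
    """Entropy of the label votes a committee casts for one sample."""
    total = len(votes)
    return -sum((k / total) * math.log(k / total) for k in Counter(votes).values())


def vote_entropy_query(committee_votes, n):
    """Indices of the n pool samples on which the committee disagrees most."""
    return sorted(range(len(committee_votes)),
                  key=lambda i: vote_entropy(committee_votes[i]), reverse=True)[:n]


# Query ratio -> number of samples to label from a (tiny) unlabeled pool.
pool_probas = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3], [0.52, 0.48]]
query_ratio = 0.25
n = max(1, round(query_ratio * len(pool_probas)))
print(margin_query(pool_probas, n))  # -> [3]
```

      Margin sampling favors samples the model can barely separate, while vote entropy favors samples on which several classifiers disagree; both only rank the pool, so the labeling budget is set separately by the query ratio.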

    • Empirical Study and Unified Detection Technique of Dependency Smells in Java Projects

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007338

      Abstract:Java has become a leading programming language for contemporary application development, owing to its extensive dependency ecosystem and user-friendly build tools such as Maven and Gradle. However, the growing scale of dependencies has made dependency management in Java projects increasingly complex, often beyond the capabilities of current tools. This complexity can cause unforeseen issues that significantly hinder project builds and execution, manifesting as build failures, crashes, semantic errors, and other adverse outcomes. To fill the gaps in the analysis of dependency management issues in existing research and technical literature, this paper introduces the concept of “Dependency Smell” with the goal of establishing a unified model for these problems. We conduct a comprehensive empirical study of dependency management issues covering all categories of Maven- and Gradle-related problems. The study analyzes diverse dependency management issues gathered from open-source communities (e.g., GitHub), official documentation (e.g., the Maven manual), and various surveys and technical papers. We ultimately identify 13 subcategories of dependency smells and characterize their triggering factors and impacts. Building on these empirical findings, we devise a unified detection algorithm for dependency smells in Java projects and develop JDepAna, a specialized detection tool that integrates seamlessly with the Maven and Gradle build tools. Experimental results show that JDepAna achieves a recall of 95.9% on known dependency smells. On more than a hundred new Java projects, JDepAna identifies 30,689 instances of dependency smells; manual verification of 360 selected instances yields a precision of 96.1%. Additionally, we reported 48 instances to developers, of which 42 were promptly confirmed and 21 promptly fixed, validating the efficacy and practicality of our Java dependency smell detection algorithm and tool for quality assurance of Java projects.
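      One kind of dependency problem, the same library resolved at several versions, can be detected with a short scan over the resolved dependency tree. This is a hypothetical sketch and not JDepAna's algorithm or smell taxonomy: it assumes the tree is already flattened into root-to-leaf paths of three-part `groupId:artifactId:version` coordinates, and it approximates Maven's mediation rule (nearest definition wins, then first declaration) by path depth and path order.

```python
from collections import defaultdict


def find_version_conflicts(paths):
    """Report libraries that appear with more than one version in a dependency tree.

    paths: dependency paths from the project root, in declaration order; each
    path is a list of 'groupId:artifactId:version' strings (root excluded).
    """
    occurrences = defaultdict(list)  # 'groupId:artifactId' -> [(depth, order, version)]
    for order, path in enumerate(paths):
        for depth, coord in enumerate(path, start=1):
            group_id, artifact_id, version = coord.split(":")
            occurrences[f"{group_id}:{artifact_id}"].append((depth, order, version))
    conflicts = {}
    for ga, occ in occurrences.items():
        versions = {v for _, _, v in occ}
        if len(versions) > 1:
            # Simplified Maven-style mediation: shallowest wins, then first declared.
            _, _, chosen = min(occ)
            conflicts[ga] = {"versions": sorted(versions), "selected": chosen}
    return conflicts


# Hypothetical resolved tree: two direct dependencies pull in different Guava versions.
tree = [
    ["com.example:app-core:1.0", "com.google.guava:guava:20.0"],
    ["com.example:http-client:2.1", "com.google.guava:guava:31.1-jre"],
]
print(find_version_conflicts(tree))
# -> {'com.google.guava:guava': {'versions': ['20.0', '31.1-jre'], 'selected': '20.0'}}
```

      The report shows why such conflicts are easy to miss: the build silently picks one version, so code in `http-client` compiled against the newer Guava may fail at run time.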

    • Scalable Secure Iris Recognition Combining Feature Generation and Replay

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007339

      Abstract:With the rapid development of information technology, security authentication technology has become a crucial safeguard for personal privacy and data security. Among these technologies, iris recognition, with its outstanding accuracy and stability, is widely used in system access control, healthcare, and judicial practice. However, because iris features cannot be changed or revoked, leaked iris feature data is compromised permanently. Protecting the privacy of iris feature data is therefore particularly important. Given the strong performance of neural networks in image processing, neural network-based secure iris recognition schemes have been proposed that maintain high recognition performance while protecting private data. However, in the face of constantly changing data and environments, secure iris recognition schemes must also be scalable: recognition performance should be maintained as new users register. Most existing neural network-based secure iris recognition research does not consider scalability. To address these issues, this paper proposes the Generative Feature Replay-based Secure Incremental Iris Recognition (GFR-SIR) method and the Privacy-preserving Template Replay-based Secure Incremental Iris Recognition (PTR-SIR) method. Specifically, GFR-SIR uses generative feature replay and feature distillation to alleviate forgetting of knowledge from previous tasks as the neural network is expanded, and adopts an improved TNCB method to protect the privacy of iris feature data. PTR-SIR saves the privacy-preserving templates produced by the TNCB method in previous tasks and replays them while training the model on the current task, making the recognition scheme scalable. Experimental results show that after 5 rounds of expansion tasks, GFR-SIR and PTR-SIR reach recognition accuracies of 68.32% and 98.49%, respectively, on the CASIA-IrisV4-Lamp dataset, improvements of 58.49 and 88.66 percentage points over the fine-tuning method. The analysis indicates that GFR-SIR has significant advantages in security and model training efficiency because it does not store data from previous tasks, whereas PTR-SIR excels at maintaining recognition performance but is inferior to GFR-SIR in security and efficiency.

    • Dynamic Random Testing Approach for Intelligent Agent Path Planning Algorithms

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007340

      Abstract:Path planning algorithms for intelligent agents aim to generate feasible paths that let the agent reach the target point from the starting point safely and efficiently without colliding with obstacles. Path planning algorithms are now widely applied in various critical cyber-physical systems, so they must be tested before deployment to evaluate whether their performance meets the requirements. However, the distribution patterns of threat obstacles in the task space, which serve as inputs to a path planning algorithm, can be diverse and complex. Moreover, executing each test case with the path planning algorithm inevitably consumes computational resources. To improve the testing efficiency of path planning algorithms, this paper applies the concept of dynamic random testing to path planning and proposes the Dynamic Random Testing approach for Path Planning algorithms (DRT-PP). Specifically, DRT-PP discretizes the task space into sub-regions and assigns a threat generation probability to each sub-region, thereby constructing the test profile. This test profile then serves as the strategy for generating test cases. Furthermore, DRT-PP dynamically adjusts the test profile during testing to gradually optimize it, thereby improving testing efficiency. Experimental results show that, compared with random testing and adaptive random testing, DRT-PP not only ensures the diversity of the generated test suite but also generates more test cases that reveal potential performance failures.
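      A DRT-style profile update for a gridded task space might look like the sketch below. It is an illustrative adaptation, not DRT-PP's published update rule: each sub-region used by a failing test case gains probability mass ε, each sub-region used by a passing one loses ε, the remaining sub-regions absorb the difference, and the profile is renormalized. The grid representation, the `reveals_failure` oracle, and the value of ε are assumptions.

```python
import random


def uniform_profile(rows, cols):
    """Initial test profile: equal threat-generation probability per sub-region."""
    m = rows * cols
    return {(r, c): 1.0 / m for r in range(rows) for c in range(cols)}


def sample_obstacles(profile, k, rng):
    """Draw k obstacle sub-regions (with replacement) according to the profile."""
    cells = list(profile)
    return rng.choices(cells, weights=[profile[c] for c in cells], k=k)


def update_profile(profile, used_cells, failed, epsilon=0.05, floor=1e-6):
    """Shift mass toward the sub-regions of a failing test case and away from
    those of a passing one, then renormalize."""
    used = set(used_cells)
    m, k = len(profile), len(used)
    if k in (0, m):
        return dict(profile)  # nothing to shift between the two groups
    d_used = epsilon if failed else -epsilon
    d_other = -d_used * k / (m - k)  # keeps total mass at 1 before clipping
    new = {c: max(p + (d_used if c in used else d_other), floor)
           for c, p in profile.items()}
    total = sum(new.values())
    return {c: p / total for c, p in new.items()}


def drt_run(rows, cols, k, budget, reveals_failure, seed=0):
    """reveals_failure(obstacles) is a hypothetical oracle that runs the planner."""
    rng = random.Random(seed)
    profile, failing = uniform_profile(rows, cols), []
    for _ in range(budget):
        obstacles = sample_obstacles(profile, k, rng)
        failed = reveals_failure(obstacles)
        if failed:
            failing.append(obstacles)
        profile = update_profile(profile, obstacles, failed)
    return profile, failing


# Toy oracle (hypothetical): the planner "fails" whenever a threat lands in cell (0, 0).
profile, failing = drt_run(4, 4, k=3, budget=50, reveals_failure=lambda obs: (0, 0) in obs)
print(len(failing), round(profile[(0, 0)], 3))
```

      The floor keeps every sub-region reachable, so the profile can sharpen toward failure-prone regions without losing the diversity the abstract emphasizes.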

    • Software Vulnerability Detection Based on Correlation of Structural Features between Functions

      2025, 36(7):0-0. DOI: 10.13328/j.cnki.jos.007341

      Abstract:Vulnerability detection is a critical technology for software system security. In recent years, deep learning has made significant advances in vulnerability detection owing to its exceptional capability in code feature extraction. However, current deep learning-based approaches focus solely on the independent structural features of individual code instances and neglect the structural similarities and associations among different vulnerable code samples, which limits detection performance. To address this issue, this paper proposes a vulnerability detection method based on the correlation of structural features between functions (CSFF-VD). The method first parses functions into code property graphs and extracts the independent structural features within each function using gated graph neural networks. On this basis, it constructs an association network among functions according to feature similarity and employs a graph attention network to further extract structural similarity information between functions, thereby improving vulnerability detection performance. Experimental results show that CSFF-VD outperforms current deep learning-based vulnerability detection methods on three public vulnerability detection datasets. In addition, further experiments on the inter-function correlation feature extraction component of CSFF-VD show that integrating correlation information between functions, on top of the independent features extracted within each function, is effective.
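      The function association network can be sketched as a k-nearest-neighbor graph over function embeddings. This is a hypothetical illustration, assuming each function has already been encoded as a fixed-length vector (for example by the gated graph neural network stage) and using cosine similarity with top-k linking; CSFF-VD's actual similarity measure and graph construction may differ, and the graph attention network that consumes the graph is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_association_graph(embeddings, k=2, threshold=0.0):
    """Undirected edges linking each function to its k most similar functions
    whose cosine similarity exceeds `threshold`."""
    names = list(embeddings)
    edges = set()
    for a in names:
        sims = sorted(((cosine(embeddings[a], embeddings[b]), b)
                       for b in names if b != a), reverse=True)
        for sim, b in sims[:k]:
            if sim > threshold:
                edges.add(tuple(sorted((a, b))))  # store each edge once
    return edges


# Hypothetical 3-dimensional function embeddings.
emb = {"f1": [1.0, 0.0, 0.0], "f2": [0.9, 0.1, 0.0],
       "f3": [0.0, 1.0, 0.0], "f4": [0.0, 0.95, 0.05]}
print(sorted(build_association_graph(emb, k=1)))  # -> [('f1', 'f2'), ('f3', 'f4')]
```

      A graph attention layer would then aggregate each function's representation with those of its linked neighbors, which is how structurally similar functions can inform one another's vulnerability prediction.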


Contact
  • Journal of Software
  • Sponsor: Institute of Software, CAS, China
  • Postal code: 100190
  • Phone: 010-62562563
  • Email: jos@iscas.ac.cn
  • Website: https://www.jos.org.cn
  • Serial numbers: ISSN 1000-9825, CN 11-2560/TP
  • Domestic price: CNY 70
Copyright: Institute of Software, Chinese Academy of Sciences. Beijing ICP No. 05046678-4
Address: No. 4 South Fourth Street, Zhong Guan Cun, Beijing 100190
Fax: 010-62562533