WANG Chao , XU Wei-Wei , ZHOU Ming-Hui
2024, 35(2):513-531. DOI: 10.13328/j.cnki.jos.006944
Abstract:As an essential mechanism of group collaboration in software development, code comments are widely used by developers to improve the efficiency of specific development tasks. However, since code comments do not directly affect software operation, developers often neglect them, which leads to poor comment quality and reduces development efficiency. Quality issues of code comments hinder code understanding, cause misunderstandings, or even introduce bugs, and have therefore received widespread attention from researchers. This study systematically analyzes recent research by global scholars on the quality issues of code comments through a literature review. It summarizes related studies in three aspects: evaluation dimensions of code comment quality, indicators of code comment quality, and strategies to improve code comment quality, and points out the shortcomings of and challenges for current research, together with suggestions.
LU Ze-Yu , ZHANG Peng , WANG Yang , GUO Zhao-Qiang , YANG Yi-Biao , ZHOU Yu-Ming
2024, 35(2):532-580. DOI: 10.13328/j.cnki.jos.006953
Abstract:The effectiveness of a test suite in defect detection refers to the extent to which the suite can detect the defects hidden in the software, and how to evaluate this capability is an important issue. Coverage and mutation score are two of the most important and widely used metrics for test suite effectiveness. To quantify the defect detection capability of a test suite, researchers have devoted a large amount of effort to this issue and have made significant progress. However, inconsistent conclusions can be observed among the existing studies, and some challenges in the area still call for prompt resolution. This study systematically summarizes the research results achieved by scholars both in China and abroad on the evaluation of test suite effectiveness over the years. To start with, it expounds the problems in research on the evaluation of test suite effectiveness. Then, it outlines and analyzes coverage-based and mutation-score-based evaluation of test suite effectiveness and presents the application of such evaluation in test suite optimization. Finally, the study points out the challenges faced by this line of research and suggests directions for future research.
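As a concrete illustration of the mutation score metric this abstract discusses, the following sketch (not taken from the paper; the function and mutants are invented for illustration) computes the score as the fraction of mutants killed by a test suite, and also shows why equivalent mutants complicate the metric:

```python
# Illustrative only: a toy absolute-value function, three hand-written
# mutants (standing in for auto-generated ones), and the mutation score
# of a test suite, i.e., killed mutants / total mutants.

def original(x):
    return x if x >= 0 else -x

# Hypothetical mutants: each changes exactly one detail of the original.
mutants = [
    lambda x: x if x > 0 else -x,    # ">=" mutated to ">" (equivalent mutant!)
    lambda x: x if x >= 0 else x,    # negation dropped in the else branch
    lambda x: -x if x >= 0 else -x,  # then-branch result negated
]

def kills(mutant, test_inputs):
    """A suite kills a mutant if any test observes a deviating output."""
    return any(mutant(x) != original(x) for x in test_inputs)

def mutation_score(test_inputs):
    return sum(kills(m, test_inputs) for m in mutants) / len(mutants)

weak_suite = [1, 2]        # only exercises the positive branch
strong_suite = [-3, 0, 5]  # covers both branches and the boundary
```

Even the strong suite scores 2/3 rather than 1 here, because the first mutant is semantically equivalent to the original (for absolute value, `>` and `>=` agree at zero); this is exactly the kind of subtlety that makes mutation score hard to interpret.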
GAO Kai , HE Hao , XIE Bing , ZHOU Ming-Hui
2024, 35(2):581-603. DOI: 10.13328/j.cnki.jos.006975
Abstract:Open source software has become a key infrastructure of modern society, supporting software development in almost every field. Through various kinds of code reuse, such as install dependencies, API calls, project forks, file copies, and code clones, open source software forms an intricate supply (i.e., dependency) network, which is referred to as an open source software supply chain. On the one hand, software supply chains facilitate software development and have become the foundation of the software industry. On the other hand, risks from upstream software can affect downstream software along the supply chain, leading to a ripple effect in open source software supply chains. Open source software supply chains have attracted more and more attention from both academia and industry. To help advance researchers’ knowledge of open source software supply chains, this study provides a definition and research framework for open source software supply chains from a holistic perspective. Then, it conducts a systematic literature review of worldwide research and summarizes the status quo from three aspects: structure and evolution, risk propagation and management, and dependency management. Finally, the study summarizes the challenges and opportunities for future research on open source software supply chains.
YANG Ze-Zhou , CHEN Si-Rong , GAO Cui-Yun , LI Zhen-Hao , LI Ge , LYU Michael Rung-Tsong
2024, 35(2):604-628. DOI: 10.13328/j.cnki.jos.006981
Abstract:This study focuses on the code generation task, which aims at generating relevant code fragments according to given natural language descriptions. In the process of software development, developers often encounter two scenarios: writing large amounts of repetitive, low-technical code to implement common functionalities, and writing code that depends on specific task requirements and may necessitate external resources such as documentation or other tools. Code generation has therefore received much attention from academia and industry for assisting developers in coding, and making machines understand users’ requirements and write programs on their own has long been one of the key concerns in software engineering. The recent development of deep learning techniques, especially pre-trained models, has brought promising performance to the code generation task. In this study, the current work on deep learning-based code generation is systematically reviewed, and the existing methods are classified into three categories: methods based on code features, methods incorporating retrieval, and methods incorporating post-processing. The first category refers to methods that use deep learning algorithms for code generation based on code features, while the second and third categories improve upon the performance of the first. The existing research results of each category are systematically reviewed, summarized, and commented on. Besides, the study analyzes the corpora and the popular evaluation metrics used in existing code generation work. Finally, it summarizes the overall literature review and provides a prospect for future research directions worthy of attention.
WANG Ying , WU Ying-Xin , GAO Tian , CHEN Zi-Ying , XU Chang , YU Hai , CHEUNG Shing-Chi
2024, 35(2):629-674. DOI: 10.13328/j.cnki.jos.006983
Abstract:In the new era of “human-machine-thing” ternary integration and ubiquitous computing, software deployment and operation environments that are open and changeable, serve diverse needs, and involve complex scenarios have placed higher requirements and expectations on the governance of open-source software library ecosystems. To further promote the construction of trusted software supply chain ecosystems and create an independent and controllable technical system based on the ubiquitous computing model, this study focuses on open-source software library ecosystems. It collects 348 authoritative papers in this field from the past two decades (2001–2023) and sorts out the research on governance techniques for open-source software library ecosystems. The study discusses the modeling and analysis, evolution and maintenance, quality assurance, and management of open-source software supply chain ecosystems, and summarizes the research status, problems, challenges, and trends.
LIU Jing , ZHENG Tong-Ya , HAO Qin-Fen
2024, 35(2):675-710. DOI: 10.13328/j.cnki.jos.006933
Abstract:Graph data, such as citation networks, social networks, and transportation networks, exist widely in the real world. Graph neural networks (GNNs) have attracted extensive attention due to their strong expressiveness and excellent performance in a variety of graph analysis applications. However, the excellent performance of GNNs relies on labeled data, which are difficult to obtain, and on complex network models with high computational costs. Knowledge distillation (KD) is introduced into GNNs to address the scarcity of labeled data and the high complexity of GNNs. KD trains small constructed models (student models) with soft-label supervision from larger models (teacher models) so that they yield better performance and accuracy. How to apply KD to graph data has therefore become a research challenge, but a review of graph-based KD research is still lacking. Aiming to provide a comprehensive overview of KD on graphs, this study summarizes the existing work and fills the review gap in this field. Specifically, it first introduces the background knowledge of graphs and KD. Then, three types of graph-based knowledge distillation methods are comprehensively summarized: graph knowledge distillation for deep neural networks (DNNs), graph knowledge distillation for GNNs, and self-KD-based graph knowledge distillation. Each type is further divided into knowledge distillation methods based on the output layer, the middle layer, and the constructed graph. Subsequently, the design ideas of various graph-based knowledge distillation algorithms are analyzed and compared, and the advantages and disadvantages of the algorithms are concluded with experimental results. In addition, applications of graph-based knowledge distillation in computer vision, natural language processing, recommendation systems, and other fields are listed.
Finally, the development of graph-based knowledge distillation is summarized and prospected. The references related to graph-based knowledge distillation are also available on GitHub at https://github.com/liujing1023/Graph-based-Knowledge-Distillation.
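To make the teacher-student soft-label supervision mentioned above concrete, here is a minimal, self-contained sketch (illustrative only, not from the paper) of the classic output-layer distillation loss: the KL divergence between temperature-softened teacher and student distributions, scaled by T²:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; higher T yields softer distributions."""
    scaled = [z / T for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) over temperature-softened distributions,
    scaled by T^2 as in standard output-layer knowledge distillation."""
    p = softmax(teacher_logits, T)        # teacher soft labels
    q = softmax(student_logits, T)        # student predictions
    return T * T * sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))
```

In practice this term is added to the ordinary cross-entropy loss on the hard labels; the graph-based variants surveyed above differ mainly in which layer or structure supplies the supervision signal.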
LIN Qian , YU Chao , WU Xia-Wei , DONG Yin-Zhao , XU Xin , ZHANG Qiang , GUO Xian
2024, 35(2):711-738. DOI: 10.13328/j.cnki.jos.007006
Abstract:In recent years, reinforcement learning methods based on environmental interaction have achieved great success in robotic applications, providing a practical and feasible solution for optimizing robot behavior control policies. However, collecting interactive samples in the real world leads to problems such as high cost and low efficiency. Therefore, simulation environments are widely used in the training process of robot reinforcement learning. By obtaining a large number of training samples at low cost in a virtual simulation environment for policy training and then transferring the learned policies to the real world, the security, reliability, and real-time problems in real robot training can be alleviated. However, due to the gap between the simulation environment and the real environment, it is often difficult to obtain ideal performance when the policy trained in simulation is directly transferred to a real robot. To solve this problem, sim-to-real transfer reinforcement learning methods have been proposed to reduce the environmental gap and thus achieve effective policy transfer. According to the direction of information flow in the transfer process and the different objects targeted by the methods, this survey first proposes a sim-to-real transfer reinforcement learning framework, based on which the existing related work is divided into three categories: model optimization methods focusing on the real environment, knowledge transfer methods focusing on the simulation environment, and iterative policy promotion methods focusing on both. Then, the representative techniques and related work in each category are described. Finally, the opportunities and challenges in this field are briefly discussed.
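One widely used technique in the sim-to-real literature (named here for illustration; the survey covers many others) is domain randomization: simulator parameters are resampled every episode so the learned policy must generalize across the sim-to-real gap. A minimal sketch with hypothetical parameter names and ranges:

```python
import random

# Hypothetical dynamics parameters and ranges; in practice the ranges
# come from measurement or system identification of the real robot.
RANGES = {
    "mass_kg": (0.8, 1.2),
    "friction": (0.5, 1.5),
    "latency_ms": (0.0, 20.0),
}

def randomized_sim_params(rng):
    """Sample one simulator configuration per training episode, so the
    policy is trained on a distribution of dynamics rather than a single
    (inevitably inaccurate) simulator setting."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
```

A training loop would call `randomized_sim_params` once per episode and rebuild the simulator with the sampled values before rolling out the policy.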
ZHANG Ming-Yue , JIN Zhi , LIU Kun
2024, 35(2):739-757. DOI: 10.13328/j.cnki.jos.006832
Abstract:A mixed cooperative-competitive multi-agent system consists of controlled target agents and uncontrolled external agents. The target agents cooperate with each other and compete with the external agents, so as to deal with dynamic changes in the environment and in the external agents and thereby complete tasks. To train the target agents so that they learn the optimal policy for completing tasks, existing work proposes two kinds of solutions: (1) focusing on the cooperation between target agents, viewing the external agents as a part of the environment, and leveraging multi-agent reinforcement learning to train the target agents, which cannot handle the uncertainty of or dynamic changes in the external agents’ policy; (2) focusing on the competition between target agents and external agents, modeling the competition as a two-player game, and using a self-play approach to train the target agents, which is only suitable for cases with one target agent and one external agent and is difficult to extend to a system consisting of multiple target agents and external agents. This study combines the two kinds of solutions and proposes a counterfactual regret advantage-based self-play approach. Specifically, first, based on counterfactual regret minimization and the counterfactual multi-agent policy gradient, the study designs a counterfactual regret advantage-based policy gradient approach that enables the target agents to update their policies more accurately. Second, to deal with dynamic changes in the external agents’ policy during self-play, the study leverages imitation learning, which takes the external agents’ historical decision-making trajectories as training data and imitates the external agents’ policy, so as to explicitly model the external agents’ behaviors.
Third, based on the counterfactual regret advantage-based policy gradient and the modeling of external agents’ behaviors, this study designs a self-play training approach. This approach can obtain the optimal joint policy for training multiple target agents when the external agents’ policy is uncertain or dynamically changing. The study also conducts a set of experiments on cooperative electromagnetic countermeasures, including three typical mixed cooperative-competitive tasks. The experimental results demonstrate that compared with other approaches, the proposed approach improves self-play performance by at least 78%.
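The counterfactual regret minimization on which the proposed approach builds uses regret matching at its core: the next policy plays each action with probability proportional to its positive cumulative regret. A minimal illustrative sketch (this is the generic primitive, not the paper's full algorithm):

```python
def regret_matching(cumulative_regrets):
    """Regret matching: play each action with probability proportional to
    its positive cumulative regret; fall back to the uniform policy when
    no action has positive regret."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(positive)
    return [r / total for r in positive] if total > 0 else [1.0 / n] * n
```

Averaging the policies produced this way over iterations is what drives the regret-minimization guarantees that the counterfactual regret advantage inherits.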
XU Fan , XU Jian-Ming , MA Yong , WANG Ming-Wen , ZHOU Guo-Dong
2024, 35(2):758-772. DOI: 10.13328/j.cnki.jos.006823
Abstract:Reducing safe (generic) and repetitive replies is a challenging problem for open-domain multi-turn dialogue models. However, existing open-domain dialogue models often ignore the guiding role of dialogue goals, as well as how to introduce and select more accurate knowledge from the dialogue history and dialogue goals. In view of these issues, this study proposes a multi-turn dialogue model based on knowledge enhancement. First, the model replaces the notional words in the dialogue history with sememes and domain words, so as to eliminate ambiguity and enrich the dialogue text representation. Then, the knowledge-enhanced dialogue history and the expanded triplet world knowledge are effectively integrated through the knowledge management and knowledge copy modules, so as to fuse knowledge, vocabulary, dialogue history, and dialogue goals and generate diverse responses. The experimental results and visualizations on two international benchmark open-domain Chinese dialogue corpora verify the effectiveness of the proposed model in both automatic evaluation and human judgment.
SI Bing-Ru , XIAO Jiang , LIU Cun-Yang , DAI Xiao-Hai , JIN Hai
2024, 35(2):773-799. DOI: 10.13328/j.cnki.jos.006985
Abstract:As a typical distributed system, blockchain relies on an underlying network that strongly influences the overall performance and security of the system. Blockchain networks differ from traditional peer-to-peer (P2P) networks in terms of security models, transmission protocols, and performance indicators. This study first systematically analyzes the blockchain network transmission process, i.e., connection establishment and data transmission, and lists the challenging issues. Second, state-of-the-art blockchain topology protocols and data transmission methods are thoroughly investigated and discussed from the perspectives of node heterogeneity, coding schemes, broadcast algorithms, relay networks, etc. Meanwhile, typical cross-chain network implementations and network simulation tools are summarized. Finally, the study envisions possible future research trends in the realm of blockchain networks.
DUAN Tian-Tian , ZHANG Han-Wen , LI Bo , SONG Zhao-Xiong , LI Zhong-Cheng , ZHANG Jun , SUN Yi
2024, 35(2):800-827. DOI: 10.13328/j.cnki.jos.006950
Abstract:Blockchain is the basis of the Internet of value. However, data and value silos arise from independent blockchain systems. Blockchain interoperability (also known as cross-chain operability) is essential for breaking inter-chain barriers and building a blockchain network. After differentiating between the blockchain interoperability in the narrow sense and that in the broad sense, this study redefines the former concept and abstracts out two primary operations: cross-chain reading and cross-chain writing. Subsequently, it summarizes three key technical problems that need to be resolved for achieving the blockchain interoperability in the narrow sense: cross-chain information transmission, cross-chain trust transfer, and cross-chain operation atomicity guarantee. Then, the study reviews the current research status of the three problems systematically and makes comparisons from multiple perspectives. Furthermore, it analyzes some representative holistic solutions from the perspective of the key technical problems. Finally, several research directions deserving of further exploration are also presented.
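A classic building block for the cross-chain operation atomicity problem named above is the hash time-locked contract (HTLC): funds unlock only with the hash preimage before a deadline, and otherwise can be refunded. The following toy sketch (illustrative only; real HTLCs run as on-chain contracts) captures the state machine:

```python
import hashlib

def sha256_hex(preimage: bytes) -> str:
    return hashlib.sha256(preimage).hexdigest()

class HTLC:
    """Toy hash time-locked contract: funds can be claimed with the
    correct hash preimage before the deadline; after the deadline the
    sender can reclaim them."""

    def __init__(self, hashlock: str, timelock: float):
        self.hashlock = hashlock   # hash of the secret preimage
        self.timelock = timelock   # deadline for claiming
        self.state = "locked"

    def claim(self, preimage: bytes, now: float) -> str:
        if (self.state == "locked" and now < self.timelock
                and sha256_hex(preimage) == self.hashlock):
            self.state = "claimed"
        return self.state

    def refund(self, now: float) -> str:
        if self.state == "locked" and now >= self.timelock:
            self.state = "refunded"
        return self.state
```

In an atomic swap, both chains lock funds under the same hashlock; revealing the preimage to claim on one chain makes it available for the claim on the other, which is what ties the two cross-chain writes together.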
CHEN Jing , YANG Hao , HE Kun , LI Kai , JIA Meng , DU Rui-Ying
2024, 35(2):828-851. DOI: 10.13328/j.cnki.jos.006954
Abstract:In recent years, blockchain technology has attracted much attention. As a distributed ledger technology, it has been applied in many fields due to its openness, transparency, and tamper resistance. However, as the number of users and access requests rises, the performance bottleneck induced by the poor scalability of existing blockchain architectures has restricted the application and promotion of blockchain technology, and how to solve the scalability problem has become a hot topic in academia and industry. This study analyzes and summarizes the currently available blockchain scaling solutions. For this purpose, it outlines the basic concepts of blockchain and the origin of the scalability problem, defines the scalability problem, and proposes metrics for scalability. Then, it presents a classification framework and categorizes the existing solutions into three classes: network scaling, on-chain scaling, and off-chain scaling. The different blockchain scalability solutions are analyzed and compared in terms of their technical characteristics, and their respective advantages and disadvantages are summarized. Finally, this study discusses the open issues that need to be addressed promptly and explores the future trends of blockchain technology.
QIAN Hao , ZHENG Jia-Qi , CHEN Gui-Hai
2024, 35(2):852-871. DOI: 10.13328/j.cnki.jos.006938
Abstract:Network management and monitoring are crucial topics in the network field, and the technologies used to achieve them are referred to as network measurement. In particular, network heavy hitter detection is an important network measurement technique and is the focus of this study. Heavy hitters are flows that exceed an established threshold in terms of occupied network resources (bandwidth or the number of packets transmitted). Detecting heavy hitters contributes to quick anomaly detection and more efficient network operation. However, the implementation of heavy hitter detection is challenged by high-speed links. Over time, two categories of heavy hitter detection methods have developed: traditional methods and software-defined network (SDN)-based methods. This study reviews the related frameworks and algorithms, systematically summarizes their development and current status, and finally tries to predict future research directions of network heavy hitter detection.
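A standard data structure behind many heavy hitter detectors is the count-min sketch, which keeps d hash rows of w counters and over-approximates each flow's size by the minimum of its d counters. A minimal illustrative sketch (parameters and flow names are arbitrary):

```python
import hashlib

class CountMinSketch:
    """Toy count-min sketch: d hash rows of w counters; an item's
    estimate is the minimum of its d counters, so counts are never
    underestimated (collisions can only inflate them)."""

    def __init__(self, w=64, d=4):
        self.w, self.d = w, d
        self.rows = [[0] * w for _ in range(d)]

    def _index(self, item, row):
        # Derive a per-row hash by salting with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.w

    def add(self, item, count=1):
        for r in range(self.d):
            self.rows[r][self._index(item, r)] += count

    def estimate(self, item):
        return min(self.rows[r][self._index(item, r)] for r in range(self.d))

def heavy_hitters(sketch, candidates, threshold):
    """Flag candidate flows whose estimated size reaches the threshold."""
    return [c for c in candidates if sketch.estimate(c) >= threshold]
```

Because memory is fixed regardless of the number of flows, this kind of sketch fits the per-packet time and space budgets of high-speed links, at the price of one-sided estimation error.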
YU Bo , SU Jin-Shu , YANG Qiang , HUANG Jian-Xin , SHENG Zhou-Shi , LIU Run-Hao , LU Jian-Jun , LIANG Chen , CHEN Chen , ZHAO Lei
2024, 35(2):872-898. DOI: 10.13328/j.cnki.jos.006942
Abstract:Network protocol software is widely deployed and provides diversified functions such as communication, transmission, control, and management in cyberspace. In recent years, its security has gradually attracted the attention of academia and industry, and timely finding and repairing of network protocol software vulnerabilities has become an important topic. Features such as diversified deployment methods, complex protocol interaction processes, and functional differences among multiple implementations of the same protocol specification confront vulnerability mining techniques for network protocol software with many challenges. This study first classifies the vulnerability mining technologies for network protocol software and defines the connotation of the existing key techniques. Second, it summarizes the technical progress in four aspects of network protocol software vulnerability mining: network protocol description methods, mining object adaptation techniques, fuzz testing techniques, and program analysis-based vulnerability mining methods. In addition, through comparative analysis, the technical advantages and evaluation dimensions of different methods are summarized. Finally, this study summarizes the status and challenges of network protocol software vulnerability mining and proposes five potential research directions.
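At its core, the fuzz testing mentioned above mutates seed inputs and watches the target for failures. The toy sketch below (the parser and mutator are invented for illustration; real protocol fuzzers track crashes, hangs, and coverage rather than counting rejections) shows the loop:

```python
import random

def parse_header(data: bytes):
    """A deliberately fragile toy parser: version byte, length byte, payload."""
    if len(data) < 2:
        raise ValueError("too short")
    version, length = data[0], data[1]
    if version != 1:
        raise ValueError("bad version")
    payload = data[2:2 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return version, payload

def mutate(seed: bytes, rng) -> bytes:
    """Flip one random bit of the seed input."""
    data = bytearray(seed)
    i = rng.randrange(len(data))
    data[i] ^= 1 << rng.randrange(8)
    return bytes(data)

def fuzz(seed: bytes, rounds=200, rng=None):
    """Feed mutated inputs to the parser and count rejected ones."""
    rng = rng or random.Random(0)
    rejected = 0
    for _ in range(rounds):
        try:
            parse_header(mutate(seed, rng))
        except ValueError:
            rejected += 1  # a real fuzzer would instead watch for crashes/hangs
    return rejected
```

Protocol fuzzers add the pieces this sketch omits: a model of the message grammar and session state (the "network protocol description"), adapters for the deployment target, and feedback from program analysis to steer mutation.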
DONG Hao-Wen , ZHANG Chao , LI Guo-Liang , FENG Jian-Hua
2024, 35(2):899-926. DOI: 10.13328/j.cnki.jos.006952
Abstract:The virtualization, high availability, high scheduling elasticity, and other characteristics of cloud infrastructure provide cloud databases with many advantages, such as the out-of-the-box feature, high reliability and availability, and the pay-as-you-go model. Cloud databases can be divided into two categories according to their architecture design: cloud-hosted databases and cloud-native databases. Cloud-hosted databases, which deploy the database system in virtual machine environments on the cloud, offer the advantages of low cost, easy operation and maintenance, and high reliability. Cloud-native databases, in turn, take full advantage of the elastic scaling of cloud infrastructure: they adopt a disaggregated compute and storage architecture to scale computing and storage resources independently and thus further increase the cost-performance ratio. However, the disaggregated compute and storage architecture poses new challenges to the design of database systems. This survey provides an in-depth analysis of the architectures and technologies of cloud-native database systems. Specifically, the architectures of cloud-native online transaction processing (OLTP) and online analytical processing (OLAP) databases are classified and analyzed according to differences in the resource disaggregation mode, and the advantages and limitations of each architecture are compared. Then, on the basis of the disaggregated compute and storage architectures, the study explores the key technologies of cloud-native databases in depth by functional module, including those of cloud-native OLTP (data organization, replica consistency, main/standby synchronization, failure recovery, and mixed workload processing) and those of cloud-native OLAP (storage management, query processing, serverless-aware compute, data protection, and machine learning optimization).
Finally, the study summarizes the technical challenges for existing cloud-native databases and suggests directions for future research.
LI Chao-Neng , FENG Guan-Wen , YAO Hang , LIU Ru-Yi , LI Yu-Nan , XIE Kun , MIAO Qi-Guang
2024, 35(2):927-974. DOI: 10.13328/j.cnki.jos.006996
Abstract:The rapid advancement of sensor technology has resulted in a vast volume of traffic trajectory data, and trajectory anomaly detection has a wide range of applications in sectors including smart transportation, autonomous driving, and video surveillance. Unlike other trajectory mining tasks such as classification, clustering, and prediction, trajectory anomaly detection aims to find low-probability, uncertain, and unusual trajectory behavior. The types of anomalies, trajectory data labels, detection accuracy, and computational complexity are common concerns in trajectory anomaly detection. In view of these problems, this study comprehensively reviews the research status and latest progress of trajectory anomaly detection technology over the past two decades. First, the characteristics of trajectory anomaly detection and the current research challenges are analyzed. Then, the existing trajectory anomaly detection algorithms are compared and analyzed according to classification criteria such as the availability of trajectory labels, the principle of the anomaly detection algorithm, and the offline or online working mode. For each type of anomaly detection technique, the algorithm principle, representative methods, complexity analysis, and advantages and disadvantages are summarized and analyzed in detail. Next, the open source trajectory datasets, commonly used anomaly detection evaluation methods, and anomaly detection tools are discussed. On this basis, the architecture of a trajectory anomaly detection system is presented, forming a relatively complete trajectory mining process from trajectory data collection to anomaly detection application. Finally, the significant open issues in trajectory anomaly detection, as well as potential research trends and solutions, are discussed.
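As a minimal example of the distance-based family of trajectory anomaly detection algorithms (illustrative only; the survey covers many more principles, and real systems use measures such as DTW or Hausdorff distance), the sketch below flags a trajectory whose distance to its k-th nearest neighbor exceeds a threshold:

```python
def traj_distance(a, b):
    """Mean pointwise Euclidean distance between two equally sampled
    2D trajectories (a crude stand-in for DTW or Hausdorff distance)."""
    return sum(((xa - xb) ** 2 + (ya - yb) ** 2) ** 0.5
               for (xa, ya), (xb, yb) in zip(a, b)) / min(len(a), len(b))

def anomalies(trajectories, k=1, threshold=1.0):
    """Distance-based outlier detection: flag each trajectory whose
    distance to its k-th nearest neighbor exceeds the threshold."""
    flagged = []
    for i, t in enumerate(trajectories):
        dists = sorted(traj_distance(t, other)
                       for j, other in enumerate(trajectories) if j != i)
        if dists[k - 1] > threshold:
            flagged.append(i)
    return flagged
```

The all-pairs loop makes this quadratic in the number of trajectories, which is exactly the computational-complexity concern the abstract raises for online settings.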
CAI Xu-Dong , WANG Yong-Cai , BAI Xue-Wei , LI De-Ying
2024, 35(2):975-1009. DOI: 10.13328/j.cnki.jos.006946
Abstract:In the fields of autonomous driving, augmented reality, and intelligent mobile robots, visual relocalization is a crucial fundamental issue. It refers to the problem of determining the position and attitude in an existing prior map according to data captured in real time by visual sensors. In recent decades, visual relocalization has received extensive attention, and numerous prior map construction methods and visual relocalization methods have come to the fore. These efforts vary considerably and cover a wide scope, but technical overviews and summaries are still unavailable. Therefore, a survey of visual relocalization is valuable both theoretically and practically. This study tries to construct a unified blueprint for visual relocalization methods and summarizes related studies from the perspective of querying image data from large-scale map databases. It surveys various construction methods for map databases and different feature matching, relocalization, and pose calculation approaches. It then summarizes the current mainstream datasets for visual relocalization and finally analyzes the challenges ahead and the potential development directions of visual relocalization.
LU Jing-Jing , QIN Yun-Chuan , LIU Zhi-Zhong , TANG Zhuo , ZHANG Yong-Jun , LI Ken-Li
2024, 35(2):1010-1027. DOI: 10.13328/j.cnki.jos.006943
Abstract:Robots are increasingly entering people’s daily life and are receiving more and more attention in China and abroad. One of the important characteristics of robotic systems is security, and enhancing the security of robotic systems can protect robots from malicious attackers. The security of the robot operating system (ROS) is an important part of the security of robotic systems. Although researchers have conducted considerable research on the security of ROSs in recent years, security has unfortunately not yet received enough attention. In order to draw more attention to the security of robotic systems and help people quickly understand the security solutions of the current mainstream ROS, this study systematically investigates and summarizes the security of ROSs. On the one hand, it analyzes the security features of ROSs and discusses the known security problems in ROSs. On the other hand, it categorizes and summarizes recent research related to the security of ROSs and compares ROS security solutions in terms of confidentiality, integrity, and availability. Finally, the study prospects the future of security research on ROSs.
GAO Lan , ZHAO Yu-Chen , ZHANG Wei-Gong , WANG Jing , QIAN De-Pei
2024, 35(2):1028-1047. DOI: 10.13328/j.cnki.jos.006984
Abstract:Parallel computing has become mainstream. In parallel computing systems, synchronization is one of the critical design points and is imperative for fully utilizing hardware performance. In recent years, the GPU, as the most widely used accelerator, has developed rapidly, and many applications have placed greater demands on GPU thread synchronization. However, current GPUs cannot support thread synchronization efficiently in many real-world applications. Although many approaches have been proposed to support GPU thread synchronization and much progress has been made, the unique architecture and parallel pattern of GPUs still pose many challenges for GPU thread synchronization research. In this study, thread synchronization in GPU parallel programming is divided into categories according to synchronization purpose and granularity. Around synchronization expression and execution, the key problems and challenges of synchronization on GPUs are first analyzed, namely the difficulty of efficient expression, frequent concurrency bugs, and low execution efficiency. Second, based on different granularities of GPU thread synchronization, the study introduces recent academic and industrial research on synchronization for thread contention and synchronization for thread cooperation on GPUs from two aspects: expression methods and performance optimization methods. The existing research methods are then analyzed. On this basis, the study points out future research trends, development prospects, and feasible research methods for GPU thread synchronization, providing a reference for researchers in this field.
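The thread-cooperation style of synchronization discussed above can be illustrated with a CPU-thread analogy (Python threads standing in for GPU threads; this is not GPU code): a barrier forces every thread in a "block" to finish phase 1 before any thread starts phase 2, much like block-level `__syncthreads()`-style synchronization in CUDA:

```python
import threading

def run_phases(n_threads=4):
    """Each worker logs a phase-1 step, waits at the barrier, then logs a
    phase-2 step; the barrier guarantees no phase-2 entry precedes any
    phase-1 entry, mirroring block-level cooperative synchronization."""
    barrier = threading.Barrier(n_threads)
    log, lock = [], threading.Lock()

    def worker(tid):
        with lock:
            log.append(("phase1", tid))
        barrier.wait()   # no thread proceeds until all have arrived
        with lock:
            log.append(("phase2", tid))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The lock here plays the contention role (serializing access to shared state) while the barrier plays the cooperation role, the two synchronization purposes the abstract distinguishes.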