GONG Li-Na, ZHOU Yi-Ren, QIAO Yu, JIANG Shu-Juan, WEI Ming-Qiang, HUANG Zhi-Qiu
2025, 36(1):1-26. DOI: 10.13328/j.cnki.jos.007143 CSTR: 32375.14.jos.007143
Abstract: In recent years, deep learning has achieved excellent performance in software engineering (SE) tasks. Such performance in practical tasks depends on large-scale training sets, and collecting and labeling large-scale training sets require considerable resources and costs, which limits the wide application of deep learning techniques in practical tasks. With the release of pre-trained models (PTMs) in the field of deep learning, researchers in SE have begun to pay attention to PTMs and to introduce them into SE tasks. PTMs have brought a qualitative leap in SE tasks, ushering intelligent software engineering into a new era. However, no existing study has systematically distilled the successes, failures, and opportunities of pre-trained models in SE. To clarify the work in this cross-field (pre-trained models for software engineering, PTM4SE), this study systematically reviews the current studies related to PTM4SE. Specifically, the study first describes the framework of intelligent software engineering methods based on pre-trained models and then analyzes the pre-trained models commonly used in SE. Meanwhile, it introduces in detail the downstream SE tasks to which pre-trained models are applied and compares and analyzes the performance of pre-trained model techniques on these tasks. The study then presents the datasets used in SE for training and fine-tuning PTMs. Finally, it discusses the challenges and opportunities for PTM4SE. The collated PTMs and datasets in SE are published at https://github.com/OpenSELab/PTM4SE.
CHEN Xiao-Hong, LIU Shao-Bin, JIN Zhi
2025, 36(1):27-46. DOI: 10.13328/j.cnki.jos.007157 CSTR: 32375.14.jos.007157
Abstract: As embedded systems are widely applied, their requirements are becoming increasingly complex, making requirements analysis a critical stage in embedded system development. How to correctly describe and model requirements has become a primary issue. This study systematically investigates the current requirements descriptions of embedded systems and conducts a comprehensive comparative analysis to deepen the understanding of the core concerns of embedded system requirements. The study first applies the systematic literature review method to identify, retrieve, summarize, and analyze the relevant literature published between January 1979 and November 2023. Through automatic retrieval and snowballing, 150 papers closely related to the topic are finally selected to ensure the comprehensiveness of the review. The study analyzes the existing capabilities of embedded requirements description languages in terms of their description concerns, description contents, and requirements analysis elements, among other aspects. Finally, it summarizes the challenges facing current requirements descriptions and, aiming at the task of intelligent synthesis of embedded software, puts forward the expressiveness requirements for embedded system requirements description languages.
YANG Guang, LIU Jie, QU Mu-Zi, WANG Shuai, YE Dan, ZHONG Hua
2025, 36(1):47-78. DOI: 10.13328/j.cnki.jos.007190 CSTR: 32375.14.jos.007190
Abstract: Serverless computing is an emerging cloud computing model based on the “function as a service (FaaS)” paradigm. Functions serve as the fundamental unit for deployment and scheduling, providing users with massively parallel and automatically scalable function execution services without the need to manage underlying resources. For users, serverless computing alleviates the burden of managing cluster-level infrastructure, enabling them to focus on business-layer development and innovation. For service providers, applications are decomposed into fine-grained functions, leading to significantly improved scheduling efficiency and resource utilization. These significant advantages have swiftly drawn attention from industry and propelled serverless computing into popularity. However, the distinct computing mode of serverless computing, divergent from traditional cloud computing, along with its stringent limitations on various aspects of tasks, poses numerous obstacles to application migration. The escalating complexity of migrated tasks also imposes higher performance requirements on serverless computing. Therefore, performance optimization technology for serverless computing systems has emerged as a critical research topic. This study reviews and summarizes research efforts on the performance optimization of serverless computing from four perspectives and introduces existing systems. Firstly, it introduces optimization technologies for typical tasks, including task adaptation and system optimization for specific task types. Secondly, it reviews optimization work on sandbox environments, encompassing sandbox solutions and cold start optimization methods, which play a crucial role in the execution of serverless functions. Thirdly, it provides an overview of optimizations in I/O and communication technologies, which are major performance bottlenecks of serverless applications. Lastly, it briefly outlines related resource scheduling technologies, including platform-oriented and user-oriented scheduling strategies, which determine system resource utilization and task execution efficiency. In conclusion, it summarizes the current issues and challenges of performance optimization technologies for serverless computing and anticipates potential future research directions.
ZHANG Xue-Ning, LIU Xing-Bo, SONG Jing-Kuan, NIE Xiu-Shan, WANG Shao-Hua, YIN Yi-Long
2025, 36(1):79-106. DOI: 10.13328/j.cnki.jos.007141 CSTR: 32375.14.jos.007141
Abstract: As image data grows explosively on the Internet and image application fields widen, the demand for large-scale image retrieval is increasing greatly. Hash learning provides significant storage and retrieval efficiency for large-scale image retrieval and has attracted intensive research interest in recent years. Existing surveys on hash learning suffer from weak timeliness and unclear technical routes: they mainly cover hashing methods proposed five to ten years ago, and few of them summarize the relationships between the components of hashing methods. In view of this, this study makes a comprehensive survey of hash learning for large-scale image retrieval by reviewing the hash learning literature published in the past twenty years. First, the technical route of hash learning and the key components of hashing methods are summarized, including the loss function, optimization strategy, and out-of-sample extension. Second, hashing methods for image retrieval are classified into two categories: unsupervised and supervised hashing methods. For each category, the research status and evolution process are analyzed. Third, several image benchmarks and evaluation metrics are introduced, and the performance of some representative hashing methods is analyzed through comparative experiments. Finally, the future research directions of hash learning are summarized in light of its limitations and new challenges.
DONG Shao-Kang, LI Chao, YANG Guang, GE Zhen-Xing, CAO Hong-Ye, CHEN Wu-Bing, YANG Shang-Dong, CHEN Xing-Guo, LI Wen-Bin, GAO Yang
2025, 36(1):107-151. DOI: 10.13328/j.cnki.jos.007212 CSTR: 32375.14.jos.007212
Abstract: In recent years, there has been rapid advancement in the application of artificial intelligence technology to sequential decision-making and adversarial game scenarios, resulting in significant progress in domains such as Go, video games, poker, and Mahjong. Notably, systems like AlphaGo, OpenAI Five, AlphaStar, DeepStack, Libratus, Pluribus, and Suphx have achieved or surpassed human expert-level performance in these areas. While these applications primarily focus on zero-sum games involving two players, two teams, or multiple players, there has been limited substantive progress in addressing mixed-motive games. Unlike zero-sum games, mixed-motive games necessitate comprehensive consideration of individual returns, collective returns, and equilibrium. These games are extensively applied in real-world settings such as public resource allocation, task scheduling, and autonomous driving, making research in this area crucial. This study offers a comprehensive overview of key concepts and relevant research in the field of mixed-motive games, providing an in-depth analysis of current trends and future directions both domestically and internationally. Specifically, it first introduces the definition and classification of mixed-motive games. It then elaborates on game solution concepts and objectives, including Nash equilibrium, correlated equilibrium, and Pareto optimality, as well as objectives related to maximizing individual and collective gains while considering fairness. Furthermore, the study engages in a thorough exploration and analysis of game theory methods, reinforcement learning methods, and their combination based on different solution objectives. In addition, it discusses relevant application scenarios and experimental simulation environments before concluding with a summary and an outlook on future research directions.
Lü Xing-Lin, LI Jun-Hui, TAO Shi-Min, YANG Hao, ZHANG Min
2025, 36(1):152-183. DOI: 10.13328/j.cnki.jos.007217 CSTR: 32375.14.jos.007217
Abstract: Machine translation (MT) aims to build an automatic system that translates a given sequence in the source language into a target-language sequence carrying identical semantic information. MT has been an important research direction in natural language processing and artificial intelligence owing to its wide range of application scenarios. In recent years, the performance of neural machine translation (NMT) has greatly surpassed that of statistical machine translation (SMT), making NMT the mainstream approach in MT research. However, NMT generally takes the sentence as the translation unit, and in document-level translation scenarios, discourse errors such as mistranslated words and incoherent sentences may occur because independently translated sentences are separated from their discourse context. Therefore, incorporating document-level information into the translation process is a more reasonable and natural way to resolve such discourse errors. This is precisely the goal of document-level neural machine translation (DNMT), which has become a popular direction in MT research. This study reviews and summarizes work on DNMT in terms of discourse evaluation methods, applied datasets and models, and other aspects to help researchers efficiently grasp the research status and further directions of DNMT. Meanwhile, this study also introduces the prospects and some challenges of DNMT, hoping to bring some inspiration to researchers.
QIAO Xuan, LI Zong-Hui, LIU Qiang, AI Bo, WAN Hai, DENG Yang-Dong
2025, 36(1):184-202. DOI: 10.13328/j.cnki.jos.007148 CSTR: 32375.14.jos.007148
Abstract: The time-sensitive networking standards developed by the IEEE 802.1 Task Group can be applied to build highly reliable, low-latency, low-jitter Ethernet, and the extension of time-sensitive networking to the wireless field is also a hot topic. Compared with traditional wired communication, wireless time-sensitive networking not only achieves highly reliable and low-delay communication but also offers higher flexibility, stronger mobility, and lower wiring and maintenance costs. Therefore, wireless time-sensitive networking is considered a promising technology for emerging applications such as autonomous driving, collaborative robotics, and remote medical control. Generally, wireless networks can be divided into infrastructure-based and non-infrastructure-based wireless networks, and the latter can be further divided into two categories based on mobility: mobile ad hoc networks and wireless sensor networks. Accordingly, this study surveys and summarizes the application scenarios, related technologies, routing protocols, and high-reliability, low-delay transmission of these three types of networks.
YANG Xiu-Zhang, PENG Guo-Jun, LIU Si-De, TIAN Yang, LI Chen-Guang, FU Jian-Ming
2025, 36(1):203-252. DOI: 10.13328/j.cnki.jos.007162 CSTR: 32375.14.jos.007162
Abstract: Advanced persistent threat (APT) is a novel form of cyberattack that is well-organized, stealthy, persistent, adversarial, and destructive, resulting in catastrophic consequences for global network security. Traditional APT defenses tend to construct models to detect whether attacks are malicious or to identify their malicious family categories, primarily employing a passive defense strategy and lacking comprehensive and in-depth exploration of APT attack attribution and inference. In light of this, this study surveys intelligent methods for APT attack attribution and inference. Firstly, an overall defense chain framework for APT attacks is proposed, which effectively distinguishes and correlates APT attack detection, attribution, and inference. Secondly, the work related to the four tasks of APT attack detection is reviewed in detail. Thirdly, APT attack attribution research is systematically summarized at the levels of regions, organizations, attackers, addresses, and attack models. Then, APT attack inference is divided into four aspects, namely attack intent inference, attack path perception, attack scenario reconstruction, and attack blocking and countermeasures, and the relevant works are summarized and compared in detail. Finally, the hot topics, development trends, and challenges in the field of APT attack defense are discussed.
MEI Han-Tao, CHENG Guang, ZHU Yi-Lin, ZHOU Yu-Yang
2025, 36(1):253-288. DOI: 10.13328/j.cnki.jos.007182 CSTR: 32375.14.jos.007182
Abstract: The growth of the Internet poses privacy challenges, prompting the development of anonymous communication systems such as Tor (the second-generation onion router), the most widely used among them. However, the notable anonymity offered by Tor has inadvertently made it a breeding ground for criminal activities, attracting miscreants engaged in illegal trading and cybercrime. One of the most prevalent techniques for de-anonymizing Tor is passive traffic analysis, wherein anonymity is compromised by passively observing network traffic. This study delves into the fundamental concepts of Tor and traffic analysis, elucidates application scenarios and threat models, and classifies existing works into two categories: traffic identification & classification, and flow correlation. Subsequently, their respective traffic collection methods, feature extraction techniques, and algorithms are compared and analyzed. Finally, the primary challenges faced by current research in this domain are summarized and future research directions are proposed.
SU Jin-Shu, SONG Cong-Xi, JI Xiao-Lan, XU Cao, HAN Biao
2025, 36(1):289-320. DOI: 10.13328/j.cnki.jos.007193 CSTR: 32375.14.jos.007193
Abstract: Multi-path transmission technology establishes multiple transmission paths between communicating parties via the various network interfaces on their devices. In this way, bandwidth aggregation, load balancing, and path redundancy are achieved to increase transmission throughput and reliability. These benefits allow multi-path transmission technology to be widely used in application scenarios such as servers, terminals, and data centers. As an integral part of network architecture and transmission technology research, the technology is of considerable research significance and value. To this end, this study systematically analyzes multi-path transmission technology in terms of its concepts and core mechanisms. Firstly, the basic concepts, standardization process, and application value of multi-path transmission are outlined. Secondly, the core mechanisms of multi-path transmission technology are elaborated, including congestion control, packet scheduling, path management, retransmission mechanisms, security mechanisms, and mechanisms for specialized applications. The classification methods and main research results of each mechanism are described, and the advantages, disadvantages, and development directions of these mechanisms are summarized. Finally, this study probes into the challenges faced by multi-path transmission technology research and envisions the prospects for relevant studies.
LIN Jin-Lei, LI Cheng-Long, SONG Guang-Lei, FAN Lin-Na, WANG Zhi-Liang, YANG Jia-Hai
2025, 36(1):321-340. DOI: 10.13328/j.cnki.jos.007194 CSTR: 32375.14.jos.007194
Abstract: Capturing an accurate view of IP geolocation is of great interest to the networking research community, as it has many uses ranging from network measurement and mapping to analyzing network infrastructure. However, the scale of today’s Internet, coupled with the rapid development of Internet applications, makes it very challenging to acquire a complete and accurate snapshot of IP geolocation technology. To the best of our knowledge, there is no systematic survey of the relevant research in this field. To fill this gap, this study systematically summarizes the research on client-independent IP geolocation, in which clients do not participate in the geolocation process. The study examines the major research conducted on topics related to IP geolocation in the 22 years since the first IP-based geolocation technique was proposed. To this end, these prior studies are classified according to the measurement method, that is, active, passive, and hybrid. The main techniques in each category are described, and their significant advantages and limitations are identified. The primary experience and lessons learned from these past efforts are also presented. On this basis, the latest progress in IP geolocation in both academia and industry is shown. Finally, the survey is concluded with promising future directions, hoping to promote the development of IP geolocation.
CHEN Bo-Yan, SHEN Qing-Ni, ZHANG Xiao-Lei, ZHANG Xin, LI Cong, WU Zhong-Hai
2025, 36(1):341-370. DOI: 10.13328/j.cnki.jos.007196 CSTR: 32375.14.jos.007196
Abstract: As artificial intelligence and 5G technology are applied in the automotive industry, intelligent connected vehicles have come into being. An intelligent connected vehicle is a complex distributed heterogeneous system composed of a large number of electronic control units (ECUs) from different suppliers, in which the ECUs are collaboratively controlled through in-vehicle network protocols represented by CAN. However, an attacker could exploit a variety of interfaces of an intelligent connected vehicle to penetrate the in-vehicle network and then attack the network and its components such as ECUs. Therefore, in-vehicle network security for intelligent connected vehicles has become one of the focuses of vehicle security research in recent years. After introducing the structure of intelligent connected vehicles, ECUs, the CAN bus, and the on-board diagnostics protocol, this study first summarizes the research progress of reverse engineering technology for in-vehicle network protocols. Reverse engineering aims to obtain the implementation details of in-vehicle network protocols, which are usually not disclosed in the automotive industry, and it is a prerequisite for implementing in-vehicle network attacks and defenses. The remaining part is organized from the two angles of attack and defense. On the one hand, the attack vectors and main attack technologies against the in-vehicle network are summarized, including attack technologies implemented through physical access and remote access, as well as those targeting the ECU and the CAN bus. On the other hand, the existing in-vehicle network defense technologies are discussed, including intrusion detection technologies based on feature extraction and machine learning methods, and security enhancement technologies for in-vehicle network protocols based on cryptographic approaches. Finally, future research directions are prospected.
LI Tong, XU Du-Ling, WU Bo, GUO Xiong-Wen, JIANG Dai-Jun, LUO Cheng, LU Wei, DU Xiao-Yong
2025, 36(1):371-398. DOI: 10.13328/j.cnki.jos.007231 CSTR: 32375.14.jos.007231
Abstract: The wide area network (WAN) has become critical infrastructure in the 21st century, connecting new businesses, new infrastructure, and various emerging applications. In recent years, there has been explosive growth in data volume, accompanied by the continuous emergence of new application forms such as large-scale WAN-based models, the digital economy, the metaverse, and the holographic society. In addition, the emergence of new service architectures, such as China’s “East Data, West Computing” project, computing power networks, and data fields, has posed increasingly high requirements on the data transmission quality of WANs. For instance, WANs must deliver not only timely but also real-time services, making latency a critical deterministic metric to meet. Therefore, the wide area deterministic network emerges as a new paradigm of WAN. This study systematically reviews the connotation of deterministic networks and the development of traditional technologies related to them. It introduces new applications of the wide area deterministic network, discusses their new characteristics and transmission challenges, and proposes new goals for them. Based on the aforementioned new applications, characteristics, challenges, and goals, this study summarizes in detail the main research progress in the field of wide area deterministic networks and provides future research directions. It is hoped that this study will provide reference and assistance for research in this field.
LIANG Jie, WU Zhi-Yong, FU Jing-Zhou, ZHU Juan, JIANG Yu, SUN Jia-Guang
2025, 36(1):399-423. DOI: 10.13328/j.cnki.jos.007048 CSTR: 32375.14.jos.007048
Abstract: Database management systems (DBMSs) are the infrastructure for efficient storage, management, and analysis of data, playing a pivotal role in modern data-intensive applications. Vulnerabilities in DBMSs pose a great threat to the security of data and the operation of applications. Fuzzing is one of the most popular dynamic vulnerability detection techniques and has been applied to analyze DBMSs, uncovering many vulnerabilities. This study analyzes the requirements and difficulties involved in testing a DBMS and proposes a foundational framework for DBMS fuzzing. It also analyzes the challenges encountered by DBMS fuzzers and identifies the dimensions that require support. It introduces typical DBMS fuzzers from the perspective of discovering different types of vulnerabilities and summarizes key techniques in DBMS fuzzing, including SQL statement synthesis, code coverage tracking, and test oracle construction. Several popular DBMS fuzzers are evaluated in terms of coverage, the syntactic and semantic correctness of the generated test cases, and the ability to find vulnerabilities. Finally, it presents the problems faced by current DBMS fuzzing research and practice, and offers prospects for future research directions in DBMS fuzzing.
WENG Si-Yang, YU Rong, WANG Qing-Shuai, HU Zi-Rui, NI Lü, ZHANG Rong, ZHOU Xuan, ZHOU Ao-Ying, XU Quan-Qing, YANG Chuan-Hui, LIU Wei, YANG Pan-Fei
2025, 36(1):424-445. DOI: 10.13328/j.cnki.jos.007225 CSTR: 32375.14.jos.007225
Abstract: Requirements for effective real-time analysis of instantly modified data in database systems have driven the rapid development of Hybrid Transactional/Analytical Processing (HTAP) database systems, which support processing both OLTP and OLAP workloads. To enable fair comparisons and healthy development, it is crucial to define and implement new benchmarks to evaluate the new features of HTAP database systems. Firstly, this study analyzes the key characteristics of HTAP database systems and summarizes the distinct technologies in their implementations. Secondly, the difficulties of designing HTAP database systems and the challenges of constructing HTAP benchmarks are distilled. On this basis, the design dimensions of HTAP benchmarks are proposed, including data generation, workload generation, evaluation metrics, and support for consistency models. This study compares existing HTAP benchmarks in terms of these design dimensions and their implementation technologies and sums up their merits and defects in different dimensions. Additionally, the published benchmarks are demonstrated, and their abilities to evaluate key features and support horizontal comparisons among HTAP database systems are analyzed. Finally, this study concludes the requirements for HTAP benchmarks and some future research directions, pointing out that semantically consistent workload control and fresh data access metrics are the key issues in defining benchmarks for HTAP database systems.
LIU Hai-Long, WANG Shuo, HOU Shu-Feng, XU Hai-Yang, LI Zhan-Huai
2025, 36(1):446-468. DOI: 10.13328/j.cnki.jos.007240 CSTR: 32375.14.jos.007240
Abstract: Multi-tenant cloud databases offer services more cheaply and conveniently, with advantages such as on-demand payment, on-demand scaling, automatic deployment, high availability, self-maintenance, and resource sharing. An increasing number of enterprises and individuals have begun to host their database services on database-as-a-service (DaaS) platforms. These DaaS platforms provide services to multiple tenants in accordance with their service-level agreements (SLAs) while improving revenue for themselves. However, due to the dynamic, heterogeneous, and competitive characteristics of multiple tenants and their loads, it is a very challenging task for DaaS platform providers to adaptively plan and schedule resources according to dynamic loads while complying with multi-tenant SLAs. Focusing on common types of multi-tenant cloud databases, such as relational databases, this survey first analyzes in detail the challenges faced by the resource planning and scheduling of multi-tenant cloud databases and then outlines the related key scientific issues. It then provides a framework of related techniques and a summary of existing research in four areas: resource planning and scheduling technologies, resource forecasting technologies, elastic resource scaling technologies, and resource planning and scheduling tools for existing databases. Lastly, this survey offers suggestions for future research directions on resource planning and scheduling technologies for multi-tenant cloud databases.