WANG Li-Peng , GUAN Zhi , LI Qing-Shan , CHEN Zhong , HU Ming-Sheng
2023, 34(1):1-32. DOI: 10.13328/j.cnki.jos.006402
Abstract: Blockchain is a distributed ledger maintained by a set of network nodes. It possesses the following security attributes: unforgeability, decentralization, trustlessness, cryptographically provable security, and non-repudiation. This paper summarizes five security services: data confidentiality, data integrity, authentication, data privacy, and assured data erasure. It first introduces the concepts of blockchain and public key cryptography. For each of the five security services, the security threats that users face in practical scenarios and the corresponding solutions are analyzed, the drawbacks of the traditional implementations are discussed, and blockchain-based countermeasures are then introduced. Finally, the value of and challenges associated with blockchain are discussed.
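To make the unforgeability attribute concrete, the following minimal Python sketch (illustrative only and not taken from the paper; the block fields and record contents are hypothetical) shows how hash-chaining blocks makes later tampering with ledger data detectable.

```python
import hashlib, json

def block_hash(block):
    # Hash the block's canonical JSON form (a simplification of real block headers).
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def build_chain(records):
    chain, prev = [], "0" * 64          # the genesis block points to an all-zero hash
    for data in records:
        block = {"prev_hash": prev, "data": data}
        prev = block_hash(block)
        chain.append(block)
    return chain

def verify_chain(chain):
    prev = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev:   # any upstream tampering breaks this link
            return False
        prev = block_hash(block)
    return True

chain = build_chain(["pay A 5", "pay B 3"])
print(verify_chain(chain))               # True
chain[0]["data"] = "pay A 500"           # tamper with an earlier record
print(verify_chain(chain))               # False
```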
2023, 34(1):33-49. DOI: 10.13328/j.cnki.jos.006421
Abstract: In traditional blockchain technology, miner nodes are required to possess strong computing and storage resources to ensure network-wide consensus and a tamper-proof transaction ledger, which greatly limits the participation of resource-constrained devices in blockchain systems. In recent years, blockchain technology has expanded into many fields, such as finance, health care, the Internet of Things, and supply chains. However, these application scenarios contain a large number of devices with weak computing power and low storage capacity, which poses great challenges to the application of blockchain. Lightweight blockchain technology is therefore emerging. This study surveys related work on lightweight blockchains from the two aspects of lightweight computation and lightweight storage, and compares and analyzes their advantages and disadvantages. Finally, the future development of lightweight blockchain systems is discussed.
MEI Yuan-Qing , GUO Zhao-Qiang , ZHOU Hui-Cong , LI Yan-Hui , CHEN Lin , LU Hong-Min , ZHOU Yu-Ming
2023, 34(1):50-102. DOI: 10.13328/j.cnki.jos.006503
Abstract: Object-oriented software metrics are important for understanding and guaranteeing the quality of object-oriented software. By comparing object-oriented software metrics against their thresholds, whether a module is defect-prone can be evaluated simply and intuitively. Methods for deriving metric thresholds mainly include unsupervised learning methods based on the distribution of metric data and supervised learning methods based on the relationship between metrics and defect-proneness. The two types of methods have their own advantages and disadvantages: unsupervised methods do not require label information to derive thresholds and are easy to implement, but the resulting thresholds often perform poorly in defect prediction; supervised methods improve defect prediction performance through machine learning algorithms, but they need label information, which is not easy to obtain, and the techniques for linking metrics to defect-proneness are complex. In recent years, researchers of both types of methods have continued to explore the problem and made great progress, yet deriving thresholds for object-oriented software metrics remains challenging. This paper presents a systematic survey of recent research on deriving metric thresholds. First, the research problem of threshold derivation for object-oriented software metrics is introduced. Then, the main current research is described in detail from the two aspects of unsupervised and supervised learning methods. After that, related techniques are discussed. Finally, the opportunities and challenges in this field are summarized, and future research directions are outlined.
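As a concrete illustration of the unsupervised, distribution-based idea, the sketch below derives a threshold as the mean plus one standard deviation of the observed metric values. This is one common heuristic rather than any specific method surveyed in the paper, and the metric values are hypothetical.

```python
import statistics

def distribution_threshold(metric_values, k=1.0):
    """Derive a threshold as mean + k * stdev of the observed metric values."""
    mean = statistics.mean(metric_values)
    stdev = statistics.stdev(metric_values)
    return mean + k * stdev

# Hypothetical WMC (weighted methods per class) values from one project
wmc = [3, 5, 4, 7, 6, 25, 4, 8, 5, 30]
threshold = distribution_threshold(wmc)
flagged = [v for v in wmc if v > threshold]   # classes flagged as potentially defect-prone
print(round(threshold, 1), flagged)
```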
YE Shi-Jun , ZHANG Peng-Cheng , JI Shun-Hui , DAI Qi-Yin , YUAN Tian-Hao , REN Bin
2023, 34(1):103-129. DOI: 10.13328/j.cnki.jos.006409
Abstract: With the rapid development of neural networks and other technologies, artificial intelligence has been widely applied in safety-critical or mission-critical systems, such as autonomous driving systems, disease diagnosis systems, and malware detection systems. Due to the lack of a comprehensive and in-depth understanding of artificial intelligence software systems, errors with serious consequences occur frequently. Functional and non-functional attributes of artificial intelligence software systems have been proposed to enhance the understanding and quality assurance of such systems. Investigation shows that a large number of researchers are devoted to the study of functional attributes, while increasing attention is being paid to non-functional attributes. This paper surveys 138 papers in related fields, systematically organizes the existing research results in terms of attribute necessity, attribute definitions, attribute examples, and common quality assurance methods, and summarizes the research work on non-functional attributes of artificial intelligence software systems. A summary and relationship analysis of these non-functional attributes is also presented, and the open-source tools that can be used in research on artificial intelligence software systems are surveyed. Finally, thoughts on potential future research directions and challenges concerning non-functional attributes of artificial intelligence software systems are summarized, which will hopefully provide references for researchers interested in related directions.
TIAN Tian , YANG Xiu-Ting , WANG An-Shi , YU Xu , GONG Dun-Wei
2023, 34(1):130-149. DOI: 10.13328/j.cnki.jos.006425
Abstract: In software testing, the expected output of the program under test is an important basis for judging whether the program is defective. Metamorphic testing uses properties of the program under test to check its outputs, thereby effectively addressing the problem that the expected output of a program is difficult to construct. In recent years, metamorphic testing has flourished in the field of software testing: many researchers have optimized techniques related to metamorphic testing and applied them to various domains to effectively improve software quality. This study summarizes and analyzes research on metamorphic testing from three aspects, theoretical foundations, improvement strategies, and application domains, focusing on the results of the past five years, and discusses the potential of applying metamorphic testing to parallel programs. First, the basic concepts and the process of metamorphic testing are introduced. Next, following its steps, optimization techniques for metamorphic testing are summarized from four perspectives: metamorphic relations, test case generation, test execution, and metamorphic testing tools. Then, the application domains of metamorphic testing are listed. Finally, based on existing research results, the problems faced when metamorphic testing is applied to parallel program testing are discussed, and possible solutions are suggested for further research.
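For readers unfamiliar with the technique, the sketch below illustrates a metamorphic relation for a sine implementation, sin(x) = sin(π − x): instead of knowing the exact expected output, the test checks that the source and follow-up outputs agree. The example is illustrative and not drawn from the surveyed work.

```python
import math, random

def mr_sine_symmetry(f, trials=1000, tol=1e-9):
    """Metamorphic relation: f(x) should equal f(pi - x) for a sine implementation.
    Each source test case x is transformed into the follow-up case pi - x, and the
    two outputs are compared instead of checking an exact expected value."""
    for _ in range(trials):
        x = random.uniform(-10, 10)
        if abs(f(x) - f(math.pi - x)) > tol:
            return False, x          # relation violated: a likely defect, with the witness input
    return True, None

print(mr_sine_symmetry(math.sin))    # (True, None) for a correct implementation
```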
TIAN Ying-Chen , LI Ke-Jun , WANG Tai-Ming , JIAO Qing-Qing , LI Guang-Jie , ZHANG Yu-Xia , LIU Hui
2023, 34(1):150-170. DOI: 10.13328/j.cnki.jos.006431
Abstract: Code smells are low-quality code snippets that are in urgent need of refactoring. Code smell is a research hotspot in software engineering, with many related research topics, a long time span, and rich research results. To sort out the relevant research approaches and results, analyze the research hotspots, and predict future research directions, this study systematically analyzes and classifies 339 papers related to code smells published between 1990 and June 2020. The development trend of code smell research is analyzed statistically, the mainstream topics and hot spots of related research are quantitatively revealed, the key code smells of concern to academia are identified, and the differences in concerns between industry and academia are examined.
ZOU Wei-Qin , ZHANG Jing-Xuan , ZHANG Xiao-Wei , CHEN Lin , XUAN Ji-Feng
2023, 34(1):171-196. DOI: 10.13328/j.cnki.jos.006500
Abstract: During software development and maintenance, bug fixers usually refer to bug reports submitted by end users or developers/testers to locate and fix a bug. The quality of a bug report thus largely determines whether the bug fixer can quickly and precisely locate and fix the bug. Researchers have done much work on characterizing, modeling, and improving the quality of bug reports. This study offers a systematic survey of existing work on bug report quality, in an attempt to understand the current state of research in this area and to open new avenues for future work. First, the quality problems of bug reports reported by existing studies are summarized, such as missing key information and errors in information items. Then, a series of works on automatically modeling bug report quality is presented. After that, approaches that aim to improve bug report quality are introduced. Finally, the challenges and potential opportunities for research on bug report quality are discussed.
TANG Ling-Tao , CHEN Zuo-Ning , ZHANG Lu-Fei , WU Dong
2023, 34(1):197-229. DOI: 10.13328/j.cnki.jos.006411
Abstract: With the vigorous development of big data, cloud computing, and related areas, attaching importance to data security and privacy has become a worldwide trend. Different parties are reluctant to share data in order to protect their own interests and privacy, which leads to data silos. Federated learning enables multiple parties to build a common, robust model without exchanging their data samples, thus addressing critical issues such as data fragmentation and data isolation. However, more and more studies have shown that the federated learning algorithm first proposed by Google cannot resist sophisticated privacy attacks. Therefore, how to strengthen privacy protection and safeguard users' data privacy in federated learning scenarios is an important issue. This paper offers a systematic survey of recent research on privacy attacks and protection in federated learning. First, the definition, characteristics, and classification of federated learning are introduced. Then, the adversarial model of privacy threats in federated learning is analyzed, and typical privacy attacks are classified with respect to the adversary's objectives. Next, several mainstream privacy-preserving technologies are introduced and their advantages and disadvantages in practical applications are pointed out. Furthermore, existing achievements on protection against privacy attacks are summarized, and six privacy-preserving schemes are elaborated. Finally, future challenges of privacy preservation in federated learning are summarized, and promising research directions are discussed.
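For context, the sketch below shows the core aggregation step of a FedAvg-style algorithm, in which client parameters are averaged weighted by local dataset size without exchanging raw data; the parameter vectors and dataset sizes are hypothetical placeholders. The privacy attacks the survey covers target exactly the model updates exchanged in this step.

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Aggregate client model parameters weighted by local dataset size (FedAvg-style)."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Hypothetical parameter vectors from three clients and their local dataset sizes
clients = [np.array([0.2, 1.0]), np.array([0.4, 0.8]), np.array([0.1, 1.2])]
sizes = [100, 300, 600]
global_model = fed_avg(clients, sizes)
print(global_model)   # weighted average, dominated by the larger clients
```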
YANG Peng-Bo , SANG Ji-Tao , ZHANG Biao , FENG Yao-Gong , YU Jian
2023, 34(1):230-254. DOI: 10.13328/j.cnki.jos.006415
Abstract: Deep learning has made great achievements in fields such as computer vision, natural language processing, and speech recognition. Compared with traditional machine learning algorithms, deep models achieve higher accuracy on many tasks. However, because deep models are end-to-end, highly nonlinear, and complex, their interpretability is not as good as that of traditional machine learning algorithms, which hinders the application of deep learning in real life. Studying the interpretability of deep models is therefore of great significance and necessity, and in recent years many scholars have proposed different algorithms for this problem. For image classification tasks, this study divides interpretability algorithms into global and local ones. From the perspective of interpretation granularity, global interpretability algorithms are further divided into model-level and neuron-level algorithms, and local interpretability algorithms are divided into pixel-level, concept-level, and image-level feature algorithms. Based on this framework, the study summarizes common interpretability algorithms for deep models and related evaluation metrics, and discusses the current challenges and future research directions of deep model interpretability. It is believed that research on the interpretability and theoretical foundations of deep models is a necessary path toward opening the black box of deep models, and that interpretability algorithms have huge potential to help solve other problems of deep models, such as fairness and generalization.
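As an example of the pixel-level local interpretability category above, the sketch below computes an occlusion-based saliency map: regions whose occlusion causes the largest score drop are deemed most important. The stand-in model and image are toy placeholders, not part of the surveyed work.

```python
import numpy as np

def occlusion_map(model, image, patch=4, baseline=0.0):
    """Pixel-level local explanation: slide an occluding patch over the image and
    record how much the model's score drops; larger drops mark more important regions."""
    base_score = model(image)
    h, w = image.shape
    saliency = np.zeros_like(image)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline
            saliency[i:i + patch, j:j + patch] = base_score - model(occluded)
    return saliency

# Stand-in "model": scores an image by the brightness of its top-left quadrant
toy_model = lambda img: float(img[:8, :8].mean())
img = np.random.rand(16, 16)
print(occlusion_map(toy_model, img).round(2))   # nonzero only where the toy model "looks"
```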
LI Jin-Yao , DU Xiao-Bing , ZHU Zhi-Liang , DENG Xiao-Ming , MA Cui-Xia , WANG Hong-An
2023, 34(1):255-276. DOI: 10.13328/j.cnki.jos.006420
Abstract: Emotion is the external expression of affect and influences cognition, perception, and decision-making in daily life. As one of the basic problems in realizing overall machine intelligence, emotion recognition has been studied in depth and widely applied in affective computing and human-computer interaction. Compared with facial expressions, speech, and other physiological signals, using EEG to recognize emotion is attracting more attention because of its higher temporal resolution, lower cost, better recognition accuracy, and higher reliability. In recent years, more deep learning architectures have been applied to this task and have achieved better performance than traditional machine learning methods. Deep learning for EEG-based emotion recognition is thus a research focus, and many challenges remain to be overcome. Considering that there is little review literature to refer to, this study investigates the application of deep learning to EEG-based emotion recognition. Specifically, input formulations, deep learning architectures, experimental settings, and results are surveyed. In addition, articles that evaluate their models on the widely used DEAP and SEED datasets are carefully screened, and qualitative and quantitative analyses and comparisons are performed from different aspects. Finally, the work is summarized and prospects for future work are given.
ZHANG Tian-Cheng , TIAN Xue , SUN Xiang-Hui , YU Ming-He , SUN Yan-Hong , YU Ge
2023, 34(1):277-311. DOI: 10.13328/j.cnki.jos.006429
Abstract: A knowledge graph (KG) is a technology that uses a graph model to describe knowledge and model the relationships between things. Knowledge graph embedding (KGE) is a widely adopted knowledge representation method whose main idea is to embed the entities and relations of a knowledge graph into a continuous vector space, simplifying operations while preserving the intrinsic structure of the KG. It can benefit a variety of downstream tasks, such as KG completion and relation extraction. First, existing knowledge graph embedding technologies are comprehensively reviewed, including not only techniques that embed the facts observed in the KG, but also dynamic KG embedding methods that incorporate the time dimension, as well as KG embedding technologies that integrate multi-source information. The relevant models are analyzed, compared, and summarized from the perspectives of entity embedding, relation embedding, and scoring functions. Then, typical applications of KG embedding technologies in downstream tasks are briefly introduced, including question answering systems, recommendation systems, and relation extraction. Finally, the challenges of knowledge graph embedding are expounded, and future research directions are discussed.
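As a concrete example of a scoring function, the snippet below sketches the TransE idea, in which a triple (h, r, t) is scored by how close h + r is to t in the embedding space; the vectors here are random placeholders rather than trained embeddings.

```python
import numpy as np

def transe_score(h, r, t):
    """TransE-style scoring: a triple (h, r, t) is plausible when h + r is close to t,
    so a smaller L2 distance (a score nearer to zero) means higher plausibility."""
    return -np.linalg.norm(h + r - t)

dim = 4
rng = np.random.default_rng(0)
head, rel = rng.normal(size=dim), rng.normal(size=dim)
true_tail = head + rel + 0.01 * rng.normal(size=dim)    # a tail near h + r
random_tail = rng.normal(size=dim)                      # an unrelated tail
print(transe_score(head, rel, true_tail))               # close to 0 (plausible)
print(transe_score(head, rel, random_tail))             # much more negative (implausible)
```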
QIAO Shao-Jie , WU Ling-Chun , HAN Nan , HUANG Fa-Liang , MAO Rui , YUAN Chang-An , Louis Alberto GUTIERREZ
2023, 34(1):312-333. DOI: 10.13328/j.cnki.jos.006395
Abstract: How to utilize multi-source, heterogeneous spatio-temporal data to achieve accurate trajectory prediction that reflects the movement characteristics of moving objects is a core issue in trajectory prediction research. Most existing trajectory prediction models either predict long sequential trajectory patterns from the characteristics of historical trajectories, or integrate the current locations of moving objects into spatio-temporal semantic scenarios to predict trajectories based on their historical trajectories. This survey summarizes the commonly used trajectory prediction models and algorithms across different research fields. First, state-of-the-art work on multi-motion-pattern trajectory prediction and the basic models of trajectory prediction are described. Second, prediction models of different categories are summarized, including those based on mathematical statistics, machine learning, and filtering algorithms, together with the representative methods in these fields. Third, context-awareness techniques are introduced: the definitions of context awareness given by scholars in different research fields are described; the key technical points, such as models of context-aware computing, context acquisition, and context reasoning, are presented; and the categories, filtering, storage, and fusion of context information and their implementation methods are analyzed. The technical roadmap of context-aware multi-motion-pattern trajectory prediction for moving objects and the working mechanism of each task are introduced in detail. Real-world application scenarios of context-awareness techniques, such as location recommendation and point-of-interest recommendation, are presented, and their advantages and disadvantages in these applications are discussed through comparison with traditional algorithms. New methods for pedestrian trajectory prediction based on context awareness and long short-term memory (LSTM) techniques are introduced in detail. Finally, the current problems and future trends of trajectory prediction and context awareness are summarized.
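As a point of reference for the statistical models mentioned above, the sketch below implements a constant-velocity baseline that simply extrapolates the last observed displacement; the surveyed filtering, machine learning, and context-aware LSTM methods replace this naive dynamic with estimated or learned ones. The observed positions are hypothetical.

```python
import numpy as np

def constant_velocity_predict(track, steps=3):
    """A simple statistical baseline for trajectory prediction: extrapolate the last
    observed displacement for a fixed number of future steps."""
    track = np.asarray(track, dtype=float)
    velocity = track[-1] - track[-2]                  # last observed displacement per step
    future = [track[-1] + velocity * (k + 1) for k in range(steps)]
    return np.stack(future)

observed = [(0, 0), (1, 0.5), (2, 1.0), (3, 1.6)]     # hypothetical (x, y) positions
print(constant_velocity_predict(observed))            # three extrapolated future positions
```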
ZHOU Min-Yuan , ZHENG Jia-Qi , DOU Wan-Chun , CHEN Gui-Hai
2023, 34(1):334-350. DOI: 10.13328/j.cnki.jos.006435
Abstract: Anycast assigns the same IP address to multiple terminal nodes and uses BGP to select the best path. In recent years, as anycast technology has become more and more common, it has been widely used in DNS and CDN services. This study first introduces anycast technology in a comprehensive way, then discusses the current problems of anycast and summarizes them into three categories: anycast inference is imperfect, anycast performance cannot be guaranteed, and anycast load balancing is difficult to control. The latest research progress on these problems is then described. Finally, the remaining difficulties and directions for improvement are summarized to provide useful references for researchers in related fields.
FANG Xing , HU Bo , MA Chao , HUANG Wei-Qing
2023, 34(1):351-380. DOI: 10.13328/j.cnki.jos.006510
Abstract: With the increasing scale and complexity of computer networks, it is difficult for network administrators to ensure that the network intent has been correctly realized, and incorrect network configurations will affect the security and availability of the network. Inspired by the successful application of formal methods in hardware and software verification, researchers have applied formal methods to networks, forming a new research field, network verification, which aims to use rigorous mathematical methods to prove the correctness of a network. Network verification has become a hot research topic in the fields of networking and security, and its research results have been successfully applied in real networks. From the three research directions of data plane verification, control plane verification, and stateful network verification, this study systematically summarizes the existing research results in the field of network verification and analyzes the research hotspots and related solutions, aiming to organize the field and provide systematic references and prospects for future work for researchers in the field.
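To give a flavor of the properties checked in data plane verification, the toy sketch below follows per-prefix next hops through hypothetical forwarding tables to decide whether traffic for a destination prefix is deliverable; real verifiers reason exhaustively over packet header spaces rather than a single prefix string, so this is only an intuition-building simplification.

```python
from collections import deque

def reachable(fibs, src, dst_prefix):
    """Toy data-plane check: follow the next hop for dst_prefix from src and report
    whether the traffic reaches a node that owns the prefix.
    fibs maps node -> {prefix: next_hop}; a next hop equal to the node itself
    is the convention used here for 'the prefix is local'."""
    seen, queue = set(), deque([src])
    while queue:
        node = queue.popleft()
        if node in seen:                 # loop protection: forwarding loops return False
            continue
        seen.add(node)
        nxt = fibs.get(node, {}).get(dst_prefix)
        if nxt == node:
            return True
        if nxt is not None:
            queue.append(nxt)
    return False

# Hypothetical forwarding tables for three routers
fibs = {"A": {"10.0.0.0/24": "B"}, "B": {"10.0.0.0/24": "C"}, "C": {"10.0.0.0/24": "C"}}
print(reachable(fibs, "A", "10.0.0.0/24"))   # True
```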
YANG Fan , ZHANG Qian-Ying , SHI Zhi-Ping , GUAN Yong
2023, 34(1):381-403. DOI: 10.13328/j.cnki.jos.006501
Abstract: To protect the execution environments of security-sensitive programs on computing devices, researchers have proposed trusted execution environment (TEE) technology, which provides security-sensitive programs with a secure execution environment isolated from the rich computing environment through hardware and software isolation. Side-channel attacks have evolved from traditional attacks requiring expensive equipment to software-based attacks that infer confidential information from access patterns observed through microarchitectural state. TEE architectures only provide an isolation mechanism and cannot resist these emerging software side-channel attacks. This study thoroughly investigates the software side-channel attacks and corresponding defense mechanisms of three TEE architectures, ARM TrustZone, Intel SGX, and AMD SEV, and discusses their development trends. First, the basic principles of ARM TrustZone, Intel SGX, and AMD SEV are introduced, and the definition of software side-channel attacks and the classification, methods, and steps of cache side-channel attacks are elaborated. Second, from the perspective of processor instruction execution, a TEE attack surface classification method is proposed to classify TEE software side-channel attacks, and attacks that combine software side channels with other techniques are explained. Third, the threat model of TEE software side-channel attacks is discussed in detail. Finally, the industry's defense mechanisms against TEE software side-channel attacks are comprehensively summarized, and future research trends are discussed from the two aspects of attack and defense.
MA Chuan-Wang , ZHANG Yu , FANG Bin-Xing , ZHANG Hong-Li
2023, 34(1):404-420. DOI: 10.13328/j.cnki.jos.006513
Abstract: Anonymous networks aim to protect users' communication privacy in open network environments. Since Chaum proposed the Mix-net, related work has progressed over several decades. Today, many anonymous networks based on Mix-net, DC-net, or PIR have been developed for various application scenarios and threat models by integrating multiple design elements. Starting from anonymity concepts, this paper introduces the overall development of the anonymous network area. Representative works and their design choices are classified and articulated, and the characteristics of anonymous networks are systematically analyzed from the perspectives of anonymity, latency, and bandwidth overhead.
MAO Jia-Li , WU Tao , LI Si-Jia , GUO Ye , ZHOU Ao-Ying , JIN Che-Qing , QIAN Wei-Ning
2023, 34(1):421-443. DOI: 10.13328/j.cnki.jos.006434
Abstract: Since ordinary city road maps do not cover road restriction information for trucks and lack hot-spot labeling, they cannot satisfy the large-batch, long-distance road transportation requirements of bulk commodity transport. To address the problems of frequent transportation accidents and low logistics efficiency, and to further improve truck drivers' travel experience, it is urgent to study methods of building customized logistics maps for bulk commodity transport that combine the type of goods transported, the type of truck, and drivers' route selection preferences. With the widespread application of the mobile Internet and the Internet of Vehicles, the spatio-temporal data generated by bulk commodity transport is growing rapidly. Together with other logistics operational data, it constitutes logistics big data, which provides a solid data foundation for logistics map building. This study first comprehensively reviews the state-of-the-art work on map building from trajectory data. Then, to overcome the limitations of existing digital map building methods in the field of bulk commodity transport, a data-driven logistics map building framework using multi-source logistics data is put forward. The research focuses on (1) multi-constraint logistics map construction based on users' prior knowledge and (2) incremental updating of the logistics map driven by dynamic spatio-temporal data. The logistics map will become AI infrastructure for a new generation of logistics technology suited to bulk commodity transportation. The results of this study provide rich practical content for the technical innovation of logistics map building and offer new solutions for reducing logistics costs and improving efficiency, which have important theoretical significance and application value.
ZHANG Kang , AN Bo-Zhou , LI Jie , YUAN Xia , ZHAO Chun-Xia
2023, 34(1):444-462. DOI: 10.13328/j.cnki.jos.006488
Abstract: In recent years, with the continuous development of computer vision, semantic segmentation and shape completion of 3D scenes have received increasing attention from academia and industry. Semantic scene completion is an emerging research direction in this field that aims to simultaneously predict the spatial layout and semantic labels of a 3D scene, and it has developed rapidly in recent years. This study classifies and summarizes the RGB-D image-based methods proposed in this field in recent years. These methods are divided into two categories according to whether deep learning is used: traditional methods and deep learning-based methods. The deep learning-based methods are further divided into two categories according to the input data type: methods based on a single depth image and methods based on RGB-D images. Based on this classification and an overview of the existing methods, the datasets used for the semantic scene completion task are collated and the experimental results are analyzed. Finally, the challenges and development prospects of this field are summarized.
WANG Yi-Cheng , ZENG Hong-Bin , XU Li-Jie , WANG Wei , WEI Jun , HUANG Tao
2023, 34(1):463-488. DOI: 10.13328/j.cnki.jos.006502
Abstract: Big data processing frameworks such as Hadoop and Spark are now widely used for data processing and analysis in industry and academia. These frameworks adopt a distributed architecture and are generally developed in object-oriented languages such as Java and Scala. They use the Java virtual machine (JVM) as the runtime environment on cluster nodes to execute computing tasks, relying on the JVM's automatic memory management mechanism to allocate and reclaim data objects. However, current JVMs are not designed for big data processing frameworks, leading to problems such as long garbage collection (GC) times and the high cost of data serialization and deserialization. As reported by users and researchers, GC can take more than 50% of the overall application execution time in some cases. JVM memory management has therefore become a performance bottleneck of big data processing frameworks. This study systematically reviews recent research on JVM optimization for big data processing frameworks, with three main contributions. First, the root causes of the performance degradation of big data applications executed on the JVM are summarized. Second, existing JVM optimization techniques for big data processing frameworks are summarized and classified into categories, and the advantages and disadvantages of each, including optimization effects, application scope, and burden on users, are compared and analyzed. Finally, future JVM optimization directions are proposed, which will help improve the performance of big data processing frameworks.
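As a user-level illustration of the serialization and GC costs described above (an application-side tuning sketch, not one of the JVM-level techniques the paper surveys), the snippet below shows commonly used Spark settings that reduce serialization cost and keep large buffers outside the GC-managed heap; it assumes the pyspark package is installed, and the size value is a placeholder.

```python
from pyspark import SparkConf

# Illustrative Spark settings that reduce JVM object and GC pressure
conf = (
    SparkConf()
    .setAppName("gc-tuning-sketch")
    # Kryo serialization lowers the cost of serializing shuffled/cached data
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # Off-heap storage keeps large data buffers outside the GC-managed heap
    .set("spark.memory.offHeap.enabled", "true")
    .set("spark.memory.offHeap.size", "2g")
)
print(conf.toDebugString())
```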
ZHANG Zheng , XUE Jing-Feng , ZHANG Jing-Ci , CHEN Tian , TAN Yu-An , LI Yuan-Zhang , ZHANG Quan-Xin
2023, 34(1):489-508. DOI: 10.13328/j.cnki.jos.006436
Abstract: Control-flow hijacking attacks exploit memory corruption vulnerabilities to seize control of a program and then hijack it to execute malicious code, posing a great threat to system security. To prevent control-flow hijacking attacks, researchers have proposed a series of defense methods. Control-flow integrity (CFI) is a runtime defense that prevents illegal transfers of process control flow, ensuring that control flow always stays within the range required by the program. In recent years, more and more research has been devoted to related problems of control-flow integrity, such as proposing new control-flow integrity schemes and new methods for evaluating them. This study explains the basic principles of control-flow integrity and then classifies existing control-flow integrity schemes. The existing evaluation methods and metrics for control-flow integrity schemes are also introduced. Finally, thoughts on potential future work on control-flow integrity are summarized, which will hopefully provide an outlook on future research directions.
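As a language-agnostic illustration of the checking idea behind coarse-grained CFI (real enforcement operates on binaries or compiler-instrumented machine code, not Python), the sketch below validates an indirect call target against a precomputed set of legal targets before transferring control; all names are hypothetical.

```python
# Conceptual sketch of a coarse-grained CFI check: an indirect call is allowed
# only if its target belongs to the statically computed set of legal targets.
def open_file(path): return f"open {path}"
def delete_all(path): return f"rm -rf {path}"      # should never be reachable from this call site

ALLOWED_TARGETS = {open_file}                       # legal targets for this indirect call site

def guarded_indirect_call(func, arg):
    if func not in ALLOWED_TARGETS:                 # CFI check before the control transfer
        raise RuntimeError("CFI violation: illegal control-flow transfer")
    return func(arg)

print(guarded_indirect_call(open_file, "/tmp/a"))
try:
    guarded_indirect_call(delete_all, "/")          # simulated hijacked function pointer
except RuntimeError as e:
    print(e)
```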