WANG Quan, WU Zhong-Hai, CHEN Yi-Xiang, MIAO Qi-Guang
2020, 31(9):2625-2626. DOI: 10.13328/j.cnki.jos.005947
GE Dao-Hui, LI Hong-Sheng, ZHANG Liang, LIU Ru-Yi, SHEN Pei-Yi, MIAO Qi-Guang
2020, 31(9):2627-2653. DOI: 10.13328/j.cnki.jos.005942
Abstract:Deep neural networks have proven effective in solving problems in fields such as image processing and natural language processing. Meanwhile, with the continuous development of mobile Internet technology, portable devices have been rapidly popularized and users place ever-higher demands on them. Therefore, how to design efficient, high-performance lightweight neural networks is key to solving these problems. This paper describes in detail three approaches to constructing lightweight neural networks: manually designing lightweight networks, compressing neural network models, and automatically designing networks through neural architecture search. The characteristics of each approach are briefly summarized and analyzed, and typical algorithms for constructing lightweight neural networks are introduced with emphasis. Finally, the existing methods are summarized and prospects for future development are given.
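As a concrete illustration of the manual-design approach surveyed above (the idea behind MobileNet-style networks; the example and its numbers are ours, not the survey's), the following sketch compares the parameter counts of a standard convolution and a depthwise-separable one:

```python
# Illustrative parameter counts for a standard vs. a depthwise-separable
# convolution, the building block of many hand-designed lightweight networks.
# The layer sizes below are hypothetical.

def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # one k x k depthwise filter per input channel, then a 1x1 pointwise conv
    return k * k * c_in + c_in * c_out

if __name__ == "__main__":
    k, c_in, c_out = 3, 128, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(f"standard: {std}, separable: {sep}, ratio: {std / sep:.1f}x")
```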
ZHANG Zheng-Kui, PANG Wei-Guang, XIE Wen-Jing, LÜ Ming-Song, WANG Yi
2020, 31(9):2654-2677. DOI: 10.13328/j.cnki.jos.005946
Abstract:The persistent advance of deep learning algorithms and GPU computing power has promoted artificial intelligence in fields including, but not limited to, computer vision, speech recognition, and natural language processing. Meanwhile, deep learning has begun to be applied in safety-critical areas, exemplified by self-driving vehicles. Unfortunately, the severe traffic accidents of the past two years show that deep learning technology is still far from mature enough to fulfill safety-critical standards, and consequently trustworthy artificial intelligence has started to attract substantial research interest worldwide. This article presents a state-of-the-art survey of research on deep learning for real-time applications. It first introduces the main problems and challenges of deploying deep learning on real-time embedded systems. Then, a detailed review covering various topics is provided, including lightweight deep neural network design, GPU timing analysis and workload scheduling, shared resource management on CPU+GPU SoC platforms, and deep neural network and network accelerator co-design. Finally, open issues and research directions are identified to conclude the survey.
ZHANG Liang, LIU Zhi-Yu, CAO Jing-Ying, SHEN Pei-Yi, JIANG De-Zhi, MEI Lin, ZHU Guang-Ming, MIAO Qi-Guang
2020, 31(9):2678-2690. DOI: 10.13328/j.cnki.jos.005937
Abstract:Cartographer is an open-source SLAM framework released by Google in 2016, targeting low computational resource consumption under multi-sensor configurations. In this study, to address the inaccurate intermediate pose fusion and latency of the original Cartographer, a multi-sensor pose fusion method based on pose increments was designed. Subsequently, a multi-module SLAM system based on the enhanced Cartographer algorithm was designed and implemented for the cleaning-robot Player platform. Finally, the effectiveness of the enhanced Cartographer algorithm and the usability of the SLAM system on the Player robot platform were verified through experimental analysis on the Cartographer dataset and tests in real scenarios.
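To make the increment-based idea concrete, here is a minimal 2D sketch of the general principle (an illustration under our own simplifications, not the paper's algorithm): rather than fusing absolute sensor poses, the relative motion between two consecutive odometry readings is composed onto the last fused pose.

```python
import math

def compose(p, d):
    """Compose pose p = (x, y, theta) with a local increment d."""
    x, y, t = p
    dx, dy, dt = d
    return (x + dx * math.cos(t) - dy * math.sin(t),
            y + dx * math.sin(t) + dy * math.cos(t),
            t + dt)

def increment(prev, curr):
    """Express curr in the frame of prev (the relative motion)."""
    x0, y0, t0 = prev
    x1, y1, t1 = curr
    dx, dy = x1 - x0, y1 - y0
    return (dx * math.cos(-t0) - dy * math.sin(-t0),
            dx * math.sin(-t0) + dy * math.cos(-t0),
            t1 - t0)

fused = (0.0, 0.0, 0.0)                        # last fused pose
odom_prev, odom_curr = (1.0, 2.0, 0.1), (1.5, 2.1, 0.15)
fused = compose(fused, increment(odom_prev, odom_curr))
```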
ZHANG Zhan, ZHANG Xian-Qi, ZUO De-Cheng, FU Guo-Dong
2020, 31(9):2691-2708. DOI: 10.13328/j.cnki.jos.005938
Abstract:Target tracking algorithms have been widely used in many fields. However, due to real-time and power-consumption constraints, it is difficult to deploy algorithms based on deep learning models on mobile terminal devices. This work studies the deployment of target tracking algorithms on mobile devices from the perspective of application deployment optimization combined with edge computing technology. Based on an analysis of device characteristics and the edge cloud network architecture, a deployment strategy for target tracking applications oriented to edge computing is proposed. The computing tasks of the target tracking application are reasonably offloaded to the edge cloud by a task segmentation strategy, and the computing results are analyzed and fused by an information fusion strategy. In addition, a motion detection scheme is proposed to further reduce the computing pressure and power consumption of terminal nodes. The experimental results show that, compared with local computing, the deployment strategy significantly reduces task response time, and compared with fully offloading to the edge cloud, it reduces the processing time of the same computing task.
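A hedged sketch of the kind of lightweight motion check that can gate offloading (thresholds and the `offload`/`reuse_last_result` callables are our placeholders; the paper's concrete scheme may differ): a frame is shipped to the edge cloud only when enough pixels changed.

```python
import numpy as np

def has_motion(prev_gray, curr_gray, pixel_thresh=25, ratio_thresh=0.01):
    # fraction of pixels whose intensity changed by more than pixel_thresh
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    changed = (diff > pixel_thresh).mean()
    return changed > ratio_thresh

def process_frame(prev_gray, curr_gray, offload, reuse_last_result):
    if has_motion(prev_gray, curr_gray):
        return offload(curr_gray)      # run tracking on the edge cloud
    return reuse_last_result()         # skip computation on static scenes
```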
ZHANG Wen-Wen, XU Tian-Yu, ZHANG Yue, ZHENG Xiao-Yao
2020, 31(9):2709-2722. DOI: 10.13328/j.cnki.jos.005940
Abstract:SDN (software-defined network) is designed to solve the problems of traditional networks, whose architecture is complex and scattered, making the network more flexible. The characteristic of the P4 programming language is that users can directly define P4 programs according to their packet-processing needs and then compile and configure those requirements onto network equipment through an adaptation file. SDN data plane conformance testing for the P4 language sends conformance test cases to P4 network equipment to evaluate the consistency of the actual output with the expected output. Conformance test cases are the carriers for performing conformance tests, and constructing them manually is tedious and time-consuming. This article focuses on the design principles and generation method of SDN data plane conformance test cases for the P4 programming language, gives coverage criteria for conformance test cases, designs the command information entity structure and the test case entity structure, and uses the simple_switch virtual switch loaded with a P4 program as the test object to illustrate the test case generation process. This paper also implements an automatic test case generation tool for P4 network device conformance testing, verifies the tool's effectiveness in automatically generating test cases, and thereby simplifies the construction of conformance test cases.
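A hypothetical sketch of what such test case and command entities might hold for a target like simple_switch (field names and values are ours for illustration, not the structures defined in the paper): table entries to install, an input packet, and the expected output.

```python
from dataclasses import dataclass, field

@dataclass
class CommandEntity:
    table: str                  # e.g. "ipv4_lpm"
    action: str                 # e.g. "ipv4_forward"
    match_key: str              # e.g. "10.0.1.1/32"
    action_params: list = field(default_factory=list)

@dataclass
class TestCaseEntity:
    commands: list              # CommandEntity list to configure the switch
    input_port: int
    input_packet: bytes
    expected_port: int
    expected_packet: bytes      # compared against the actual output

case = TestCaseEntity(
    commands=[CommandEntity("ipv4_lpm", "ipv4_forward", "10.0.1.1/32",
                            ["00:00:00:00:01:01", 1])],
    input_port=0,
    input_packet=b"...",        # a crafted Ethernet/IPv4 frame
    expected_port=1,
    expected_packet=b"...",     # same frame with rewritten MAC and TTL-1
)
```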
XU Rui, GU Shou-Zhen, Edwin H-M Sha, ZHUGE Qing-Feng, SHI Liang, GAO Si-Yuan
2020, 31(9):2723-2740. DOI: 10.13328/j.cnki.jos.005941
Abstract:Nowadays, it has become a trend to design embedded systems for big data and artificial intelligence applications, which demand large memory capacity and high access performance. Domain wall memory (DWM) is a novel non-volatile memory with high access performance, high density, and low power consumption. Thus, for embedded systems dedicated to data-intensive applications, DWM can meet the requirements on access speed, capacity, and power consumption. However, before data on DWM can be accessed, the data in the nanowires must be shifted to align them with a read/write port, which is called a shift operation. Numerous shift operations take most of the access time and generate considerable heat, which decreases the access speed of DWM and further degrades system performance. Therefore, reducing the shift operations of DWM can significantly improve system performance. This study targets embedded systems for data-intensive applications equipped with multi-port DWM, and explores instruction scheduling and data placement strategies that minimize shift operations. An integer linear programming (ILP) model is first proposed to obtain the minimum number of shifts. Since the ILP model cannot find the optimal solution in polynomial time, a heuristic algorithm, generation instruction scheduling and data placement (GISDP), is proposed to reduce the number of shifts on DWM. The experimental results show that the ILP model and the GISDP algorithm can effectively reduce shift operations. On a target system with 8-read/write-port DWM, GISDP reduces shift operations by 89.7% on average compared with other algorithms, and its results are close to the optimal solutions of the ILP model.
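The following is a simplified model of the cost being minimized (our illustration of the objective, not the GISDP algorithm itself): data items sit at positions on a nanowire with several read/write ports, and serving an access shifts the wire so the item aligns with the nearest port.

```python
def shift_cost(access_seq, placement, ports):
    """placement: item -> position on the nanowire; ports: port positions."""
    offset = 0                       # current shift offset of the wire
    total = 0
    for item in access_seq:
        pos = placement[item] + offset
        best = min((abs(pos - p), p) for p in ports)   # nearest port
        total += best[0]
        offset += best[1] - pos      # wire shifted so item aligns with port
    return total

ports = [0, 4, 8, 12]                # e.g. a DWM with 4 evenly spaced ports
placement = {"a": 1, "b": 6, "c": 11}
print(shift_cost(["a", "b", "a", "c"], placement, ports))  # total shifts
```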
LI De-Guang, GUO Bing, ZHANG Rui-Ling, MA You-Zhong, REN Zhen-Qin, ZHAO Xu-Ge, TAN Qing, LI Jun-Ke
2020, 31(9):2741-2755. DOI: 10.13328/j.cnki.jos.005939
Abstract:As a high-energy-consumption component of embedded devices, an AMOLED display's power consumption is determined by the values of all pixels of the displayed content. Moreover, the human visual system gives priority to the important regions of the displayed content while paying less attention to unimportant regions. Based on these features, a power optimization method for AMOLED displays based on multi-region visual saliency is proposed. The core of the method is to extract the important regions of the display content with a multi-region visual saliency algorithm and then divide the content into multiple regions according to the resulting saliency map. Finally, dynamic pixel adjustment is carried out based on the visual attention level, minimizing display power without reducing the overall visual effect of the display content. Tests on a number of images show that the power consumption of an image can be reduced while maintaining a good visual effect.
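A minimal sketch of saliency-guided dimming (the linear scaling rule and `min_scale` are our assumptions for illustration; the paper uses multi-region saliency and attention levels): pixels in less salient regions are scaled down more aggressively, since AMOLED power grows with pixel intensity.

```python
import numpy as np

def dim_by_saliency(img, saliency, min_scale=0.6):
    """img: HxWx3 uint8; saliency: HxW in [0, 1], higher = more important."""
    scale = min_scale + (1.0 - min_scale) * saliency   # salient -> stay bright
    return (img * scale[..., None]).astype(np.uint8)

img = np.full((4, 4, 3), 200, dtype=np.uint8)
sal = np.zeros((4, 4)); sal[1:3, 1:3] = 1.0            # a salient center patch
dimmed = dim_by_saliency(img, sal)
```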
WEI Fan, SONG Yun-Fei, SHAO Ming-Li, LIU Tian, CHEN Xiao-Hong, WANG Xiang-Feng, CHEN Ming-Song
2020, 31(9):2756-2769. DOI: 10.13328/j.cnki.jos.005943
Abstract:It is an inevitable trend to use deep neural networks to process the massive image data generated by the rapid growth of Internet of Things (IoT) devices. However, as DNNs are vulnerable to adversarial examples, they are easy to attack and endanger the security of the IoT, so how to improve model robustness has become an important topic. Usually, the defensive performance of an ensemble model is better than that of a single model, but the limited computing power of IoT devices makes ensemble models difficult to apply. Therefore, this study proposes a novel model transformation and training method on a single model that achieves a defense effect similar to an ensemble model: adding additional branches to the base model, using feature pyramids to fuse features, and introducing ensemble diversity during training. Experiments on common datasets such as MNIST and CIFAR-10 show that this method can significantly improve robustness. The accuracy increases more than fivefold against four gradient-based attacks such as FGSM, and up to 10 times against JSMA, C&W, and EAD. The method does not disturb the classification of clean examples and achieves better performance when combined with adversarial training.
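A toy sketch of an ensemble-diversity term over the added branches (the exact regularizer in the paper may differ; this only conveys the spirit): after removing the true-class entry, branch predictions are encouraged to disagree, i.e., have low cosine similarity.

```python
import numpy as np

def diversity_penalty(branch_probs, true_label):
    """branch_probs: list of softmax vectors, one per branch."""
    rest = [np.delete(p, true_label) for p in branch_probs]
    rest = [r / (np.linalg.norm(r) + 1e-12) for r in rest]
    sims = [np.dot(rest[i], rest[j])
            for i in range(len(rest)) for j in range(i + 1, len(rest))]
    return float(np.mean(sims))     # add to the loss to push branches apart

p1 = np.array([0.7, 0.2, 0.1]); p2 = np.array([0.6, 0.1, 0.3])
print(diversity_penalty([p1, p2], true_label=0))
```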
LIN Yi-Shuai, LI Qing-Shan, LU Peng-Hao, SUN Yu-Nan, WANG Liang, WANG Ying-Zhi
2020, 31(9):2770-2784. DOI: 10.13328/j.cnki.jos.005944
Abstract:The optimization of intelligent warehousing is generally divided into shelf optimization and path optimization. Shelf optimization considers the positions of goods and shelves and optimizes the placement of goods; path optimization mainly seeks optimal path planning for automated guided vehicles. At present, most studies treat these two scenarios independently, so in actual warehousing applications the problem can only be solved by linear superposition, which makes the solution prone to falling into local optima. Based on a coupling analysis of the relationships among the stages of the intelligent warehousing process, this study proposes a mathematical model for the cooperative optimization of shelves and paths, which combines shelf optimization and path planning as a whole. In addition, a cooperative optimization framework is proposed, including a product similarity algorithm and an improved path planning algorithm. Based on these two algorithms, an improved genetic algorithm is proposed for the cooperative optimization of shelves and paths. The experimental results verify the effectiveness and stability of the proposed intelligent warehousing cooperative optimization algorithm: it improves the shipping efficiency of the warehouse and reduces transportation costs.
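A miniature evolutionary sketch of the coupled problem (distances, orders, and the (1+1)-style loop are our simplifications; the paper's GA additionally uses product similarity and an improved path planner): a chromosome assigns products to shelf slots, and fitness is the simplified travel cost of picking routes under that assignment.

```python
import random

def route_cost(assignment, orders, dist):
    """assignment: product -> slot; dist: slot -> travel cost from depot."""
    return sum(dist[assignment[p]] for order in orders for p in order)

def mutate(assignment):
    a = dict(assignment)
    p1, p2 = random.sample(list(a), 2)
    a[p1], a[p2] = a[p2], a[p1]       # swap two products' slots
    return a

dist = {0: 1, 1: 2, 2: 5, 3: 8}
orders = [["A", "B"], ["A", "C"], ["A", "D"]]
best = {"A": 3, "B": 2, "C": 1, "D": 0}
for _ in range(200):                  # frequently picked items drift to near slots
    cand = mutate(best)
    if route_cost(cand, orders, dist) < route_cost(best, orders, dist):
        best = cand
print(best, route_cost(best, orders, dist))
```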
CHEN Jin-Yin, CHEN Zhi-Qing, ZHENG Hai-Bin, SHEN Shi-Jing, SU Meng-Meng
2020, 31(9):2785-2801. DOI: 10.13328/j.cnki.jos.005945
Abstract:With the wider application of deep learning in the field of computer vision, face authentication, license plate recognition, and road sign recognition have shown commercial application trends, so research on the security of deep learning models is of great importance. Previous studies have found that deep learning models are vulnerable to carefully crafted adversarial examples containing small perturbations, which lead to completely incorrect recognition results. Adversarial attacks against deep learning models are fatal, but they can also help researchers find vulnerabilities in models and make further improvements. Motivated by this, this study proposes a black-box physical attack method based on particle swarm optimization (BPA-PSO) for deep learning road sign recognition models in autonomous driving scenarios. Without knowledge of the model structure, BPA-PSO can not only realize black-box attacks on deep learning models but also invalidate road sign recognition models in physical scenarios. The attack effectiveness of the BPA-PSO algorithm is verified through extensive experiments on digital images in electronic space, in a laboratory environment, and under outdoor road conditions. The algorithm's ability to discover model vulnerabilities and further improve the security of deep learning applications is also demonstrated. Finally, the problems existing in the BPA-PSO algorithm are analyzed and possible challenges for future research are proposed.
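A compact PSO sketch of black-box adversarial search in general (hyperparameters, bounds, and `query_model` are our stand-ins, not the BPA-PSO settings): particles are perturbations, fitness is the model's true-class score queried as a black box, and no gradients are used.

```python
import numpy as np

def pso_attack(query_model, x, n_particles=20, steps=50, eps=0.1):
    """x: flattened input; query_model(v) -> true-class score (lower = better)."""
    dim = x.size
    pos = np.random.uniform(-eps, eps, (n_particles, dim))
    vel = np.zeros_like(pos)
    fit = np.array([query_model(x + p) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = pbest[np.argmin(pbest_fit)]              # global best perturbation
    for _ in range(steps):
        r1, r2 = np.random.rand(2)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, -eps, eps)
        fit = np.array([query_model(x + p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        g = pbest[np.argmin(pbest_fit)]
    return x + g    # input perturbed to suppress the true-class score

# usage sketch: adv = pso_attack(lambda v: true_class_prob(v), x.ravel())
```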
ZHANG Ce, LIU Hong-Wei, BAI Rui, WANG Kan-Yu, WANG Jin-Yong, LÜ Wei-Gong, MENG Fan-Chao
2020, 31(9):2802-2825. DOI: 10.13328/j.cnki.jos.006085
Abstract:FDR (fault detection rate), as a key element of reliability research, is of great importance for constructing the test environment, improving fault detection efficiency, and modeling and improving reliability. Meanwhile, it has important practical significance for improving system reliability and determining release time. First, software reliability growth models (SRGMs) based on the NHPP (non-homogeneous Poisson process) are summarized, and the essence, function, and process of modeling are given. Second, on this basis, FDR, the key parameter in reliability modeling and research, is derived and defined; its ability to describe the test environment is analyzed and the differences among models are shown. Third, emphasis is placed on the differences among FDR, failure intensity, and hazard rate (risk rate), and the correlations among the three are derived. Next, the general model of FDR is comprehensively analyzed from three perspectives: test coverage functions, directly specified FDR, and FDR constituted by testing effort functions. A unified FDR-related reliability model is then proposed. Considering the ability to describe real test environments, an imperfect debugging framework model is established, and reliability growth models for multiple different FDRs under imperfect debugging are derived. Further, experiments are carried out on 12 publicly available failure data sets describing real application scenarios to verify the effectiveness of the reliability models related to different FDR models, and to analyze and discuss their differences. The results show that the performance of the FDR model can support performance improvement of the reliability model. Finally, research trends and open problems are pointed out.
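For orientation, the textbook NHPP relations among the mean value function, failure intensity, and FDR (standard definitions, not a reproduction of this paper's derivation) are:

```latex
% mean value function m(t), failure intensity \lambda(t),
% expected total faults a, fault detection rate d(t)
\lambda(t) = \frac{\mathrm{d}m(t)}{\mathrm{d}t}, \qquad
d(t) = \frac{\lambda(t)}{a - m(t)}
% Example: the Goel-Okumoto model m(t) = a\,(1 - e^{-bt}) gives
% \lambda(t) = a b\, e^{-bt} and hence a constant FDR d(t) = b.
```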
GUO Zhao-Qiang, ZHOU Hui-Cong, LIU Shi-Ran, LI Yan-Hui, CHEN Lin, ZHOU Yu-Ming, XU Bao-Wen
2020, 31(9):2826-2854. DOI: 10.13328/j.cnki.jos.006087
Abstract:Bugs can affect the normal use of a software system or even cause huge damage. To help developers find and fix bugs as soon as possible, information retrieval based bug localization techniques have been proposed. These techniques treat bug localization as a text retrieval task: for a given bug report, a list of code entities is ranked in descending order of the relevance score between each code entity and the bug. Developers can examine entities in the ranked list from top to bottom, which helps reduce review cost and accelerate the process of bug localization. In recent years, great progress has been achieved in information retrieval based bug localization, yet applying these techniques in practice remains challenging. This survey offers a systematic overview of recent research achievements in information retrieval based bug localization. First, the research problem of information retrieval based bug localization is introduced. Then, the main current research work is described in detail. After that, related techniques are discussed. Finally, the opportunities and challenges in this field are summarized and future research directions are outlined.
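The core retrieval step shared by this family of techniques fits in a few lines (a baseline sketch with made-up file contents; real systems layer structure, history, and learning on top): treat the bug report as a query, source files as documents, and rank files by TF-IDF cosine similarity.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_files(bug_report, files):
    """files: dict of path -> source text; returns paths, most relevant first."""
    paths = list(files)
    n = len(paths)
    vec = TfidfVectorizer(token_pattern=r"[A-Za-z_]\w+")
    mat = vec.fit_transform([files[p] for p in paths] + [bug_report])
    scores = cosine_similarity(mat[n], mat[:n]).ravel()   # query vs. each file
    return sorted(zip(paths, scores), key=lambda x: -x[1])

files = {"Parser.java": "parse token stream syntax error recover",
         "Renderer.java": "draw frame buffer texture"}
print(rank_files("crash with syntax error while parsing", files))
```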
CHEN De-Yan, ZHAO Hong, ZHANG Xia
2020, 31(9):2855-2882. DOI: 10.13328/j.cnki.jos.005820
Abstract:Due to its powerful knowledge representation and reasoning ability, the ontology has been widely used in many domains. Nevertheless, the in-depth application of ontologies still faces many common deep semantic mapping problems. Existing ontology modeling methods only propose simple guiding principles and basic steps, so knowledge engineers still have no clear starting point. For constructing a domain semantic knowledge base from domain expert knowledge, three types of common semantic mapping problems in expert knowledge, such as polysemy, n-ary relations, and security requirements, are studied in depth from five aspects; the corresponding semantic mapping methods are proposed; and 10 ontology modeling conventions are summarized. Finally, a complete application case is constructed and the five types of semantic mapping methods proposed in this study are evaluated.
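One of the mapping problems named above, n-ary relations, can be illustrated with the standard reification pattern (namespace and terms are made up for the example; this is not the paper's convention): "patient takes drug at dosage" cannot be a single triple, so an intermediate individual represents the relation instance.

```python
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# n-ary relation instance: Alice takes Aspirin at 100 mg
g.add((EX.prescription1, EX.patient, EX.Alice))
g.add((EX.prescription1, EX.drug, EX.Aspirin))
g.add((EX.prescription1, EX.dosageMg, Literal(100)))

print(g.serialize(format="turtle"))
```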
2020, 31(9):2883-2902. DOI: 10.13328/j.cnki.jos.006083
Abstract:Recently, with the development of intelligent surveillance, person re-identification (Re-ID), which aims to associate person images of the same identity across different non-overlapping cameras, has attracted much attention in the academic and industrial communities. Most current research focuses on the supervised case where all training samples carry label information. Considering the high cost of data labeling, methods designed for the supervised setting generalize poorly in practical applications. This study focuses on person re-identification algorithms in weakly supervised settings, covering the unsupervised case and the semi-supervised case, and classifies and describes several state-of-the-art methods. In the unsupervised setting, the methods are divided into five categories from different technical perspectives: methods based on pseudo-labels, image generation, instance classification, domain adaptation, and others. In the semi-supervised setting, the methods are divided into four categories according to the setting: the case where a small number of persons are labeled, the case where there are few labeled images per person, the case based on tracklet learning, and the case where intra-camera labels exist but inter-camera label information is absent. Finally, several benchmark person re-identification datasets are summarized and experimental results of these weakly supervised person re-identification algorithms are analyzed.
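A sketch of the pseudo-label family in the unsupervised setting (the feature extractor and training step are placeholders; `eps` and `min_samples` are illustrative): extract features with the current model, cluster them, treat cluster ids as labels, and fine-tune.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def pseudo_label_round(features, eps=0.5, min_samples=4):
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(features)
    keep = labels != -1                    # drop outliers (label -1)
    return features[keep], labels[keep]    # pairs used to fine-tune the model

feats = np.random.rand(100, 2)             # low-dim stand-in for embeddings
X, y = pseudo_label_round(feats)
print(len(X), "samples across", len(set(y)), "pseudo identities")
```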
ZHANG Zhi-Wei, WANG Guo-Ren, XU Jian-Liang, DU Xiao-Yong
2020, 31(9):2903-2925. DOI: 10.13328/j.cnki.jos.006091
Abstract:Blockchain technologies have gained more and more attention over the last few years. In general, blockchains are distributed ledgers in which the users do not fully trust each other. Equipped with consensus protocols and security mechanisms, blockchain systems achieve properties such as immutability, and all users agree on the data records and the history of transactions. From the perspective of data management, a blockchain is a distributed database in which nodes agree on the execution order of all transactions. Many existing surveys address the security and consensus problems of blockchains; this study instead surveys and analyzes data management techniques for blockchain systems. Traditional distributed databases assume that nodes are trusted and only crash failures need to be considered, whereas blockchains must tolerate malicious nodes and therefore require Byzantine fault tolerance, which brings new problems and challenges. Since blockchains and databases have similar architectures, much work has been done to transfer techniques from distributed databases to blockchains. Accordingly, this study surveys data management techniques in blockchains, focusing on four aspects: storage, transaction management, query processing, and blockchain scalability. The differences among the techniques in these areas are compared and their benefits for blockchains are analyzed.
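A toy hash-chained ledger illustrating the storage property the survey builds on (consensus, networking, and Merkle trees are omitted; this is generic background, not the paper's design): each block commits to its predecessor, so rewriting history invalidates every later hash.

```python
import hashlib, json

def block_hash(block):
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain, txs):
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "txs": txs})
    return chain

chain = append_block([], [{"from": "a", "to": "b", "amount": 5}])
chain = append_block(chain, [{"from": "b", "to": "c", "amount": 2}])
assert chain[1]["prev"] == block_hash(chain[0])   # tamper-evidence check
```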
ZHANG Jin-Hong, WANG Xing-Wei, YI Bo, HUANG Min
2020, 31(9):2926-2943. DOI: 10.13328/j.cnki.jos.006035
Abstract:Recently, the worldwide huge energy consumption of the Internet has attracted sustained attention, and energy saving has become one of the hot issues for future networks. A network-level green energy-saving mechanism for backbone networks is proposed in this study. First, in the global view, a smallest remaining capacity first (SRCF) green routing algorithm plans the global routing paths so as to minimize the number of powered bundled links, realizing the first step of energy saving. Second, in the local view, a green best fit decreasing (G-BFD) algorithm gathers the traffic flowing through a bundled link onto the smallest set of physical links, so that as many physical links as possible can be powered off, achieving further energy saving. Besides saving energy, the proposed mechanism guarantees users' quality of service (QoS) requirements; that is, it maximizes the benefit of energy saving under the premise of providing QoS guarantees. To evaluate the proposed mechanism comprehensively, the topologies of three typical backbone networks, CERNET2, GÉANT, and INTERNET2, are chosen. Under high, medium, and low traffic loads, the proposed mechanism is compared with three other energy-saving mechanisms with regard to network power consumption and network performance (average routing hops, number of physical links powered off, routing success rate, and running time), and the differences among them are analyzed. The simulation results indicate that the proposed mechanism achieves a remarkable energy-saving effect with satisfactory performance.
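The best-fit-decreasing idea behind G-BFD, reduced to its bin-packing core (capacities and demands are illustrative; the paper additionally enforces QoS constraints): pack the flows crossing a bundled link onto as few physical links as possible so the rest can be powered off.

```python
def best_fit_decreasing(flows, capacity):
    links = []                                      # remaining capacity per powered link
    for f in sorted(flows, reverse=True):
        fits = [i for i, c in enumerate(links) if c >= f]
        if fits:
            i = min(fits, key=lambda i: links[i])   # tightest link that still fits
            links[i] -= f
        else:
            links.append(capacity - f)              # power on one more physical link
    return len(links)                               # physical links that must stay on

print(best_fit_decreasing([40, 30, 30, 20, 10], capacity=100))  # -> 2
```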
DONG Xiao, LIU Lei, LI Jing, FENG Xiao-Bing
2020, 31(9):2944-2964. DOI: 10.13328/j.cnki.jos.006051
Abstract:In recent years, having shown dominant capability in plenty of tasks, deep convolutional neural networks have been deployed in applications including object detection, autonomous driving, and machine translation. But these models carry huge numbers of parameters and bring a heavy computational burden. Neural network pruning can recognize and remove parameters that contribute little to accuracy, reducing the parameter count and the theoretical computational requirement and thus providing a chance to accelerate neural network models. However, it is hard for pruned sparse models to achieve efficient execution on GPUs, and the performance of sparse models often cannot even match their well-optimized dense counterparts. This study designs a sparsity-aware code generating method that produces efficient GPU code for the sparse convolutions in pruned neural networks. First, a template is designed for convolution operators with several optimizations targeting the GPU architecture. Through compilation and analysis, the operator template is transformed into an intermediate representation template, which serves as the input to the proposed algorithm for generating sparse convolution code according to the specific sparse convolution parameters. Moreover, to improve memory throughput, optimizations are performed on data access and data placement based on the memory access characteristics of neural networks. Finally, as the location information can be encoded into the generated code implicitly, the index structure for the sparse parameters can be eliminated, reducing the memory footprint during execution. Experiments demonstrate that the proposed sparse code generating method improves the performance of sparse convolutional neural networks compared with current methods.
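A miniature version of the key idea (our illustration; the target here is plain C-like source for a 1-D dot product, whereas the paper generates optimized GPU convolution code): bake the nonzero weights and their positions directly into the generated code, so no sparse-index structure is read at run time.

```python
def gen_sparse_dot(weights):
    # emit one multiply-add term per nonzero weight; positions become literals
    terms = [f"{w:.6f}f * x[{i}]" for i, w in enumerate(weights) if w != 0.0]
    body = " + ".join(terms) if terms else "0.0f"
    return f"float sparse_dot(const float *x) {{ return {body}; }}"

print(gen_sparse_dot([0.0, 0.5, 0.0, -1.25, 0.0, 0.0, 2.0]))
# float sparse_dot(const float *x) { return 0.500000f * x[1] + -1.250000f * x[3] + 2.000000f * x[6]; }
```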
ZHAO Shan, HAO Chun-Liang, ZHAI Jian, LI Ming-Shu
2020, 31(9):2965-2979. DOI: 10.13328/j.cnki.jos.005815
Abstract:In recent years, heterogeneous multi-core processors have gradually become mainstream in the mobile computing environment. Compared with traditional processor designs, they can meet the computing needs of devices at a lower power cost. However, microarchitectural differences between CPU cores also pose new challenges for basic mechanisms in operating systems. To solve the load balancing problem in heterogeneous scheduling, this study proposes a new load balancing mechanism called S-Bridge, which reduces the influence of processor microarchitecture and task requirement diversity. The main contribution of S-Bridge is a universal, heterogeneity-aware load balancing interface that lets any scheduler easily adapt to heterogeneous multi-core processor systems. Experiments based on CFS and HMP on the x86 and ARM platforms show that S-Bridge can be implemented on different platforms with different kernel versions; the average performance increases by more than 15%, and in the best cases by 65%.
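A tiny sketch of the heterogeneity-aware idea in the abstract (capacities and loads are made-up numbers; S-Bridge's actual interface is kernel-level): normalize each core's load by its relative performance before comparing, so a big core and a LITTLE core are balanced by effective capacity rather than raw task counts.

```python
def most_loaded_core(loads, capacities):
    """loads: runnable work per core; capacities: relative core performance."""
    scaled = [l / c for l, c in zip(loads, capacities)]
    return max(range(len(scaled)), key=lambda i: scaled[i])

loads = [4.0, 4.0]            # same raw load...
capacities = [2.0, 1.0]       # ...but core 0 is a 2x-faster big core
print(most_loaded_core(loads, capacities))   # -> 1 (the LITTLE core is busier)
```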