GAO Feng-Juan , WANG Yu , CHEN Tian-Jiao , SITU Ling-Yun , WANG Lin-Zhang , LI Xuan-Dong
2020, 31(10):2983-3003. DOI: 10.13328/j.cnki.jos.006063 CSTR:
Abstract:Amid the rapid development of mobile computing, the IoT, cloud computing, artificial intelligence, etc., many new programming languages and compilers are emerging. Even so, C/C++ remains among the most popular languages, and the array is one of the most important data structures of the C language. When an index is used to access an element of an array, it is necessary to check whether the index lies within the bounds of the array; otherwise, an array index out-of-bounds error occurs unexpectedly. Array index out-of-bounds defects can cause serious errors at run time, such as system crashes. Worse still, they open the door for attackers to take control of a server and execute arbitrary malicious code by carefully constructing input and hijacking the program's control flow. Existing static methods for array bounds checking cannot achieve high accuracy or handle complex constraints and expressions, which leads to too many false positives and increases the burden on developers. In this study, a static checking method based on taint analysis is proposed. First, a flow-sensitive, context-sensitive, and on-demand pointer analysis is proposed to determine the range of array lengths. Then, an on-demand taint analysis is performed for all array indices and array length expressions. Finally, rules for checking array index out-of-bounds defects are defined and the checking is realized through backward data-flow analysis. During the analysis, to deal with complex constraints and expressions, the satisfiability of path conditions is checked by invoking a constraint solver. If no statement guarding against the out-of-bounds access is found in the program, an array index out-of-bounds warning is reported.
An automatic static analysis tool, Carraybound, has been implemented, and the experimental results show that Carraybound works effectively and efficiently.
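The checking rule sketched in the abstract (warn unless a guard statement provably bounds the index) can be illustrated with a toy interval analysis. This is only a hedged sketch of the general idea; the function names and the interval model are illustrative and are not taken from Carraybound itself.

```python
# Toy interval-based bounds check: an index is modeled by an interval
# [lo, hi]; an access a[i] is flagged unless the whole interval is
# provably inside [0, array_len).

def check_access(index_range, array_len):
    """Return 'safe' or 'warning' for an access a[i] with i in index_range."""
    lo, hi = index_range
    if lo >= 0 and hi < array_len:
        return "safe"
    return "warning"

def apply_guard(index_range, array_len):
    """Narrow the interval as a guard 'if (0 <= i && i < n)' would."""
    lo, hi = index_range
    return (max(lo, 0), min(hi, array_len - 1))
```

After the guard is applied, an otherwise unsafe interval no longer triggers a warning, which mirrors how a dominating bounds check suppresses the report.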
XU Meng-Wei , LIU Yuan-Qiang , HUANG Kang , LIU Xuan-Zhe , HUANG Gang
2020, 31(10):3004-3018. DOI: 10.13328/j.cnki.jos.006064 CSTR:
Abstract:How to efficiently deploy machine learning models on mobile devices has drawn much attention in both academia and industry, and model training is a critical part of it. However, with growing public attention to data privacy and recently adopted laws and regulations, it has become harder for developers to collect training data from users, so high-quality models cannot be trained centrally. Researchers have been exploring approaches to training neural networks on decentralized data; this work summarizes those efforts and points out their limitations. To this end, this work presents a novel neural network training paradigm on mobile devices, which keeps all training computation associated with private data on local devices and requires no data to be uploaded in any form. This training paradigm is named autonomous learning. To deal with the two main challenges of autonomous learning, i.e., limited data volume and insufficient computing power on mobile devices, the first autonomous learning system, AutLearn, is designed and implemented. It incorporates a cloud (public data, pre-training) and client (private data, transfer learning) cooperation methodology and data augmentation techniques to ensure model convergence on mobile devices. Furthermore, by applying a series of optimization techniques such as model compression, a neural network compiler, and runtime cache reuse, AutLearn significantly reduces the on-client training cost. Two classical scenarios of autonomous learning are implemented with AutLearn and a set of experiments is carried out. The results show that AutLearn can train neural networks with comparable or even higher accuracy than traditional centralized/federated training modes while preserving privacy. AutLearn also significantly reduces the computational and energy cost of neural network training on mobile devices.
LI Ding-Ji , MI Ze-Yu , WU Bao-Dong , CHEN Xun , ZHAO Yong-Wang , DING Zuo-Hua , CHEN Hai-Bo
2020, 31(10):3019-3037. DOI: 10.13328/j.cnki.jos.006068 CSTR:
Abstract:The increasing deployment of artificial intelligence has placed unprecedented requirements on the computing power of cloud computing, and cloud service providers have integrated accelerators with massive parallel computing units into their data centers. These accelerators must be combined with existing virtualization platforms to partition computing resources. The current mainstream accelerator virtualization solution is PCI passthrough, which however does not support fine-grained resource provisioning. Some manufacturers have also started to provide time-sliced multiplexing schemes that use drivers cooperating with specific hardware to divide resources and time slices among virtual machines, but these suffer from poor portability and flexibility. An alternative and promising approach is API forwarding, which forwards a virtual machine's requests to a back-end driver for processing through a split driver model. Yet the communication incurred by API forwarding can easily become the performance bottleneck. This study proposes Wormhole, an accelerator virtualization framework based on the client/server architecture that supports rapid delegated execution across virtual machines. It aims to provide upper-level users with efficient and transparent API-forwarding-based accelerator virtualization while ensuring strong isolation between multiple users. By leveraging hardware virtualization features, the framework minimizes performance degradation through exitless cross-VM control-flow switches. Experimental results show that Wormhole's prototype system can achieve up to 5 times the performance of classic open-source virtualization solutions such as GVirtuS in training tests of classic models.
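The API-forwarding idea described above can be sketched in a few lines: the guest side marshals an accelerator API call, and the host side looks it up in a dispatch table and executes it. This is a hedged illustration only; the function names and the dispatch table stand in for real driver entry points and are not Wormhole's actual interface.

```python
import pickle

# The "backend driver": a table mapping API names to host-side handlers.
# Real systems forward CUDA/OpenCL calls; these lambdas are placeholders.
BACKEND_API = {
    "vector_add": lambda a, b: [x + y for x, y in zip(a, b)],
    "scale":      lambda a, k: [x * k for x in a],
}

def guest_call(name, *args):
    """Guest side: marshal the request as a front-end driver would."""
    return pickle.dumps((name, args))

def host_dispatch(request):
    """Host side: unmarshal the request and execute it on the backend."""
    name, args = pickle.loads(request)
    return BACKEND_API[name](*args)
```

The serialization round-trip is exactly where forwarding overhead arises, which is why the paper focuses on exitless cross-VM switches to cut the communication cost.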
ZHANG Hong-Jun , WU Yan-Jun , ZHANG Heng , ZHANG Li-Bo
2020, 31(10):3038-3055. DOI: 10.13328/j.cnki.jos.006069 CSTR:
Abstract:Hash tables, a type of data indexing structure that provides efficient access to data by key, are widely used in computer applications, especially in system software, databases, and high-performance computing, where performance is critical. In network, cloud computing, and IoT services, hash tables have become a core component of cache systems. However, with the massive growth in data volume, cache systems built around hash tables on multi-core CPUs have gradually hit performance bottlenecks, and there is an urgent need to further improve the performance and scalability of the hash table. With the increasing popularity of general-purpose graphics processing units (GPUs) and the substantial improvement of hardware computing capability and concurrency, various system software tasks centered on parallel computing have been optimized on the GPU and have achieved considerable performance gains. Due to the sparseness and randomness of hash table accesses, directly using existing parallel hash table structures on the GPU inevitably brings high-frequency memory accesses and frequent bus data transfers, which limit the performance of hash tables on GPUs. This study analyzes the memory accesses, hit rate, and index overhead of hash table indexes in cache systems, and proposes CCHT (cache cuckoo hash table), a hybrid-access cache index framework adapted to the GPU. It provides the cache strategies required by the index while allowing concurrent execution of write and query operations, making full use of the computing performance and concurrency of GPU hardware and reducing memory access and bus transfer overhead. GPU-based implementation and experimental verification show that CCHT outperforms other cache index hash tables while ensuring the cache hit rate.
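For readers unfamiliar with the cuckoo hashing that CCHT builds on, a minimal two-table CPU sketch follows. Each key has one candidate slot per table; an insert that finds its slot occupied "kicks" the resident entry to its alternate slot. This is only an illustration of the basic scheme, not CCHT's GPU implementation, and the hash function choice is arbitrary.

```python
import hashlib

class CuckooHash:
    """Minimal two-table cuckoo hash table (illustrative sketch)."""

    def __init__(self, size=8, max_kicks=32):
        self.size, self.max_kicks = size, max_kicks
        self.tables = [[None] * size, [None] * size]

    def _slot(self, key, t):
        # Deterministic per-table hash; real implementations use faster ones.
        digest = hashlib.blake2b(repr((t, key)).encode()).digest()
        return digest[0] % self.size

    def get(self, key):
        for t in (0, 1):
            entry = self.tables[t][self._slot(key, t)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def put(self, key, value):
        entry, t = (key, value), 0
        for _ in range(self.max_kicks):
            i = self._slot(entry[0], t)
            # Swap the new entry in; carry the evicted one to the other table.
            entry, self.tables[t][i] = self.tables[t][i], entry
            if entry is None:
                return True
            t = 1 - t
        return False  # a full implementation would rehash/resize here
```

Lookups probe at most two slots, which is the constant-time property that makes cuckoo hashing attractive for GPU-resident cache indexes.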
LIANG Guan-Yu , WU Yan-Jun , WU Jing-Zheng , ZHAO Chen
2020, 31(10):3056-3073. DOI: 10.13328/j.cnki.jos.006070 CSTR:
Abstract:Software reliability is one of the research hotspots in software engineering, and failure rate analysis is a typical method for studying it. However, the software construction mode has evolved from a single-vendor mode to a large-scale collaborative model represented by open source software. As a representative product, an operating system integrates open source software connected through composition and dependency relationships, forming a supply network of tens of thousands of nodes. Typical methods lack consideration of these supply relationships and cannot accurately identify and evaluate the reliability issues they introduce. This article extends the concept of the supply chain to the field of open source software and proposes a knowledge-based management method for software supply reliability under the collaborative model: first, an ontology for the open source software ecosystem is designed, and a knowledge graph of open source software is constructed to achieve the extraction, storage, and management of knowledge; then, driven by this knowledge and combined with traditional supply chain management methods, a set of reliability management methods for the open source software supply chain is proposed, constituting an open source software supply chain management system. Taking the construction of a Linux operating system distribution as an example, the experiment demonstrates how the open source software supply chain supports the reliability of the operating system. Results show that the open source software supply chain helps to clarify and evaluate the reliability risks of large, complex system software.
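The supply-network idea can be made concrete with a toy dependency graph: a reliability issue in one package affects every package whose dependency closure contains it. The package names below are fabricated for illustration, and a real knowledge graph would carry far richer typed relations than plain edges.

```python
# Toy supply network: package -> direct dependencies.
DEPENDS_ON = {
    "distro":  ["glibc", "openssl", "bash"],
    "bash":    ["glibc", "ncurses"],
    "openssl": ["glibc"],
    "ncurses": ["glibc"],
    "glibc":   [],
}

def transitive_deps(pkg, graph):
    """All packages reachable from pkg along dependency edges."""
    seen, stack = set(), [pkg]
    while stack:
        for dep in graph.get(stack.pop(), []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

def affected_by(pkg, graph):
    """All packages whose dependency closure contains pkg, i.e. the blast
    radius of a reliability issue in pkg."""
    return {p for p in graph if pkg in transitive_deps(p, graph)}
```

Queries like `affected_by` are exactly the kind of impact analysis a supply-chain knowledge graph enables for assessing a distribution's reliability risk.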
LOU Wen-Qi , WANG Chao , GONG Lei , ZHOU Xue-Hai
2020, 31(10):3074-3086. DOI: 10.13328/j.cnki.jos.006071 CSTR:
Abstract:In recent years, thanks to the high accuracy of convolutional neural networks (CNNs) in character recognition and image classification, they have received widespread attention in the field of machine learning. Nevertheless, the compute-intensive and memory-intensive characteristics of CNNs pose huge challenges to general-purpose processors, which need to support various workloads, and a large number of CNN-specific hardware accelerators have therefore emerged to improve efficiency. However, although previous accelerators are significantly more efficient, they usually lack flexibility. In this study, classical CNN models are analyzed and a domain-specific instruction set of 10 matrix instructions, called RV-CNN, is designed based on the promising RISC-V architecture. By abstracting CNN computation into instructions, the proposed design provides sufficient flexibility for CNNs and possesses a higher code density than a general ISA. On this basis, a code-to-instruction mapping mechanism is proposed. Building different CNN models with RV-CNN on the Xilinx ZC702 shows that, compared with x86 processors, RV-CNN achieves on average 141 times the energy efficiency and 8.91 times the code density; compared with a GPU, it achieves on average 1.25 times the energy efficiency and 1.95 times the code density. Besides, compared with previous CNN accelerators, the design supports typical CNN models while retaining good energy efficiency.
LIU Yan-Qiang , QI Zheng-Wei , GUAN Hai-Bing
2020, 31(10):3087-3099. DOI: 10.13328/j.cnki.jos.006065 CSTR:
Abstract:Field-programmable gate arrays (FPGAs) have been attracting more and more attention in heterogeneous computing due to their customizability and reconfigurability. Developing FPGA-based acceleration systems involves the cooperation of both hardware and software developers. Building such systems by integrating software and hardware parts developed with independent tool chains introduces a steep learning curve and difficulties in testing and deployment, thus preventing rapid prototyping. How to make hardware design benefit from the progress in software engineering and software programming languages has a long academic history. This article first presents a survey on the design of development tools for hardware and hardware acceleration systems, and then presents the authors' own work. Finally, a conclusion is drawn and future prospects are discussed.
WANG Kang-Jin , JIA Tong , LI Ying
2020, 31(10):3100-3119. DOI: 10.13328/j.cnki.jos.006066 CSTR:
Abstract:The data center is not only an important IT infrastructure but also a key support for enterprise Internet applications. However, the resource utilization of data centers is quite low (only 10%~20%), which wastes a large amount of resources, incurs huge extra operation and maintenance costs, and has become a key problem restricting enterprises from improving computing efficiency. By colocating online services and offline tasks, colocation can effectively improve the resource utilization of data centers, and it has become a research hotspot in academia and industry. This paper analyzes the characteristics of online services and offline tasks, and discusses the technical challenges posed by performance interference between them. It summarizes the key technologies in terms of performance interference models, job scheduling, resource isolation, and dynamic resource allocation, and discusses the application and effect of colocation in industry with four typical colocation systems. Finally, future research directions are presented.
ZHANG Qian-Ying , ZHAO Shi-Jun
2020, 31(10):3120-3146. DOI: 10.13328/j.cnki.jos.006067 CSTR:
Abstract:Computing devices are processing and storing more and more sensitive information, such as passwords and personal fingerprints, so higher security requirements are placed on them. With the development of physical attacks, a new class of attacks called board-level physical attacks has emerged, which can obtain secrets from the operating system by attacking hardware components at the printed circuit board (PCB) level. These attacks require only simple, inexpensive tools and can be carried out in a streamlined fashion, so attackers can easily leverage them to form new underground industries. They are therefore a new security threat and challenge for operating systems. A common defense against such attacks is to extend the CPU with a specialized memory encryption engine, but most current computing devices are not equipped with such hardware security mechanisms. Thus, academia and industry have proposed software-based techniques to defend against board-level physical attacks, which have become a research hotspot in recent years. This paper analyzes the development of these techniques in depth, summarizes their advantages and disadvantages, and discusses their development trends. First, the paper introduces the definition, threat model, and real-world attack cases of board-level physical attacks. Second, it describes the building blocks that software-based defenses against board-level physical attacks rely on. Third, it surveys and categorizes the related work on software-based defense technology according to its protection domains. Finally, the paper analyzes the advantages and disadvantages of the technology, gives suggestions on how to implement it in practice, and discusses some of its development trends.
MO Qi , DAI Fei , DA Jian , ZHU Rui , XIE Zhong-Wen , LI Tong
2020, 31(10):3147-3166. DOI: 10.13328/j.cnki.jos.005809 CSTR:
Abstract:Collaborative business processes established by bottom-up modeling methods usually contain inconsistencies, so correctness analysis is an important means of ensuring their correct implementation. Most existing methods focus on correctness detection, which makes the correctness analysis of collaborative business processes complicated and time-consuming. Correctness repair can avoid the repeated detection and adjustment inherent in detection-based methods; however, repair has been less researched and cannot yet be effectively applied to collaborative business processes. To this end, a method for repairing the correctness of collaborative business processes based on complete routes is proposed. First, the behaviors of a partially correct collaborative business process are abstracted into complete simple routes, taking synchronous and asynchronous interaction into account, and these routes are merged into a core. Then, a coordination mapping is used to map the core to the repaired business processes, and the repaired collaborative business process is established by composing all the repaired business processes concurrently. The repaired collaborative business process conforms to the actual characteristics of collaborative business processes, contains all complete traces of the pre-repair process, and introduces no hidden traces, thereby avoiding re-validation. Finally, experiments compare the proposed method with existing methods. The results show that, taking the actual characteristics of collaborative business processes into account, the proposed approach repairs collaborative business processes more effectively than existing work.
CHEN De-Yan , ZHAO Hong , ZHANG Xia
2020, 31(10):3167-3183. DOI: 10.13328/j.cnki.jos.005825 CSTR:
Abstract:Health care is a knowledge-intensive domain, and the quality of clinical diagnosis depends mainly on the medical knowledge and clinical experience held by doctors. However, the ability of any single doctor is very limited, so the quality of clinical diagnosis suffers. To this end, this study proposes an aided diagnosis method based on a domain semantic knowledge base. First, a domain semantic knowledge base is built from the knowledge of the medicine subject domain in Freebase. Then, based on this knowledge base, algorithms are proposed for computing the weights of symptoms in the knowledge base, the relevancy of diseases to the set of symptoms input for a patient, and the set of further symptoms related to that input set. Finally, based on clinical data of 6 randomly selected common diseases, the proposed method is compared with existing methods. The evaluation results show, on the one hand, that the method remedies problems and deficiencies of existing methods, and on the other hand, that it avoids the "cold start" problem and can quickly support aided diagnosis for a large number of common diseases. The method is expected to provide comprehensive diagnosis services for common diseases to general practitioners at the grassroots level, or to provide patients with disease self-diagnosis services.
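One way to picture the symptom-weighting and disease-relevancy algorithms is an IDF-style scheme: rare symptoms are more discriminative, and diseases are ranked by the summed weights of matched input symptoms. This is a hedged sketch of the general idea only; the tiny disease-symptom table is fabricated, and the paper's actual formulas over the Freebase-derived knowledge base may differ.

```python
import math

# Fabricated disease -> symptom table standing in for the knowledge base.
DISEASES = {
    "cold":      {"cough", "fever", "runny nose"},
    "flu":       {"cough", "fever", "headache", "muscle pain"},
    "pneumonia": {"cough", "fever", "chest pain", "dyspnea"},
}

def symptom_weight(symptom):
    """Smoothed IDF weight: symptoms shared by many diseases count less."""
    n = len(DISEASES)
    df = sum(1 for symptoms in DISEASES.values() if symptom in symptoms)
    return math.log((n + 1) / (df + 1)) + 1

def rank_diseases(input_symptoms):
    """Rank diseases by summed weights of the matched input symptoms."""
    scores = {
        disease: sum(symptom_weight(s) for s in input_symptoms if s in symptoms)
        for disease, symptoms in DISEASES.items()
    }
    return sorted(scores, key=scores.get, reverse=True)
```

Because the weights come straight from the knowledge base rather than from patient history, a scheme like this needs no per-user training data, which is how the "cold start" problem is avoided.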
ZHAO Yu-Wen , AO Yu-Long , YANG Chao , LIU Fang-Fang , YIN Wan-Wang , LIN Rong-Fen
2020, 31(10):3184-3196. DOI: 10.13328/j.cnki.jos.005848 CSTR:
Abstract:A two-layer decomposition 1-D FFT multi-core parallel algorithm is proposed according to the characteristics of the Sunway SW26010 processor. Based on the iterative Stockham FFT framework and the Cooley-Tukey FFT algorithm, it decomposes a large-scale FFT into a series of small-scale FFTs, and improves performance through reasonable task partitioning, register communication, double buffering, and SIMD vectorization. Performance tests of the two-layer decomposition 1-D FFT multi-core parallel algorithm show an average speedup of 44.53x, a maximum speedup of up to 56.33x, and a maximum bandwidth utilization of 83.45%, compared with the FFTW 3.3.4 library running on a single MPE.
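The core of the two-layer decomposition is the Cooley-Tukey identity: a DFT of length n = p·q can be computed as q sub-DFTs of size p, a twiddle-factor multiplication, and p sub-DFTs of size q. A minimal (unoptimized) sketch of that decomposition follows; it illustrates the mathematics only and has none of the paper's task partitioning, register communication, or vectorization.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT, used here both as the small-FFT kernel and as a
    reference; a real implementation would use an efficient small FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_two_layer(x, p, q):
    """DFT of length p*q via p-point and q-point sub-DFTs plus twiddles."""
    n = p * q
    assert len(x) == n
    # Layer 1: q strided sub-DFTs of size p (input index = q*p' + r).
    inner = [dft(x[r::q]) for r in range(q)]
    # Twiddle factors coupling the two layers.
    tw = [[inner[r][k1] * cmath.exp(-2j * cmath.pi * r * k1 / n)
           for k1 in range(p)] for r in range(q)]
    # Layer 2: p sub-DFTs of size q; output index = k1 + p*k2.
    out = [0j] * n
    for k1 in range(p):
        col = dft([tw[r][k1] for r in range(q)])
        for k2 in range(q):
            out[k1 + p * k2] = col[k2]
    return out
```

Each small sub-DFT fits in fast local memory, which is exactly why this decomposition suits the SW26010's scratchpad-based compute elements.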
PAN Xiao , YU Qi-Di , MA Ang , SUN Ya-Xin , WU Lei , GUO Jing-Feng
2020, 31(10):3197-3215. DOI: 10.13328/j.cnki.jos.005810 CSTR:
Abstract:In recent years, with the popularization of positioning systems and mobile devices, the number of spatio-textual objects has increased extraordinarily. Location-based services using geographical information play a critical role in daily life, and spatial keyword search has attracted increasing attention from academia and industry. However, many existing techniques are applicable only to AND semantics, and relatively little research supports OR semantics. When users do not require exact keyword matching, search technology that supports OR semantics is particularly important. To solve this problem, this study proposes VGrid, a virtual-grid-based query algorithm. VGrid is built on the aggregate inverted linear quadtree (AIL) and utilizes the easy transformation between Morton codes and spatial locations. The algorithm supports both OR and AND semantics. Finally, a series of experiments on a real dataset verifies the effectiveness and efficiency of the proposed algorithm.
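The Morton-code transformation mentioned above interleaves the bits of a cell's (x, y) grid coordinates so that nearby cells tend to get nearby codes. A standard bit-interleaving sketch (not VGrid's own code) looks like this:

```python
def morton_encode(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one Morton code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits at even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits at odd positions
    return code

def morton_decode(code, bits=16):
    """Recover (x, y) grid coordinates from a Morton code."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y
```

Because the mapping is a cheap, invertible bit permutation, an index keyed by Morton codes can move between one-dimensional code ranges and two-dimensional grid regions, which is the property VGrid exploits.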
ZHANG Wei , WANG Zhi-Hai , YUAN Ji-Dong , HAO Shi-Lei
2020, 31(10):3216-3237. DOI: 10.13328/j.cnki.jos.005852 CSTR:
Abstract:Time series data are widely generated in many fields of science, technology, and economics. Feature generation based on Symbolic Fourier Approximation (SFA) and a sliding-window transformation mechanism is one of the most effective ways of constructing feature dictionaries for time series, but such methods have two obvious shortcomings: first, the optimal number of Fourier coefficients cannot be selected dynamically for different sliding-window lengths during the transformation; second, there is no effective algorithm for selecting discriminative features from the massive set of generated features. To this end, a new variable-length feature dictionary building algorithm is proposed in this study. First, a variable-length word extraction method based on SFA is proposed, which dynamically selects the optimal number of Fourier coefficients for each sliding-window length. Second, a new feature discrimination evaluation indicator is designed, and the generated features are selected according to its dynamic threshold. Experimental results show that, based on the proposed time series dictionary, a logistic regression model can achieve high classification accuracy and identify the discriminative features used in prediction.
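For context, the SFA transform underlying these dictionaries takes the first few Fourier coefficients of a window and quantizes each real and imaginary part into a letter. The sketch below is deliberately simplified: real SFA learns its quantization breakpoints from data (multiple coefficient binning), whereas the fixed thresholds here are purely illustrative.

```python
import cmath

def fourier_coeffs(window, n_coeffs):
    """First n_coeffs normalized DFT coefficients of one sliding window."""
    n = len(window)
    return [sum(window[t] * cmath.exp(-2j * cmath.pi * t * k / n)
                for t in range(n)) / n
            for k in range(n_coeffs)]

def quantize(value, breakpoints=(-0.5, 0.0, 0.5), alphabet="abcd"):
    """Map a real value to a letter using fixed (illustrative) breakpoints."""
    for i, b in enumerate(breakpoints):
        if value < b:
            return alphabet[i]
    return alphabet[len(breakpoints)]

def sfa_word(window, n_coeffs=2):
    """Quantize real/imaginary parts of each kept coefficient into a word."""
    word = ""
    for c in fourier_coeffs(window, n_coeffs):
        word += quantize(c.real) + quantize(c.imag)
    return word
```

Sliding such a window over a series and counting the resulting words yields the "bag of SFA words" feature dictionary; the paper's contribution is choosing `n_coeffs` per window length dynamically and filtering the resulting features.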
ZHANG Qi-Hui , HU Xue-Xian , LIU Wen-Fen , WEI Jiang-Hong
2020, 31(10):3238-3250. DOI: 10.13328/j.cnki.jos.005805 CSTR:
Abstract:With the aid of a three-party password-authenticated key exchange (3PAKE) protocol, two users, each of whom shares a low-entropy password with a trusted server, can securely agree on a common session key. Since 3PAKE protocols dramatically reduce the burden of password management when the total number of users is very large, they have attracted much attention recently. However, most existing 3PAKE protocols are designed for the scenario where a user's plain password is stored in the server's password file, so no protection remains once the password file is leaked. This study investigates the analysis and design of verifier-based 3PAKE protocols, in which the server holds a verifier of a password rather than the plain password. First, it is shown that a recently proposed verifier-based 3PAKE protocol is not secure, being vulnerable to off-line dictionary attack. Then, to overcome these deficiencies, a new verifier-based 3PAKE protocol is proposed and its security is proved in the standard model. Comparisons show that the proposed scheme offers stronger security while enjoying practical efficiency.
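To see what "verifier" buys, compare it with plain-password storage: the server keeps only a value derived from the password, so a leaked file does not directly reveal the password. The sketch below uses a salted, iterated hash purely as an illustration of the storage side; verifier-based PAKE protocols typically use a group element (e.g. of the form g raised to a hash of the password) and, crucially, the full protocol is what resists off-line dictionary attacks.

```python
import hashlib
import os

def make_verifier(password, salt=None):
    """Derive a password verifier: (salt, slow salted hash)."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def check_password(candidate, salt, digest):
    """Server-side check: re-derive and compare, never storing the password."""
    return hashlib.pbkdf2_hmac("sha256", candidate.encode(), salt, 100_000) == digest
```

Note that a salted hash alone still allows off-line guessing once the file leaks; the point of a verifier-based 3PAKE protocol is that protocol transcripts themselves leak nothing useful for such guessing.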
2020, 31(10):3251-3265. DOI: 10.13328/j.cnki.jos.005803 CSTR:
Abstract:This paper gives a comprehensive overview of popular fluid animation simulation algorithms based on physical and data-driven methods, as well as their applications in recent years. First, traditional physically based acceleration methods for fluid simulation are summarized, covering both their advantages and disadvantages. Then, the existing data-driven algorithms applied to fluid simulation are summarized and analyzed; in particular, they are grouped into three types, namely interpolation methods, methods based on pre-computed data, and deep learning methods. Finally, some key points of the data-driven methods are given, together with research trends and directions.
JIN Yao , SONG Dan , YU Cheng-Hai , MA Wen-Juan , SONG Ying , HE Li-Li
2020, 31(10):3266-3279. DOI: 10.13328/j.cnki.jos.005804 CSTR:
Abstract:Existing work on designing curves on mesh surfaces suffers from issues such as weak robustness, slow convergence, and narrow application range. To address these issues, a distance-constrained approach is proposed, which converts the complicated manifold constraint into a distance constraint and formulates the problem as a constrained optimization combining smoothness and interpolation (approximation) constraints. To solve the optimization, the curve is discretized into a polyline, and the distance constraint is relaxed to a point-to-plane distance by approximating the local surface patch with its tangent plane. Since the curve points and the corresponding tangent points involved in the distance calculation are interdependent, a "local/global" alternating iteration scheme is adopted and the idea of the Gauss-Newton method is used to control the convergence behavior. In the global stage, the iterative step is solved by relaxing the problem into a convex optimization via distance approximation. In the local stage, a robust and efficient projection method is applied to update the tangent planes. Finally, each segment of the polyline is projected onto the surface by cutting planes. Experiments show that the proposed method outperforms existing work in effectiveness, robustness, controllability, and practicability.
SUN Xiao-Peng , HE Xin , WANG Zhen-Yan , LI Jiao-Jiao , CHEN Teng , DONG Yu
2020, 31(10):3280-3294. DOI: 10.13328/j.cnki.jos.006060 CSTR:
Abstract:A novel algorithm is proposed for local anisotropic contraction deformation of thin shells within the framework of position-based dynamics. First, a new elastic deformation energy for thin shells is presented to address the material limitations of position-based dynamics and obtain the desired elastic contraction deformation for a variety of materials. Second, a stable contraction deformation without jittering is obtained by choosing a proper coefficient for the bending energy. Third, a local anisotropic ARAP deformation energy is defined to produce rapid and stable invagination in areas of local spherical structure where deformation would otherwise be slow and slight. Finally, axis-aligned bounding boxes and non-penetration filters are used as a preprocessing stage to cull primitive pairs that cannot collide, accelerating collision detection. The experimental results demonstrate that the proposed method supports many different types of materials and local anisotropic energies, and can cope with the problems of jittering and slight deformation of local spherical structures.
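The axis-aligned bounding box culling step mentioned above is a standard broad-phase test: two primitives can only collide if their boxes overlap on every axis. A minimal sketch (illustrative, not the paper's implementation, which also applies non-penetration filters):

```python
def aabb(points):
    """Axis-aligned bounding box of a set of 3-D points: (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def aabb_overlap(box1, box2):
    """Boxes overlap iff their intervals overlap on all three axes."""
    (lo1, hi1), (lo2, hi2) = box1, box2
    return all(lo1[i] <= hi2[i] and lo2[i] <= hi1[i] for i in range(3))
```

Pairs that fail this cheap test are discarded before any exact (narrow-phase) intersection test, which is where the speedup in collision detection comes from.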
WANG Xiao-Dong , ZHAO Yi-Ning , XIAO Hai-Li , CHI Xue-Bin , WANG Xiao-Ning
2020, 31(10):3295-3308. DOI: 10.13328/j.cnki.jos.005800 CSTR:
Abstract:With the increasing number of logs produced by nodes in CNGrid, traditional manual methods of abnormal log analysis can no longer meet the needs of daily analysis. This study defines an abnormal log traffic pattern as the ordered arrangement of log types in the same node within the same time slice, and on this basis implements a log traffic pattern detection method for automatically mining abnormal log traffic patterns. The method starts with system logs and classifies them automatically according to the text similarity of log content. Then, the frequency of each log type within the same time slice is taken as an input feature, and an anomaly detection method based on principal component analysis (PCA) is used to detect abnormal inputs, yielding a large number of abnormal log type sequences. A distance metric based on the longest common subsequence is used to cluster these sequences with hierarchical clustering, and an adaptive K-itemset algorithm is applied to the clustering results to obtain representatives of the abnormal log traffic patterns. The above method was used to analyze half a year of logs generated in the national high-performance computing environment CNGrid, divided into different time periods (morning, night, midnight), and obtained the abnormal log traffic patterns and their relationships in different time periods. The method can also be extended to the system logs of other distributed systems.
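The longest-common-subsequence distance used for the clustering step can be sketched directly: compute the LCS length with dynamic programming, then turn the similarity into a distance in [0, 1]. The normalization below is one common choice; the paper's exact normalization may differ.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two sequences (DP)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def lcs_distance(a, b):
    """1 - normalized LCS similarity; 0 for identical sequences."""
    if not a and not b:
        return 0.0
    return 1.0 - lcs_len(a, b) / max(len(a), len(b))
```

Because log type sequences of different lengths can still share long common subsequences, this distance tolerates insertions and deletions, which suits noisy log streams better than exact sequence equality.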
GAO Long , DAI Hua-Dong , YANG Sha-Zhou , DING Yan
2020, 31(10):3309-3320. DOI: 10.13328/j.cnki.jos.005814 CSTR:
Abstract:The Xorg server runs in single-threaded mode on the frame buffer device, which makes it hard to obtain good performance on multi-core CPUs. For frame buffer devices on multi-core CPUs, a task queue protected by mutual exclusion is designed: the screen is split into several sub-screens, and each sub-screen is attached to a thread that draws the rectangles falling within it. A private task queue for each thread holds its own rectangle-drawing tasks, and load balance is maintained between the main thread and the sub-threads. Results of x11perf show that the rectangle-filling operation reaches a speedup of 2.06 on a 4-core DELL desktop computer.
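The screen-splitting step can be pictured as clipping each fill rectangle against vertical strips so that each worker thread only draws the part inside its own sub-screen. The sketch below shows only that partitioning logic; the threading, queues, and actual drawing are omitted, and the strip layout is an assumption rather than the paper's exact scheme.

```python
def split_screen(width, n):
    """Split [0, width) into n near-equal vertical strips (left, right)."""
    step = width // n
    return [(i * step, width if i == n - 1 else (i + 1) * step)
            for i in range(n)]

def clip_rect(rect, strip):
    """Intersect rect (x0, x1, y0, y1) with a strip; None if disjoint."""
    x0, x1, y0, y1 = rect
    left, right = strip
    nx0, nx1 = max(x0, left), min(x1, right)
    return (nx0, nx1, y0, y1) if nx0 < nx1 else None
```

Since clipped pieces in different strips touch disjoint frame buffer regions, each thread can fill its pieces without synchronizing on pixels, leaving only the task queues to be protected by locks.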