YANG Wen-Hua, ZHOU Yu, HUANG Zhi-Qiu
2021, 32(4):889-903. DOI: 10.13328/j.cnki.jos.006222
Abstract:Cyber-physical systems (CPS) are widely used in many critical areas, such as industrial control and intelligent manufacturing. Since CPS are deployed in such critical areas, their quality is vital. However, due to the complexity of CPS and the uncertainty within them (such as the unpredictable sensing errors of the sensors they use), quality assurance of CPS faces huge challenges. Verification is one of the effective ways to ensure system quality: based on a system model and specifications, verification can prove whether the system satisfies the required properties. Significant progress has been made in the verification of CPS. For example, existing works have used model checking to verify whether the system's behavior under the influence of uncertainty satisfies the specification and, if not, to produce a counterexample. An important input to these verification methods is the uncertainty model, which specifies the uncertainty in the system. In practice, it is not easy to model this uncertainty accurately, so the uncertainty model used in verification is likely to be inconsistent with reality, which leads to inaccurate verification results. To address this problem, this study proposes an uncertainty model calibration method based on counterexample validation to further improve the accuracy of verification results. First, it determines whether the uncertainty model used for verification is accurate by validating whether the counterexample can be triggered during the execution of the system. Inaccurate models are then calibrated with a genetic algorithm, whose fitness function is constructed from the counterexample validation results to guide the search. Finally, hypothesis testing is used to help decide whether to accept the calibrated models. Experimental results on representative cases demonstrate the effectiveness of the proposed uncertainty model calibration method.
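The calibration loop described here, searching for uncertainty-model parameters whose verification counterexamples are actually reproducible on the running system, can be sketched in miniature. This is an illustrative toy rather than the paper's implementation: the uncertainty model is reduced to a single sensor-noise parameter, and `counterexample_trigger_rate` is a hypothetical stand-in for replaying counterexamples on a (simulated) system.

```python
import random

def counterexample_trigger_rate(noise_std, counterexamples, true_std=0.5, trials=50):
    """Hypothetical stand-in for counterexample validation: the fraction of
    verification counterexamples reproduced when replayed on the real system.
    (counterexamples is unused in this toy; a real replay would iterate it.)"""
    # A counterexample found under an assumed noise level is more likely to be
    # reproducible the closer that level is to the system's true noise level.
    rate = max(0.0, 1.0 - abs(noise_std - true_std))
    return sum(1 for _ in range(trials) if random.random() < rate) / trials

def calibrate(counterexamples, pop_size=20, generations=30):
    """Genetic-algorithm calibration of one uncertainty parameter; the fitness
    function is built from counterexample-validation results, as in the paper's
    idea, so the search moves toward models whose counterexamples replay."""
    population = [random.uniform(0.0, 2.0) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda p: counterexample_trigger_rate(p, counterexamples),
                        reverse=True)
        parents = scored[:pop_size // 2]          # selection: keep the fitter half
        population = parents + [                  # crossover (averaging) + mutation
            min(2.0, max(0.0,
                (random.choice(parents) + random.choice(parents)) / 2
                + random.gauss(0, 0.1)))
            for _ in range(pop_size - len(parents))
        ]
    return max(population, key=lambda p: counterexample_trigger_rate(p, counterexamples))
```

A final hypothesis test on the calibrated model, as the abstract describes, would then compare observed and predicted trigger rates before accepting it.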
ZONG Zhe, YANG Zhi-Bin, YUAN Sheng-Hao, ZHOU Yong, Jean-Paul BODELEIX, Mamoun FILALI
2021, 32(4):904-933. DOI: 10.13328/j.cnki.jos.006223
Abstract:Safety-critical systems have evolved to use heterogeneous components to implement complex requirements, and each component may adopt a different computation model or modeling language. Therefore, complex modeling approaches are needed to design such systems. AADL, a multi-paradigm modeling language for safety-critical system architectures, is a good choice for designing safety-critical heterogeneous systems because of its rich expressiveness and good scalability. This study proposes a bottom-up AADL-SDL co-modeling approach that integrates functionality modeled in SDL into the AADL architecture model, and provides a multi-task code generation approach for multi-core platforms. Firstly, AADL property sets are extended to support modeling functionality. Secondly, a multi-task code generation approach is proposed to transform AADL-SDL models into Ada code. Finally, a prototype tool is implemented to support AADL-SDL co-modeling and multi-task Ada code generation. The effectiveness of the proposed method is analyzed on guidance, navigation, and control system scenarios.
BIAN Han, CHEN Xiao-Hong, JIN Zhi, ZHANG Min
2021, 32(4):934-952. DOI: 10.13328/j.cnki.jos.006224
Abstract:User requirements are the fundamental driving force of smart services in the Internet of Things (IoT). Today, many IoT frameworks such as IFTTT allow end users to program with simple trigger-action programming (TAP) rules. However, these rules describe device scheduling instructions rather than user service requirements. Some IoT systems adopt goal-oriented requirement approaches to support service goal decomposition, but it is difficult to ensure the consistency of different services and the completeness of service deployment. To achieve correct “user programming” in IoT systems and to ensure the consistency and completeness of user requirements, this study proposes an environment-modeling-based approach that automatically generates TAP rules. Based on the service requirements provided by users, the required system behaviors are automatically extracted according to the environment model. After checking their consistency and completeness, TAP rules are generated, realizing automatic generation from user service requirements to device scheduling instructions. An environment ontology of IoT application scenarios is constructed to model the environment, and a description method for service requirements is also defined. Finally, the accuracy, efficiency, and performance of the approach, together with the time cost of building the environment ontology, are evaluated in a smart home scenario. The results show that the accuracy, efficiency, and performance of the approach exceed the acceptable thresholds, and the time cost of building the environment ontology becomes negligible once the number of requirements is sufficiently large.
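To make the requirement-to-rule mapping concrete, here is a minimal hypothetical sketch. The environment model, device names, and rule format below are invented for illustration; the paper's approach works on a full environment ontology and also checks consistency and completeness, which this toy omits.

```python
# Hypothetical environment model: which device attribute senses each
# environment phenomenon, and which device command brings a goal about.
ENV_MODEL = {
    "room_too_hot": {"sensor": ("thermometer", "temp_above", 28)},
    "cool_room":    {"actuator": ("air_conditioner", "turn_on")},
}

def to_tap_rule(condition, goal):
    """Map a user service requirement (condition -> goal) to a device-level
    trigger-action (TAP) rule via the environment model."""
    device, predicate, threshold = ENV_MODEL[condition]["sensor"]
    actuator, command = ENV_MODEL[goal]["actuator"]
    return {"trigger": f"{device}.{predicate}({threshold})",
            "action": f"{actuator}.{command}()"}
```

For example, the service requirement "when the room is too hot, cool it" becomes a rule triggering the air conditioner from the thermometer reading, without the user ever naming a device.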
CAI Ting, LIN Hui, CHEN Wu-Hui, ZHENG Zi-Bin, YU Yang
2021, 32(4):953-972. DOI: 10.13328/j.cnki.jos.006229
Abstract:In recent years, with a large number of devices continuously joining the IoT, data sharing, as the main driver of the IoT market, has become a research hotspot. However, users are reluctant to participate in data sharing due to security concerns and the lack of incentive mechanisms in the current IoT. In this context, blockchain has been introduced into IoT data sharing to solve users' trust problems and provide secure data storage. However, in building a secure distributed data sharing system on a blockchain, breaking the blockchain's inherent performance bottleneck remains a major challenge. To this end, this study investigates efficient blockchain-based data sharing incentives for the IoT and proposes an efficient blockchain-based data sharing incentive framework named ShareBC. Firstly, ShareBC uses sharding to build asynchronous consensus zones that process data sharing transactions in parallel, and deploys efficient consensus mechanisms on cloud/edge servers and in the asynchronous consensus zones, thus improving the processing efficiency of data sharing transactions. Then, to encourage IoT users to participate in data sharing, a sharing incentive mechanism based on a hierarchical data auction model and implemented as a smart contract is presented. The proposed mechanism effectively solves the multi-layer data allocation problem involved in IoT data sharing and maximizes the overall social welfare. Finally, experimental results show that the proposed scheme is economically efficient, incentive-compatible, real-time, and scalable, with low cost and good practicability.
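The abstract does not spell out the auction mechanics, so purely for intuition, the following toy shows the kind of welfare-maximizing allocation a data auction performs at a single layer. ShareBC's actual mechanism is hierarchical and multi-layer and runs as a smart contract; the function below, its parameters, and its greedy rule are illustrative assumptions only.

```python
def auction_allocate(buyer_bids, unit_cost, capacity):
    """Single-layer welfare-maximizing allocation sketch: serve the highest
    bids first, while each bid still exceeds the marginal cost of providing
    one data unit, up to the server's capacity."""
    winners = []
    for bid in sorted(buyer_bids, reverse=True)[:capacity]:
        if bid > unit_cost:          # only trades with positive surplus
            winners.append(bid)
    welfare = sum(b - unit_cost for b in winners)   # total social surplus
    return winners, welfare
```

In a hierarchical setting this allocation step would repeat at each layer (device to edge, edge to cloud), with each layer's winners becoming the next layer's supply.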
ZHANG Meng-Han, DU De-Hui, ZHANG Ming-Zhuo, ZHANG Lei, WANG Yao, ZHOU Wen-Tao
2021, 32(4):973-987. DOI: 10.13328/j.cnki.jos.006226
Abstract:In the field of autonomous driving scenario modeling and simulation, spatio-temporal trajectory data-driven modeling of safety-critical scenarios and its applications are key problems, which are significant for improving system safety. In recent years, great progress has been achieved in the modeling and application of spatio-temporal trajectory data, and its application in specific fields has attracted wide attention. However, because spatio-temporal trajectory data is diverse, complex, massive, heterogeneous, and dynamic, modeling research in safety-critical fields still faces challenges, including unified spatio-temporal trajectory metadata, meta-modeling methods based on spatio-temporal trajectory data, data processing based on spatio-temporal trajectory analysis, and data quality evaluation. In view of the scenario modeling requirements of autonomous driving, a meta-modeling approach is proposed to construct spatio-temporal trajectory metadata based on the MOF meta-modeling architecture. According to the characteristics of spatio-temporal trajectory data and autonomous driving domain knowledge, a meta-model of spatio-temporal trajectory data is constructed. Then, the modeling of autonomous driving safety-critical scenarios is studied on top of this meta-modeling framework: the scenario modeling language ADSML is used to automatically instantiate safety-critical scenarios, and a library of safety-critical scenarios is built, aiming to provide a feasible approach to modeling such scenarios. Using lane change and overtaking scenarios, the effectiveness of the spatio-temporal trajectory data-driven meta-modeling approach for safety-critical autonomous driving scenarios is demonstrated, laying a solid foundation for the construction, simulation, and analysis of scenario models.
GAO Feng-Juan, WANG Yu, SITU Ling-Yun, WANG Lin-Zhang
2021, 32(4):988-1005. DOI: 10.13328/j.cnki.jos.006225
Abstract:With the rapid development of software techniques, domain-driven software raises new challenges in software security and robustness. Symbolic execution and fuzzing have developed rapidly in recent decades, demonstrating their ability to detect software bugs; the enormous number of bugs they have detected and fixed demonstrates their practicality. However, combining the two methods remains challenging because of their respective weaknesses. State-of-the-art techniques incorporate the two methods, for example by using symbolic execution to solve paths when fuzzing gets stuck on complex paths. Unfortunately, such methods are inefficient because they must suspend fuzzing while conducting symbolic execution, and vice versa. This paper presents a new deep-learning-based hybrid testing method combining symbolic execution and fuzzing. The method predicts which paths are suitable for fuzzing and which for symbolic execution, and guides each technique toward the paths that suit it. To further enhance effectiveness, a hybrid mechanism is proposed to make the two interact with each other. The proposed approach is evaluated on the programs in LAVA-M, and the results are compared with those of symbolic execution and fuzzing used independently. The proposed method achieves a more-than-20% increase in branch coverage and a 1- to 13-fold increase in the number of paths, and uncovers 929 more bugs.
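The core scheduling idea, predicting which engine suits each pending path and dispatching it accordingly, can be illustrated with a toy sketch. The paper uses a learned deep model as the predictor; here a hand-written heuristic and invented path features stand in for it.

```python
def classify_path(num_cmp_constraints, input_dependent_loops):
    """Toy stand-in for a learned predictor: paths dominated by complex
    comparison constraints are cheap for a solver but hard for mutation, so
    they go to symbolic execution; loop-heavy paths go to fuzzing."""
    return "symbolic" if num_cmp_constraints > input_dependent_loops else "fuzzing"

def schedule(paths):
    """paths: list of (path_id, num_cmp_constraints, input_dependent_loops).
    Dispatch each pending path to the engine predicted to handle it best,
    so neither engine has to stall waiting for the other."""
    queues = {"symbolic": [], "fuzzing": []}
    for path_id, cmps, loops in paths:
        queues[classify_path(cmps, loops)].append(path_id)
    return queues
```

Each engine then drains only its own queue, which is the opposite of the switch-on-stuck designs the paper criticizes.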
CAO Ying-Kui, SUN Ze-Yu, ZOU Yan-Zhen, XIE Bing
2021, 32(4):1006-1022. DOI: 10.13328/j.cnki.jos.006227
Abstract:In software development, developers often need to change or update many similar pieces of code. How to perform such code transformations automatically has become a research hotspot in software engineering. An effective way is to extract the change pattern from a set of similar code changes and apply it to automatic code transformation. Among related work, deep-learning-based approaches have achieved much progress, but they struggle with the significant long-range dependencies in code. To address this challenge, this study proposes ExpTrans, an automatic code change transformation method enhanced with code structure information. ExpTrans represents code changes as graphs, labels the dependencies among variables when parsing the code, and adopts a graph convolutional network together with the Transformer structure to capture long-range dependencies in code. To evaluate ExpTrans's effectiveness, it is first compared with existing learning-based approaches; the results show that ExpTrans gains an 11.8%~30.8% precision improvement. ExpTrans is then compared with rule-based approaches; the results show that ExpTrans significantly improves the correctness rate of the modified instances.
SHEN Qi, QIAN Ying, ZOU Yan-Zhen, WU Shi-Jun, XIE Bing
2021, 32(4):1023-1038. DOI: 10.13328/j.cnki.jos.006228
Abstract:In the process of software reuse, users need concise and clear natural language descriptions of software functions to understand candidate software projects quickly. However, current open source software often lacks high-quality documentation, which makes this process even more complex and difficult. This study proposes a novel functional feature mining approach that combines code and documentation. It describes functional features as verb phrases, automatically extracts them by iteratively mining source code and software documents such as Stack Overflow, associates a corresponding API usage example with each functional feature, and finally builds a hierarchical functional feature view for users. Experiments on several open source software projects and their related heterogeneous data show that the functional features generated by the proposed approach cover 95.38% of the functions in the official documentation, and that the approach achieves 93.78% and 92.57% accuracy for mining sentences and functional features, respectively. Compared with two existing tools, TaskNav and APITasks, the proposed approach improves accuracy by 28.78% and 11.56%, respectively.
2021, 32(4):1039-1050. DOI: 10.13328/j.cnki.jos.006220
Abstract:Coverage-based fault localization is a common technique that identifies the executed program elements correlating with failure. However, its effectiveness suffers from coincidental correctness, which occurs when a fault is executed but no failure is detected. Coincidental correctness is prevalent. In previous work, a method was proposed to estimate, for each program execution, the probability that coincidental correctness happens, using dynamic data-flow analysis and control-flow analysis. This study proposes a new fault-localization approach based on this coincidental correctness probability. To evaluate the proposed approach, safety and precision are used as evaluation metrics. The experiment involves the Siemens test suite from the Software-artifact Infrastructure Repository (SIR), which is widely used in related work. The results are compared with Tarantula and with the fault-localization technique based on coincidental correctness probability, and show that the proposed approach can significantly improve the safety and precision of fault localization.
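One plausible way to fold a per-execution coincidental-correctness probability into a coverage-based suspiciousness score (the paper's exact formula may differ) is to treat each passing execution as a masked failure with weight equal to its probability, then apply a standard metric such as Tarantula to the re-weighted counts:

```python
def tarantula(f_cov, p_cov, total_f, total_p):
    """Classic Tarantula suspiciousness from (possibly fractional) counts."""
    f = f_cov / total_f if total_f else 0.0
    p = p_cov / total_p if total_p else 0.0
    return f / (f + p) if f + p else 0.0

def cc_adjusted_susp(stmt, runs):
    """runs: list of (covers_stmt, failed, cc_prob) per test execution.
    A passing run is split: with weight cc_prob it is treated as a masked
    failure, so its coverage counts on the failing side; the remaining
    1 - cc_prob stays on the passing side."""
    f_cov = sum(1.0 for c, failed, _ in runs if c and failed) \
          + sum(cc for c, failed, cc in runs if c and not failed)
    p_cov = sum(1.0 - cc for c, failed, cc in runs if c and not failed)
    total_f = sum(1.0 for _, failed, _ in runs if failed) \
            + sum(cc for _, failed, cc in runs if not failed)
    total_p = sum(1.0 - cc for _, failed, cc in runs if not failed)
    return tarantula(f_cov, p_cov, total_f, total_p)
```

With all probabilities set to zero this reduces to plain Tarantula, so the adjustment only acts where coincidental correctness is suspected.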
ZHONG Wen-Kang, GE Ji-Dong, CHEN Xiang, LI Chuan-Yi, TANG Ze, LUO Bin
2021, 32(4):1051-1066. DOI: 10.13328/j.cnki.jos.006221
Abstract:The machine translation task converts one natural language into another. In recent years, neural machine translation models based on sequence-to-sequence models have outperformed traditional statistical machine translation models on multiple language pairs and have been adopted by many translation service providers. Although the practical application of commercial translation systems shows that neural machine translation has improved greatly, systematically evaluating its translation quality is still a challenging task. On the one hand, if translation quality is evaluated against reference texts, acquiring high-quality references is very expensive. On the other hand, compared with statistical machine translation models, neural machine translation models have more significant robustness problems, yet there are no relevant studies on their robustness. This study proposes MGMT, a multi-granularity testing framework based on metamorphic testing, which can evaluate the robustness of neural machine translation systems without reference translations. The framework first perturbs the source sentence at sentence granularity, phrase granularity, and word granularity, respectively; it then compares the translations of the source sentence and of the perturbed sentences based on their constituency parse trees; finally, it judges whether the results satisfy the metamorphic relation. Experiments are conducted on multi-domain Chinese-English translation datasets, evaluating six industrial neural machine translation systems and comparing MGMT with metamorphic testing methods of the same type and with reference-translation-based methods. The experimental results show that MGMT scores 80% and 20% higher than similar methods in terms of Pearson's and Spearman's correlation coefficients, respectively. This indicates that the proposed reference-free evaluation method has a higher positive correlation with reference-based evaluation, verifying that MGMT's evaluation accuracy is significantly better than that of other methods of the same type.
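The word-granularity case of such a metamorphic relation can be sketched as follows. Note the simplifications: a real implementation compares constituency parse trees of the two translations, whereas this toy compares them word by word; `translate` is any callable wrapping the system under test, and all sentences are invented examples.

```python
def word_granularity_replace(sentence, target, replacement):
    """Generate a follow-up input by replacing one word (the framework also
    perturbs at phrase and sentence granularity)."""
    return sentence.replace(target, replacement)

def metamorphic_violation(translate, source, target, replacement):
    """Metamorphic relation: replacing one content word in the source should
    change the translation only locally. Structural similarity is approximated
    here by a word-by-word comparison instead of parse-tree comparison."""
    t1 = translate(source).split()
    t2 = translate(word_granularity_replace(source, target, replacement)).split()
    if len(t1) != len(t2):      # structure changed: relation violated
        return True
    diffs = sum(1 for a, b in zip(t1, t2) if a != b)
    return diffs > 1            # more than the replaced position changed
```

Because the relation compares two system outputs against each other, no reference translation is ever needed, which is the key property the abstract highlights.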
YANG Yang, ZHAN De-Chuan, JIANG Yuan, XIONG Hui
2021, 32(4):1067-1081. DOI: 10.13328/j.cnki.jos.006167
Abstract:Multi-modal learning is one of the important research fields of machine learning and data mining, with a wide range of practical applications, such as cross-media search, multi-language processing, and click-through rate estimation with auxiliary information. Traditional multi-modal learning methods usually exploit the consistency or complementarity among modalities to design loss functions or regularization terms for joint training, thereby improving single-modal and ensemble performance. However, in open environments, affected by factors such as missing data and noise, multi-modal data is imbalanced, manifested as insufficient or incomplete modalities, resulting in inconsistent modal feature representations and inconsistent modal alignment relationships. Directly applying traditional multi-modal methods may then even degrade single-modal and ensemble performance. To solve these problems, reliable multi-modal learning has been proposed and studied. This paper systematically summarizes and analyzes the progress made by domestic and international scholars on reliable multi-modal learning, as well as the challenges that future research may face.
WANG Nai-Yu, YE Yu-Xin, LIU Lu, FENG Li-Zhou, BAO Tie, PENG Tao
2021, 32(4):1082-1115. DOI: 10.13328/j.cnki.jos.006169
Abstract:Language models, which express the implicit knowledge of a language, are a fundamental problem of natural language processing and have attracted wide attention; the current research hotspot is deep-learning-based language models. Through pre-training and fine-tuning, language models show their inherent power of representation and greatly improve the performance of downstream tasks. Centering on basic principles and different application directions, this survey takes neural probabilistic language models and pre-trained language models as the entry point for combining deep learning with natural language processing. Based on the basic concepts and theories of language models, the applications and challenges of neural probabilistic and pre-trained models are introduced. Then, existing neural probabilistic and pre-trained language models and their methods are compared and analyzed. In addition, the training methods of pre-trained language models are elaborated from the two aspects of new training tasks and improved network structures. Meanwhile, current research directions of pre-trained models in model compression, knowledge fusion, multi-modality, and cross-lingual applications are summarized and evaluated. Finally, the bottlenecks of language models in natural language processing applications are summed up, and possible future research priorities are discussed.
TAN Hong-Wei, WANG Guo-Dong, ZHOU Lin-Yong, ZHANG Zi-Li
2021, 32(4):1116-1128. DOI: 10.13328/j.cnki.jos.006156
Abstract:Generating high-quality samples has always been one of the main challenges in the field of generative adversarial networks (GANs). To this end, this study proposes a GAN penalty algorithm that leverages a constructed conditional entropy distance to penalize the generator. Under the condition of keeping the entropy invariant, the algorithm makes the generated distribution as close to the target distribution as possible, greatly improving the quality of the generated samples. In addition, to improve the training efficiency of GANs, the network structure is optimized and the initialization strategy of the two networks is changed. Experimental results on several datasets show that the penalty algorithm significantly improves the quality of generated samples. In particular, on the CIFAR10, STL10, and CelebA datasets, the best FID value is reduced from 16.19, 14.10, and 4.65 to 14.02, 12.83, and 3.22, respectively.
ZHANG Zhou, JIN Pei-Quan, XIE Xi-Ke
2021, 32(4):1129-1150. DOI: 10.13328/j.cnki.jos.006168
Abstract:Indexes are one of the key technologies for improving the performance of database systems. In the era of big data, traditional indexes such as the B+-tree have exposed some limitations. First, they cost too much space; for example, a B+-tree requires O(n) extra space, which is intolerable in big data environments. Second, they require multiple indirect searches per query; for example, each query in a B+-tree must traverse all nodes from the root to a leaf, which ties the B+-tree's search performance to the data size. Since 2018, the combination of artificial intelligence and databases has given birth to a new research direction called the "learned index". Learned indexes use machine learning to learn the data distribution and query workload characteristics, and replace the traditional indirect index search with a direct search based on fitted functions, so as to reduce space cost and improve query performance. This survey first systematically sorts out and classifies existing work on learned indexes. Then, the motivation and key techniques of each learned index are introduced, and the advantages and disadvantages of the various index structures are compared and analyzed. Finally, future research directions of learned indexes are discussed.
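The core idea, replacing the B+-tree's pointer chasing with a model that predicts a key's position plus a bounded local search to correct the model's error, can be shown in a minimal sketch. A single linear model stands in here for the model hierarchies (such as the RMI) used by real learned indexes.

```python
import bisect

class LearnedIndex:
    """Minimal single-model learned index over a sorted array: a least-squares
    line maps a key to an approximate position, and the recorded maximum
    training error bounds the final local search window."""
    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        # Fit position ~ a*key + b by least squares over (key, index) pairs.
        mean_k = sum(self.keys) / n
        mean_i = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        self.a = (sum((k - mean_k) * (i - mean_i)
                      for i, k in enumerate(self.keys)) / var_k) if var_k else 0.0
        self.b = mean_i - self.a * mean_k
        # Worst-case prediction error bounds every future lookup's search range.
        self.err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        return round(self.a * key + self.b)

    def lookup(self, key):
        n = len(self.keys)
        pos = self._predict(key)
        lo = max(0, pos - self.err)
        hi = min(n, pos + self.err + 1)
        i = lo + bisect.bisect_left(self.keys[lo:hi], key)  # search only the window
        return i if i < n and self.keys[i] == key else None
```

Unlike a B+-tree, the only per-index state is two model coefficients and one error bound, which is where the space savings the survey describes come from.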
2021, 32(4):1151-1164. DOI: 10.13328/j.cnki.jos.006116
Abstract:It is well known that Shor's algorithm can solve the integer factorization problem and the discrete logarithm problem in polynomial time, which makes classical cryptosystems insecure. Hence, more and more post-quantum cryptosystems are emerging, such as lattice-based, code-based, hash-based, and isogeny-based cryptosystems. Compared with other cryptosystems, isogeny-based cryptosystems have the advantage of short key sizes; nevertheless, they do not outperform the others in implementation efficiency. Based on two types of key exchange protocols built from supersingular elliptic curve isogenies, this paper analyzes the possibility of optimizing the two protocols using classical optimizations of elliptic curve scalar multiplication and pairings, as well as characteristics of elliptic curve isogenies. Meanwhile, the paper categorizes and reviews current progress on efficient isogeny computation, and puts forward further research directions.
WU Wei-Bin, LIU Zhe, YANG Hao, ZHANG Ji-Peng
2021, 32(4):1165-1185. DOI: 10.13328/j.cnki.jos.006165
Abstract:To address the threat that quantum computing poses to the security of public-key cryptography, post-quantum cryptography has become a frontier focus in the field of cryptography. Post-quantum cryptography guarantees the security of its algorithms through mathematical theory, but concrete implementations and applications are vulnerable to side-channel attacks, which seriously threaten their security. This study is based on the round-2 candidates of the NIST post-quantum cryptography standardization process and the round-2 candidates of the CACR public-key cryptography competition in China. First, the various post-quantum cryptographic algorithms are surveyed by category, including lattice-based, code-based, hash-based, and multivariate-based algorithms, and their security status against side-channel attacks and existing protection strategies are analyzed. Then, to analyze side-channel attack methods against post-quantum cryptography, the commonly used attack methods, attack targets, and attack evaluation metrics for the various post-quantum cryptosystems are summarized according to the classification of core operators and attack types. Furthermore, organized by attack type and attack target, the existing countermeasures and the costs of the defense strategies are sorted out. Finally, in the conclusion, security suggestions are put forward according to attack methods, protection means, and protection costs, and potential future side-channel attack methods and defense strategies are analyzed.
CHEN Kan-Song, LI Hao-Ke, RUAN Yu-Long, WANG Shi-Hui
2021, 32(4):1186-1200. DOI: 10.13328/j.cnki.jos.005970
Abstract:The performance of ad hoc networks is particularly affected by broadcast storms when nodes move through the network at high speed, and frequent changes in the network topology can easily interrupt routes. During route discovery, the traditional AODV routing protocol broadcasts RREQ (route request) packets, which leads to broadcast storms. In addition, the traditional routing protocol cannot adapt to high-speed node movement because the least hop count is the only route selection criterion. This study proposes an improved AODV routing protocol. Firstly, a data forwarding probability based on the number of local neighbors is calculated during route initiation. Secondly, following a cross-layer design, a link weight based on node movement speed is used for route selection. Simulation results with NS2 show that the improved routing protocol is suitable for high-speed mobile networks, achieving a higher packet delivery ratio and lower end-to-end transmission delay.
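The two ingredients of the improved protocol, a neighbor-count-based RREQ forwarding probability and a movement-speed-based link weight, can be sketched as follows. The formulas and parameter values here are illustrative assumptions, not the paper's.

```python
def rreq_forward_probability(num_neighbors, avg_neighbors=8.0):
    """Probabilistic RREQ rebroadcast sketch: nodes in dense neighborhoods
    forward with lower probability, suppressing broadcast storms while sparse
    nodes still always forward (avg_neighbors is an assumed network average)."""
    return min(1.0, avg_neighbors / max(1, num_neighbors))

def link_weight(relative_speed, hops, alpha=0.7):
    """Cross-layer route metric sketch: penalize routes over fast-moving
    (unstable) links, with hop count as a secondary criterion."""
    return alpha * relative_speed + (1 - alpha) * hops

def choose_route(routes):
    """routes: list of (route_id, relative_speed, hops); pick the route with
    the lowest weight, i.e. the most stable one, not the shortest one."""
    return min(routes, key=lambda r: link_weight(r[1], r[2]))[0]
```

Note how `choose_route` can prefer a longer route over fewer hops when its links are more stable, which is exactly the departure from least-hop AODV that the abstract describes.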
CHEN Ke-Qi, ZHU Zhi-Liang, DENG Xiao-Ming, MA Cui-Xia, WANG Hong-An
2021, 32(4):1201-1227. DOI: 10.13328/j.cnki.jos.006166
Abstract:Object detection is a classic computer vision task that aims to detect multiple objects of certain classes in a given image with bounding-box-level localization. With the rapid development of neural network technology, and with the birth of the R-CNN detector as a milestone, a series of deep-learning-based object detectors have been developed in recent years, showing overwhelming speed and accuracy advantages over traditional algorithms. However, precisely detecting objects across large scale variance, known as the scale problem, remains a great challenge even for deep learning methods, and many scholars have contributed to it over the last few years. Although there are already dozens of surveys summarizing deep-learning-based object detectors in aspects including algorithm procedure, network structure, training, and datasets, very few concentrate on methods for multi-scale object detection. Therefore, this paper first reviews the foundations of deep-learning-based detectors in two main streams: two-stage detectors like R-CNN and one-stage detectors like YOLO and SSD. Then, effective approaches to the scale problem are discussed, including the most commonly used image pyramids, in-network feature pyramids, etc. Finally, the current state of multi-scale object detection is summarized and future research directions are discussed.