2021, 32(9):2629-2641. DOI: 10.13328/j.cnki.jos.006049
Abstract: Generating random hard instances of 3-CNF formulas is an important factor in revealing the intractability of the 3-SAT problem and in designing effective algorithms for satisfiability testing. Let k>2 and s>0 be integers. A k-CNF formula is a strictly regular (k,2s)-CNF formula if every variable occurs exactly s times positively and s times negatively. On the basis of the strictly regular (k,2s)-CNF formula, the strictly d-regular (k,2s)-CNF formula is proposed, in which the absolute value of the difference between the numbers of positive and negative occurrences of every variable is d. A novel model is constructed to generate strictly d-regular random (k,2s)-CNF formulas. Simulated experiments show that the strictly d-regular random (3,2s)-SAT problem has a SAT-UNSAT phase transition and a HARD-EASY phase transition when the parameter s (5<s<11) is fixed, and that the latter is related to the former. Hence, the satisfiability threshold of the strictly d-regular random (3,2s)-SAT problem is studied for fixed s. A lower bound of the satisfiability threshold is obtained by constructing a random experiment and using the first moment method. Subsequent simulated experiments agree well with the proved lower bound.
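As a generic illustration of the first moment method (shown here for uniform random k-SAT, not for the paper's strictly d-regular model, whose random experiment is not reproduced), Markov's inequality bounds the satisfiability probability by the expected number of satisfying assignments Z of a formula F with n variables and m clauses:

    \[
      \Pr[F \text{ is satisfiable}] = \Pr[Z \ge 1] \le \mathbb{E}[Z]
      = 2^{n}\bigl(1 - 2^{-k}\bigr)^{m},
      \qquad\text{so}\qquad
      \mathbb{E}[Z] \to 0 \quad \text{whenever} \quad
      \frac{m}{n} > \frac{\ln 2}{-\ln\bigl(1 - 2^{-k}\bigr)} .
    \]

The paper applies the same moment argument to a tailored random experiment in order to derive its lower bound for the strictly d-regular case with fixed s.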
JIA Zi-Jia , ZHONG Chen-Xing , ZHOU Shi-Qi , RONG Guo-Ping , ZHANG Cheng
2021, 32(9):2642-2664. DOI: 10.13328/j.cnki.jos.006275
Abstract: In recent years, domain driven design (DDD), as a software design method, has gradually become popular in the industry and has formed several inherent paradigms of application, namely domain driven design patterns (DDDP). However, the software development community still lacks a comprehensive understanding of the role of DDDP in software projects. Objective: This study aims to reveal the application status of DDDP, including which DDDP are applied to software development, their benefits, challenges, and the mitigation methods for those challenges. Methods: A systematic literature review (SLR) is performed to identify, screen, summarize, and analyze the relevant literature published between 2003 and July 2019. Results: Through a combination of manual retrieval, automatic retrieval, and snowballing, 1 884 relevant publications were covered; after screening, 26 high-quality publications were finally obtained, corresponding to 26 independent studies. This study summarizes the overview of DDDP in the primary studies, including 11 benefits, 17 challenges, and the mitigation methods for the challenges of applying DDDP in software development. Conclusion: DDD can help practitioners design software better owing to its prominent emphasis on domain knowledge, but there are still challenges when applying DDD patterns. While the mitigation methods may tackle these challenges to a certain extent, some deficiencies remain. This study fills in the knowledge gaps in this field through the SLR. Considering the mismatch between the practical value of DDDP and its current theoretical maturity, industry and academia should pay more attention to this field in the future.
JIANG Jia-Jun , CHEN Jun-Jie , XIONG Ying-Fei
2021, 32(9):2665-2690. DOI: 10.13328/j.cnki.jos.006274
Abstract: Program defects are inevitable during the development and maintenance processes. With the rapid increase of software scale, the number and repair complexity of program defects increase as well, which has caused huge economic losses to enterprises and has become a heavy burden for developers during maintenance. Automatic program repair (APR) techniques have the potential to release developers from heavy debugging tasks and have become a popular research topic recently. This study collected the most recent 94 high-quality publications in this research field. By analyzing the approaches used for patch generation, APR techniques are systematically classified into four categories, i.e., search-based, template-based, constraint-based, and statistical-analysis-based APR. In particular, this study proposes the category of statistical-analysis-based APR for the first time based on the most recent publications, which complements and improves the existing taxonomy. Based on existing techniques, the key challenges and insights are summarized for future research. Finally, benchmarks and open-source APR tools are briefly summarized for reference.
2021, 32(9):2691-2712. DOI: 10.13328/j.cnki.jos.005984
Abstract: Software testing is an important part of software development, which can effectively improve software reliability and quality. The reuse of test cases can improve the efficiency of testing and reduce the workload of software testing. Therefore, an approach to test case reuse between similar programs based on keyword flow graphs is proposed. The method reuses the test data generated for a program on programs similar to it. Thus, the first step in exploring the reuse of test cases is to determine the similarity between programs. To this end, a similarity comparison method based on keyword flow graphs is given. Firstly, the keywords in the program code are stored in the corresponding nodes of the flow graph to construct the keyword flow graph. Then, the maximum common sub-graph of the keyword flow graphs of the programs under test is searched for using a dynamic programming algorithm. Finally, the similarity of the programs is computed according to the common sub-graph distance. Test cases can be reused between programs with high similarity. When a genetic algorithm is used to generate test cases, the test cases of similar programs with high fitness are selected; to speed up test case generation, the population is continuously crossed with these test cases during evolution (see the sketch below). Experiments show that reusing test cases in test generation for similar programs has obvious advantages over traditional methods in terms of coverage and average number of evolution generations.
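A minimal Python sketch of the reuse idea under stated assumptions: mcs_size is assumed to come from the paper's dynamic-programming search, and the seeding strategy (filling half the population with reused cases above a similarity threshold) is illustrative, not the paper's exact procedure.

    import random

    def similarity(mcs_size: int, size_g1: int, size_g2: int) -> float:
        """Turn the maximum-common-sub-graph size into a similarity in [0, 1]."""
        return mcs_size / max(size_g1, size_g2)

    def seed_population(reused_cases, pop_size, random_case, sim, threshold=0.8):
        """Mix reused high-fitness test cases into the initial GA population
        when the programs are similar enough; fill the rest randomly."""
        population = []
        if sim >= threshold:
            population.extend(reused_cases[:pop_size // 2])
        while len(population) < pop_size:
            population.append(random_case())
        random.shuffle(population)
        return population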
JIA Tong , LI Ying , ZHANG Qi-Xun , WU Zhong-Hai
2021, 32(9):2713-2728. DOI: 10.13328/j.cnki.jos.005990
Abstract: With the development of AIOps, log-based failure diagnosis has become more and more important. However, this technique has a key bottleneck: the quality of logs. Today, the lack of log printing specifications and guidance for programmers is a key cause of poor log quality, so the need for automatic logging decisions to improve log quality is becoming urgent. This study focuses on automatic logging decisions; specifically, the aim is to propose a general logging point decision approach. Different from existing works, an automatic feature vector generation method is proposed based on a program layered structure tree and reverse composition, which can be applied to software systems written in different programming languages. In addition, this study leverages transfer learning algorithms to achieve cross-component and cross-project logging point decisions. The approach is evaluated on five popular open source software systems, namely, OpenStack, Tensorflow, SaltCloud, Hadoop, and Angel, in three typical application scenarios: software upgrading, new component development, and new project development. Results show that the proposed approach achieves about 95% accuracy on Java projects and 70% accuracy on Python projects on average.
CHEN Li-Zhe , WU Ji , YANG Hai-Yan , ZHANG Kui
2021, 32(9):2729-2743. DOI: 10.13328/j.cnki.jos.006075
Abstract: In each iteration of a microservice system, regression testing should be executed, and large amounts of repeated testing waste resources. Therefore, it is necessary to minimize the test suite to reduce costs and improve testing efficiency. Current test suite minimization techniques mainly rely on system specifications and architecture descriptions as input, which limits their practicability for microservice systems characterized by service autonomy and uncertain call relationships. Moreover, current techniques rarely take usage scenarios into consideration, so the minimized test suite hardly reflects users' concerns. This study proposes a test suite minimization technique based on mining API gateway access logs. The technique mines frequent paths, which reflect the dynamic operation of the microservice system, from API gateway access logs. The relationship between frequent paths and test cases is established to construct a search graph, and the original test suite is then minimized by a heuristic search of the graph (see the sketch below). This paper explains the whole process of the technique. Experiments based on an integrated OA microservice system show that the scale of the test suite is reduced by more than 40%, while its defect detection ability is reduced by no more than 10%.
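A set-cover style Python sketch of the minimization step, under the assumption that each test case is mapped to the frequent paths it exercises; the paper's heuristic search over the constructed graph is not reproduced exactly.

    def minimize_suite(covers: dict, frequent_paths: set) -> list:
        """covers: test case id -> set of frequent paths it exercises.
        Greedily pick the test case covering the most uncovered paths."""
        selected, uncovered = [], set(frequent_paths)
        while uncovered:
            best = max(covers, key=lambda t: len(covers[t] & uncovered),
                       default=None)
            if best is None or not (covers[best] & uncovered):
                break  # remaining paths are not coverable by any test case
            selected.append(best)
            uncovered -= covers.pop(best)
        return selected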
LI Zhuang , LIU Lei , ZHANG Tong-Bo , ZHOU Wen-Bo , Lü Shuai
2021, 32(9):2744-2754. DOI: 10.13328/j.cnki.jos.005974
Abstract: The extension rule reasoning method has been widely used in solving the classical satisfiability problem. Several reasoning methods based on the extension rule have been proposed and widely recognized, including complete algorithms such as NER, IMOMH_IER, and PPSER, and local-search-based incomplete algorithms such as ERACC, all of which perform well. ERACC is the most efficient and powerful algorithm among current extension rule solvers. However, the serial ERACC algorithm still needs improvement in its heuristics and preprocessing. Therefore, a parallel framework called PERACC is designed. Based on the configuration checking of the local search method, the algorithm decomposes the original maxterm space into several maxterm subspaces, simplifies the original clause set in three stages (assigning initial values to variables, simplifying the solution space, and applying heuristics), and then processes each subspace in parallel. Experiments show that, compared with the original algorithm, the proposed algorithm not only improves solving efficiency but also solves larger benchmarks, which makes the extension rule method break through the limitation of formula size again.
XIAO Jin-Sheng , ZHOU Jing-Long , LEI Jun-Feng , LI Liang , DING Ling , DU Zhi-Yi
2021, 32(9):2755-2768. DOI: 10.13328/j.cnki.jos.005986
Abstract: This study designs a new generator network, a new discriminator network, and a new loss function for image scene conversion. First, the generator network uses a deep convolutional neural network with a skip connection structure, in which multiple skip connections are used to share the structure information of the image. The discriminator network uses a multi-scale global convolutional network which can distinguish between real and generated images at different sizes. The new loss function is a weighted combination of four losses drawn from other algorithms: GAN loss, L1 loss, VGG loss, and feature matching loss (see the sketch below). The validity of the new loss function is demonstrated through experimental comparisons. The experimental results show that the proposed algorithm can achieve multiple image transformations, the details of generated images are preserved completely, the generated images are more realistic, and blocking artifacts are largely eliminated.
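A hedged PyTorch sketch of the combined objective; the weights in w and the exact feature layers passed in are assumptions, and the paper's networks are not reproduced.

    import torch
    import torch.nn.functional as F

    def generator_loss(d_fake_logits, fake, real, vgg_fake, vgg_real,
                       feats_fake, feats_real, w=(1.0, 100.0, 10.0, 10.0)):
        adv = F.binary_cross_entropy_with_logits(
            d_fake_logits, torch.ones_like(d_fake_logits))  # GAN loss
        l1 = F.l1_loss(fake, real)                          # L1 pixel loss
        vgg = F.l1_loss(vgg_fake, vgg_real)                 # VGG perceptual loss
        fm = sum(F.l1_loss(a, b)                            # feature matching
                 for a, b in zip(feats_fake, feats_real))
        return w[0] * adv + w[1] * l1 + w[2] * vgg + w[3] * fm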
CHEN Jia-Nan , LI Zhe , LI Zhan-Shan
2021, 32(9):2769-2782. DOI: 10.13328/j.cnki.jos.005989
Abstract: Parallel propagation is a research direction in the field of parallel constraint programming, and its focus is how to execute constraint filtering algorithms in parallel. Based on the serial propagation mode which enforces generalized arc consistency (GAC) on table constraint networks, this study proposes a parallel propagation mode that enforces temporary generalized arc consistency (TGAC) on table constraint networks. The mode is built on multi-core CPUs and consists of a parallel propagation algorithm and a parallel filtering algorithm. The correctness of the parallel propagation mode is then proved, and an analysis of its worst-case time complexity shows that the parallel propagation mode is faster than the serial propagation mode on instances whose average filtering time is longer. Finally, the experimental results confirm this conclusion: the parallel propagation mode achieves a speed-up ratio ranging from 1.4 to 3.4 on most instance series.
LI Wei-Jiang , QI Fang , YU Zheng-Tao
2021, 32(9):2783-2800. DOI: 10.13328/j.cnki.jos.005992
Abstract: This study addresses the problem that existing linguistic knowledge and sentiment resources are not fully utilized in sentiment analysis tasks, as well as a problem of sequence models: such models encode the input text sequence into a vector of fixed length, and if the vector is too short, information in the input text is lost. A bidirectional LSTM sentiment classification method based on multi-channel features and self-attention (MFSA-BiLSTM) is proposed. This method models the existing linguistic knowledge and sentiment resources in sentiment analysis tasks to form different feature channels, and uses a self-attention mechanism to focus on sentiment information (see the sketch below). The MFSA-BiLSTM model can fully exploit the relationships between sentiment target words and sentiment polarity words in a sentence, and does not rely on a manually compiled sentiment lexicon. In addition, this study proposes the MFSA-BiLSTM-D model, based on MFSA-BiLSTM, for document-level text classification tasks. The model first obtains representations of all sentences of the document through training, and then derives the representation of the entire document. Finally, experimental verification is conducted on five sentiment classification datasets. The results show that MFSA-BiLSTM and MFSA-BiLSTM-D are superior to other state-of-the-art text classification methods in terms of classification accuracy in most cases.
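A simplified PyTorch sketch of a BiLSTM with self-attention; the dimensions and the way the multi-channel features are fused (concatenated into in_dim) are assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class SelfAttnBiLSTM(nn.Module):
        def __init__(self, in_dim, hidden, n_classes):
            super().__init__()
            self.bilstm = nn.LSTM(in_dim, hidden, bidirectional=True,
                                  batch_first=True)
            self.attn = nn.Linear(2 * hidden, 1)
            self.fc = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):        # x: (batch, seq_len, in_dim)
            h, _ = self.bilstm(x)    # (batch, seq_len, 2*hidden)
            a = torch.softmax(self.attn(h), dim=1)  # weight per position
            sent = (a * h).sum(dim=1)               # sentence representation
            return self.fc(sent)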
YANG Bo , ZHANG Yu-Xue-Qing , PENG Yi-Da , ZHANG Chun-Xu , HUANG Jing
2021, 32(9):2801-2815. DOI: 10.13328/j.cnki.jos.006418
Abstract: Many deep learning algorithms have achieved satisfactory results on supervised learning tasks, but they rely on a large number of labeled samples, and classifiers trained on specific categories can only classify those categories. Zero-shot learning expects the computer to reason like a human: using historical knowledge to infer the characteristics of new objects, so as to recognize novel categories without many samples. It is found that the sparse-matrix and "cold-start" phenomena in zero-shot learning tasks also appear in recommendation tasks. Inspired by this, the zero-shot classification task is modeled as a matrix completion problem, drawing on collaborative filtering algorithms from the recommendation field: the sparse label matrix is regarded as the product of a visual feature matrix and a semantic feature matrix, which is then used to classify novel samples (see the sketch below). In order to make the semantic representation of each category more accurate, a semantic graph structure is constructed based on the semantic relations between categories, and a graph neural network is applied on it for information transfer between known and novel categories. Traditional zero-shot learning and generalized zero-shot learning experiments are performed on three classic zero-shot learning datasets. The experimental results show that the proposed collaborative-filtering-based zero-shot learning method can effectively improve classification accuracy at a relatively small training cost.
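An illustrative Python sketch of the scoring step under the factorization view; the matrix names and shapes, and the note that the semantic embeddings come from the graph neural network, are assumptions about the setup, not the paper's code.

    import numpy as np

    def predict(visual: np.ndarray, semantic: np.ndarray) -> np.ndarray:
        """visual: (n, d) sample features; semantic: (c, d) class embeddings
        (e.g., refined by a graph neural network over the class graph).
        Their product plays the role of the completed label matrix."""
        scores = visual @ semantic.T    # (n, c)
        return scores.argmax(axis=1)    # predicted class index per sample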
WU Xin-Dong , LI Jiao , ZHOU Peng , BU Chen-Yang
2021, 32(9):2816-2836. DOI: 10.13328/j.cnki.jos.006010
Abstract: Genealogy data is a typical example of fragmented data from massive, multiple, heterogeneous, and autonomous sources. Merging scattered genealogy data on the Internet into a comprehensive and accurate genealogy database through data fusion technologies can benefit knowledge mining and reasoning over genealogy data, and can provide users with implicit information such as surname origins, surname changes, and surname associations. Based on BigKE, a big data knowledge engineering model for big knowledge, this study proposes the FDF-HAO framework (fragmented data fusion with human intelligence, artificial intelligence, and organizational intelligence), describes the functionality, key technologies, and problems to be solved at each layer of the framework, and verifies the validity of the data fusion framework using genealogy data as an example. Finally, the challenges and opportunities of fragmented data fusion are also discussed.
SHANG Fang-Zhou , SUN Bing , LIU Guo-Qiang , LI Chao
2021, 32(9):2837-2848. DOI: 10.13328/j.cnki.jos.005972
Abstract: Integral cryptanalysis is an effective method of block cipher analysis, and an integral distinguisher is usually constructed using a zero-sum property at some positions of the ciphertext. Based on the theory of higher-order differential attacks, the order of the plaintexts can be exploited to determine whether some positions of the ciphertext are balanced. Inspired by conventional integral cryptanalysis, this study considers the influence of constants on the leading coefficient of the polynomial, and proposes the construction of a probabilistic integral distinguisher as well as the corresponding attack method. Applied to PUFFIN, a 7-round probabilistic integral distinguisher is constructed and used to mount a 9-round attack that recovers 92 bits of the round key. The data complexity is 2^24.8 chosen plaintexts, the time complexity is 2^35.48 9-round encryptions, and the space complexity is 2^20.
LU Si-Qi , ZHOU Si-Yuan , MAO Ying
2021, 32(9):2849-2866. DOI: 10.13328/j.cnki.jos.005973
Abstract: The TLS protocol works between the transport layer and the application layer in the TCP/IP stack. It effectively protects transport-layer communication by providing a series of security services such as confidentiality, integrity, mandatory server authentication, and optional client authentication. In order to reduce network latency, the TLS 1.3 protocol adds support for 0-RTT data: the client caches the server's long-term public key and uses it directly to generate a session key, so that part of the application-layer data can be sent in the first message. For three kinds of 0-RTT modes, this study uses the Scyther tool for formal analysis and obtains two attack paths against the 0-RTT data in the CK security model, and an optimized protocol is proposed based on the 1-RTT semi-static mode. Through security proof and formal analysis, it is proved that the protocol resists KCI attacks and replay attacks against 0-RTT data in the CK security model.
ZHONG Ping , XU Ai-Kun , ZHANG Yi-Wen , LI Ya-Ting , ZHANG Yi-Ming , HUANG Jia-Wei , WANG Jian-Xin
2021, 32(9):2867-2886. DOI: 10.13328/j.cnki.jos.005975
Abstract: In wireless rechargeable sensor networks (WRSNs), how to efficiently collect data from sensor nodes and reduce the system energy cost is very challenging. Most recent data collection works either cannot adapt to large-scale rechargeable sensor networks or do not take the sensors' energy recharging problem into account, which leads to decreased network throughput and lifetime. Thus, aiming at the problems of data collection and network cost in WRSNs, this study proposes to use a data collection vehicle (DCV) and a wireless charging vehicle (WCV) for data collection and wireless charging, respectively, which optimizes data collection while ensuring network continuity. Firstly, to improve data collection and charging efficiency, the large network is divided into several parts by a network partition scheme based on the neighborhood similarity of sensor nodes and the distance between nodes. Then, for each part, an anchor selection scheme is proposed based on a tradeoff between the number of neighbors and the residual energy within k hops. Next, a network cost optimization function is designed by analyzing the relationship between sensor energy consumption and network cost, and the optimal sensing data rates and link rates of the sensor nodes are obtained by applying dual decomposition and the sub-gradient method to the cost function (see the sketch below). The results demonstrate that the scheme not only reduces the overall network cost but also reduces the number of dead sensor nodes.
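The generic form of dual decomposition with sub-gradient updates (illustrative only; C, A, c, and the step size α_t are placeholders, and the paper's exact cost function and constraints are not reproduced):

    \[
      \min_{r \succeq 0} C(r) \ \text{s.t.}\ A r \preceq c
      \;\Longrightarrow\;
      L(r,\lambda) = C(r) + \lambda^{\top}(A r - c),
    \]
    \[
      r^{(t)} = \arg\min_{r \succeq 0} L\bigl(r,\lambda^{(t)}\bigr),
      \qquad
      \lambda^{(t+1)} = \bigl[\lambda^{(t)} + \alpha_t\,(A r^{(t)} - c)\bigr]_{+}.
    \]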
KANG Bu-Rong , ZHANG Lei , ZHANG Rui , MENG Xin-Yu , CHEN Tong
2021, 32(9):2887-2900. DOI: 10.13328/j.cnki.jos.005976
Abstract: So far, the security of most cryptographic primitives has depended on high-quality, unpredictable randomness. In cryptography, the pseudorandom number generator (PRNG) is used to generate randomness, so the security of the PRNG directly impacts the security of cryptographic algorithms. However, there have been reports showing that many human factors can cause the randomness generated by a PRNG to fail; such a compromised generator is referred to as a backdoored pseudorandom number generator (BPRNG). A well-known example of a BPRNG is the dual elliptic curve PRNG (Dual EC PRNG), which has been shown to generate bad randomness. With the emergence of BPRNGs, the study of cryptographic algorithms faces new challenges. It is therefore important to investigate cryptographic primitives that remain secure against BPRNGs. This study first reviews the research background of such primitives, and then summarizes the existing schemes in this field.
SHEN Jun , LIAO Xin , QIN Zheng , LIU Xu-Chong
2021, 32(9):2901-2915. DOI: 10.13328/j.cnki.jos.005980
Abstract: In recent years, research on spatial steganalysis based on deep learning has achieved sound results at high embedding rates, but detection performance at low embedding rates is still not ideal. Therefore, a convolutional neural network structure is proposed, which uses SRM filters for preprocessing to obtain the implicit noise residuals, adopts three convolutional layers with appropriately designed kernel sizes, and selects suitable batch normalization operations and activation functions to improve network performance (see the sketch below). The experimental results show that, compared with existing methods, the proposed network achieves better detection performance for WOW, S-UNIWARD, and HILL, three common spatial-domain adaptive steganographic algorithms, with significant improvement at the low embedding rates of 0.2 bpp, 0.1 bpp, and 0.05 bpp. A step-by-step transfer learning method is also designed to further improve the steganalysis performance at low embedding rates.
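A PyTorch sketch of the SRM preprocessing idea: a fixed high-pass filter extracts noise residuals before the trainable layers. The 5x5 "KV" kernel shown is one commonly used SRM filter; the paper's full filter set and network are not reproduced.

    import torch
    import torch.nn as nn

    KV = torch.tensor([[-1,  2,  -2,  2, -1],
                       [ 2, -6,   8, -6,  2],
                       [-2,  8, -12,  8, -2],
                       [ 2, -6,   8, -6,  2],
                       [-1,  2,  -2,  2, -1]], dtype=torch.float32) / 12.0

    srm = nn.Conv2d(1, 1, kernel_size=5, padding=2, bias=False)
    srm.weight.data = KV.view(1, 1, 5, 5)
    srm.weight.requires_grad = False   # fixed: extract noise residuals only

    residual = srm(torch.randn(1, 1, 256, 256))  # fed to the 3 conv layers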
WU Sen-Yan , LUO Xi , WANG Wei-Ping , QIN Yan
2021, 32(9):2916-2934. DOI: 10.13328/j.cnki.jos.005983
Abstract: With the popularity of Web applications, malicious webpages increasingly harm users during Web browsing. A malicious URL in this paper refers to a URL whose corresponding webpage contains malicious code harmful to users; such code exploits vulnerabilities of browsers or plugins to attack users by automatically downloading malware. Based on statistics and analysis of a large number of live malicious URLs, and considering the anti-detection techniques increasingly used in malicious webpages, such as client environment detection and redirection, 25 features are designed in three aspects: webpage content, JavaScript function parameters, and Web session flows. A detection method, HADMW, is proposed based on these 25 features and machine learning. The experimental results show that HADMW achieves 96.2% accuracy and a 94.6% recall rate, and can detect malicious URLs effectively. At the same time, compared with the detection results of open projects and security software, HADMW achieves better results.
2021, 32(9):2935-2944. DOI: 10.13328/j.cnki.jos.006019
Abstract: Blind signature is a special kind of digital signature, widely used in various anonymous environments. At present, the security of most blind signature schemes is based on the intractability of the large integer factoring (LIF) or discrete logarithm (DL) problems. However, with the advent of practical quantum computers, traditional public key cryptosystems will be insecure, as quantum algorithms pose severe challenges to them. Hence, it is of great value to construct blind signature schemes that resist quantum computing attacks. One of the main candidates for post-quantum cryptosystems is the multivariate public key cryptosystem (MPKC). On the basis of the theories of MPKC and blind signatures, a post-quantum blind signature scheme based on MPKC is proposed. The scheme separates the public and private keys of the signature by using an additional nonlinear invertible transformation L: F^r→F^r, which weakens the linear relationship between the public and private keys and accordingly improves the security of the blind signature scheme. Analysis shows that the scheme satisfies blindness, unforgeability, and untraceability, and that it has the merits of low computational complexity and resistance to quantum computing attacks.
2021, 32(9):2945-2962. DOI: 10.13328/j.cnki.jos.005978
Abstract: The Sunway TaihuLight supercomputer is designed for high-throughput computing and tends to have long memory access and network latencies. There is a large class of problems, namely time-to-solution problems, which require high-frequency iterations; a typical application is molecular dynamics simulation. Computations in molecular dynamics simulation are time-dependent, so the iterative computations are difficult to parallelize. The time scale usually exceeds a microsecond, which means the number of steps is more than 10^12, and effective simulation cannot be finished in a limited time on a long-latency system. Therefore, the main performance bottleneck on the long-latency Sunway system is how to increase the iteration frequency. This study proposes a series of optimization strategies to improve the iteration frequency: (1) reducing communication overhead and network contention costs through single-core communication combined with on-chip synchronization; (2) speeding up inter-core synchronization by spin-waiting on shared memory variables to synchronize the computing processing elements; (3) reducing data dependencies by changing the computation patterns; (4) hiding memory access latency by overlapping computation and communication; (5) reorganizing the data structures to improve memory access efficiency.
LIU Jia-Qi , ZHANG Ya-Wen , ZHANG Han-Wen , MENG Xu-Ying , ZHOU Ji-Hua , ZHANG Yu-Jun
2021, 32(9):2963-2976. DOI: 10.13328/j.cnki.jos.005981
Abstract:Information-centric networking (ICN) transforms the network communication mode from the current host-oriented mode to an information-oriented one. Ubiquitous in-network caching is one of the significant features of ICN, which can effectively alleviate server pressure, as well as decrease the user access latency by allowing any nodes in network to cache. However, due to the lack of distribution awareness of content popularity, there are still many problems with the state-of-the-art ICN caching schemes, such as low cache utilization and lack of reasonable planning of cache location. This study proposes a cache coordination scheme based on two-level cache (CSTC) to solve these problems. The content store (CS) of each node is divided into two parts:popularity perception and collaboration allocation. Different caching strategies are applied to cached content with different popularity. At the same time, combined with the popularity filtering and routing mechanism, this scheme reduces cache redundancy and optimizes cache location. Finally, simulation experiments based on real network topology show that CSTC has doubled the number of cached secondary popular content. The cache hit ratio has increased by nearly 50%, and the average round-trip hop count is superior to the existing on-path caching method in most cases.