ZHANG Jian , ZHOU Nai-Chun , LI Ming , LIU Jie , CHEN Jiang-Tao , XIANG Dong , JIN Tao
2022, 33(5):1529-1550. DOI: 10.13328/j.cnki.jos.006547 CSTR:
Abstract:Industrial computational fluid dynamics (CFD) software is a kind of computer-aided engineering (CAE) software with a wide range of applications in aeronautics, astronautics, and other fields. Its development relies heavily on knowledge and models from fluid mechanics, mathematics, computer science, and other disciplines, and involves a large amount of specialized and fundamental scientific research, such as theoretical derivation, physical model establishment, algorithm optimization, and verification and validation. As a result, the software system has a very complex composition and is very difficult to research and develop. By introducing software engineering methods and practices, software development can be effectively organized and managed to shorten the development cycle and improve software quality. This study briefly analyzes the characteristics and new trends of industrial CFD software. Based on this, a development model that combines incremental and iterative development and suits industrial CFD software is proposed. An automated continuous integration platform for CFD simulation software is developed. Suggestions for industrial CFD software design are given from the aspects of software interaction, encapsulation and efficiency, functional scalability, and deployment in high-performance cluster environments. Targeted verification and validation methods suitable for scientific computing software are established. Finally, a demonstration case of domestic independent CFD software is presented, with a view to providing references for related researchers and practitioners.
ZHANG Yang , DONG Chun-Hao , LIU Hui , GE Chu-Yan
2022, 33(5):1551-1568. DOI: 10.13328/j.cnki.jos.006548 CSTR:
Abstract:Most existing code smell detection approaches rely on code structure information and heuristic rules but pay little attention to the semantic information embedded in different levels of code, so their detection accuracy is not high. To solve this problem, this study proposes DeepSmell, a novel approach based on a pre-trained model and multi-level metrics. Firstly, a static analysis tool is used to extract code smell instances and multi-level code metric information from the source program and to label these instances. Secondly, the level information related to code smells in the source code is parsed and obtained through the abstract syntax tree. The textual information composed of the level information is combined with code metric information to generate the data set. Finally, the textual information is converted into word vectors using the pre-trained BERT model. A GRU-LSTM model is applied to obtain the latent semantic relationships among the identifiers, and a CNN model combined with an attention mechanism is used to detect code smells. The experiment tested four kinds of code smells, namely feature envy, long method, data class, and god class, on 24 open source programs such as JUnit, Xalan, and SPECjbb2005. The results show that DeepSmell improves the average recall and F1 by 9.3% and 10.44% respectively compared with existing detection methods, while maintaining a high level of precision.
YUAN Tian-Hao , JI Shun-Hui , ZHANG Peng-Cheng , CAI Han-Bo , DAI Qi-Yin , YE Shi-Jun , REN Bin
2022, 33(5):1569-1586. DOI: 10.13328/j.cnki.jos.006549 CSTR:
Abstract:With the maturity of deep learning technology, intelligent speech recognition software has been widely used. Various deep neural networks in the intelligent software play a crucial role. Recent studies have shown that minor disturbances in adversarial examples significantly threaten the security and robustness of deep neural networks. Researchers usually take the generated adversarial examples as the test cases and input them into the intelligent speech recognition software to test whether the adversarial examples will make the software misjudge. And then defense methods are adopted to improve the security and robustness of intelligent software. For the adversarial example generation, black box intelligent speech software is more common in life and has practical research value. However, the existing generation methods have some limitations. Therefore, this study proposes a target adversarial example generation method for the black box speech software based on the firefly algorithm and gradient evaluation method, namely the firefly-gradient adversarial example generation method. With the set target text, disturbances are added to the original speech example. The firefly algorithm or gradient evaluation method is chosen to optimize the adversarial example according to the edit distance between the text of the current generated adversarial example and the target text so that the target adversarial example is generated finally. To verify the effectiveness of the method, this study conducts an experimental evaluation on common speech recognition software, using three different types of speech datasets: Common Speech dataset, Google Command dataset and LibriSpeech dataset, and looks for volunteers to evaluate the generated adversarial examples. Experimental results show that the proposed method can effectively improve the success rate of target adversarial example generation. 
For example, for the DeepSpeech speech recognition software, the success rate of generating adversarial examples on Common Speech datasets is 13% higher than that of the compared method.
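The edit distance between the recognized text and the target text, which the abstract above uses to guide optimization, can be illustrated with a minimal sketch. It assumes a character-level Levenshtein distance with unit costs; the abstract does not state which variant the authors use, and the function name `edit_distance` is hypothetical.

```python
def edit_distance(source: str, target: str) -> int:
    """Character-level Levenshtein distance (assumed variant, unit costs)."""
    # prev[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    prev = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        curr = [i]  # distance from source[:i] to the empty string
        for j, t_char in enumerate(target, start=1):
            substitution = prev[j - 1] + (0 if s_char == t_char else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]
```

A distance of zero would mean that the transcription of the current adversarial example already equals the target text.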
YANG Hui-Wen , CUI Zhan-Qi , CHEN Xiang , JIA Ming-Hua , ZHENG Li-Wei , LIU Jian-Bin
2022, 33(5):1587-1611. DOI: 10.13328/j.cnki.jos.006550 CSTR:
Abstract:With the rise of blockchain technology, more and more researchers and companies are paying attention to the security of smart contracts. Currently, there are some studies on smart contract defect detection and testing techniques. Software defect prediction is an effective supplement to defect detection techniques, as it can optimize the allocation of testing resources and improve the efficiency of software testing. However, there has been no research on software defect prediction for smart contracts. To address this problem, this study proposes a defect prediction method for Solidity smart contracts. First, it designs a metrics suite (smart contract-Solidity, SC-Sol) that considers the variables, functions, structures, and features of Solidity smart contracts, and combines SC-Sol with the traditional metrics suite (code complexity and features of object-oriented program, COOP), which considers object-oriented features, into the COOP-SC-Sol metrics suite. Then, it extracts relevant metric meta-information from the Solidity code and performs defect detection to obtain defect information, from which a Solidity smart contract defect data set is constructed. On this basis, seven regression models and six classification models are applied to predict the defects of Solidity smart contracts, in order to verify the performance differences of different metrics suites and different models in predicting the number and tendency of defects. Experimental results show that, compared with COOP, COOP-SC-Sol improves the performance of the defect prediction model by 8% in terms of F1-score. In addition, the problem of class imbalance in smart contract defect prediction is further studied. The results show that the random under-sampling method improves the performance of the defect prediction model by 9% in F1-score. In predicting the tendency of specific types of defects, the performance of the model is affected by the imbalance of the data sets. 
Better performance is achieved in predicting defect types for which the percentage of defective modules is greater than 10%.
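Random under-sampling, mentioned above as the class-imbalance remedy, can be sketched as follows. This is a generic illustration rather than the authors' implementation: the function name `random_under_sample`, the fixed seed, and the choice to shrink every class to the size of the smallest one are all assumptions.

```python
import random
from collections import defaultdict


def random_under_sample(samples, labels, seed=0):
    """Randomly drop samples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)

    # Size of the minority class determines how many samples each class keeps.
    keep = min(len(group) for group in groups.values())

    balanced_samples, balanced_labels = [], []
    for label, group in groups.items():
        for sample in rng.sample(group, keep):
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels
```

With eight non-defective and two defective modules, for instance, the result keeps two modules of each class.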
LI Cong , JIANG Yan-Yan , XU Chang
2022, 33(5):1612-1634. DOI: 10.13328/j.cnki.jos.006551 CSTR:
Abstract:GUI event-based record and replay technologies for Android apps aim at automatically capturing and playing back the UI interactions between users and apps. Record and replay is challenging because it involves a cross-understanding of three different kinds of program semantics: application difference, version evolution, and device compatibility. This study models record and replay as a search problem and analyzes this problem from a human perspective. Accordingly, it proposes a general framework to demonstrate the key points in record and replay: the widget representation and recording technologies, the event semantic equivalence strategies, and the local search strategies. By summarizing and analyzing existing technologies from a new perspective that suits the framework, this study provides a better understanding of the advantages and disadvantages of existing technologies and proposes feasible future research directions.
YU Xu , HE Ya-Dong , DU Jun-Wei , WANG Zhao-Zhe , JIANG Feng , GONG Dun-Wei
2022, 33(5):1635-1651. DOI: 10.13328/j.cnki.jos.006553 CSTR:
Abstract:Existing developer recommendation algorithms extract explicit features of tasks and developers by mining the explicit information of tasks and developers, so as to recommend developers to specific tasks. However, since the description information in the explicit information is subjective and often imprecise, the performance of existing developer recommendation algorithms based on explicit features is not ideal. The crowdsourcing software development platforms not only have a lot of imprecise description information, but also contain objective and more accurate “task-developer” score information, which can effectively infer implicit features of tasks and developers. Considering that implicit features are supplements to explicit features, which will effectively alleviate the problem of imprecise description information, this study proposes a developer hybrid recommendation algorithm that combines explicit features and implicit features. First, the explicit features are fully extracted from the visible information of tasks and the developers on the platform, and the explicit features-oriented factorization machine (FM) recommendation model is proposed to learn the relationship between explicit features of tasks and developers and the corresponding ratings. Then, implicit features are inferred with the "task-developer" rating matrix, and the implicit features-oriented matrix factorization (MF) recommendation model is proposed. Finally, a multi-layer perceptron fusion algorithm is proposed to fuse the explicit features-oriented FM recommendation model and implicit features-oriented MF recommendation model. Further, for the cold-start problem, first, based on historical data, a multi-layer perceptron model is utilized to learn the mapping relationship between explicit features and implicit features. Then, for the cold-start tasks or the cold-start developers, the implicit features are obtained through their explicit features. 
Finally, the ratings are predicted based on the trained multi-layer perceptron fusion algorithm. The simulation experiment on the Topcoder software crowdsourcing platform shows that the proposed algorithm outperforms the comparison algorithms significantly in terms of four different evaluation metrics.
HU Jun , Lü Jia-Run , WANG Li-Song , KANG Jie-Xiang , WANG Hui , GAO Zhong-Jie
2022, 33(5):1652-1673. DOI: 10.13328/j.cnki.jos.006554 CSTR:
Abstract:While the functionality and complexity of modern civil aircraft airborne software are growing rapidly, the safety standards for airborne software (such as DO-178B/C) must still be satisfied. This makes it more challenging to analyze and verify the consistency and integrity of airborne software requirements in the early stages of system development. This study introduces a formal modeling and analysis tool platform (avionics requirement tools, ART) for natural language requirements of airborne software, and carries out a case study on the requirements of a cockpit display and control software subsystem (EICAS). Firstly, the semantics of a formal variable relationship model (VRM) is given, and the platform architecture and tool chain of ART are described. Then, a methodology for multi-paradigm formal analysis of requirement consistency and integrity is given. After that, some details of the EICAS case study are shown, including how the initial natural language requirements are pre-modeled and how the requirement model is automatically analyzed, such as the preprocessing and standardization of the original requirement items, the automatic generation of VRM models, and the multi-paradigm formal analysis. Finally, some experiences from this case study are summarized.
DING Yan-Ru , ZHANG Yan-Mei , JIANG Shu-Juan , YUAN Guan , WANG Rong-Cun , QIAN Jun-Yan
2022, 33(5):1674-1698. DOI: 10.13328/j.cnki.jos.006555 CSTR:
Abstract:Integration testing is an indispensable step in the software testing process. In response to the problem of how to rationally order the classes of a system in integration testing, researchers worldwide have proposed a variety of methods to generate class integration test orders. However, most of them did not take the stubbing complexity, an important factor in evaluating the test cost, as the indicator. To solve this problem, this study proposes a reinforcement learning-based method for generating class integration test orders, which uses the overall stubbing complexity as the indicator of test cost and aims to generate a class integration test order with stubbing complexity as low as possible. First, the reinforcement learning task is defined, and the goal of the algorithm is set according to the task. Second, static analysis of the program is performed, and the stubbing complexity is calculated from the analysis results. Then, the calculation of the stubbing complexity is integrated into the design of the reward function to provide the basis for choosing the next action. Finally, the value function is updated through feedback from the reward function, and the value function is set to ensure that the cumulative reward is maximized. When the agent completes the specified number of training rounds, the system outputs the class integration test order with the largest cumulative reward, which is the order with the lowest stubbing complexity found. The experimental results show that this method obtains better results than other existing methods in terms of the overall stubbing complexity.
JIANG Jing , MIAO Meng , ZHAO Li-Xian , ZHANG Li
2022, 33(5):1699-1710. DOI: 10.13328/j.cnki.jos.006556 CSTR:
Abstract:Stack Overflow is one of the most popular software question and answer communities, where users can post questions and receive answers from others. To ensure the quality of questions, the website needs to promptly discover and delete questions that are of low quality or do not conform to the community’s theme. Currently, Stack Overflow mainly relies on manual inspection to find questions that need to be deleted. However, this approach can hardly guarantee that such questions are discovered and deleted in time, and it increases the burden on community administrators. To quickly find questions that need to be deleted, this study proposes MulPredictor, a method for automatically predicting question deletion. The method extracts the semantic content features, the semantic statistical features, and the meta features of a question, and uses a random forest classifier to calculate the probability that the question will be deleted. Experimental results show that, compared with the existing methods DelPredictor and NLPPredictor, MulPredictor increases the accuracy by 16.34% and 12.78% on the balanced test set, and by 12.38% and 14.14% on the random test set. In addition, this study analyzes the important features in question deletion and finds that the code segment, the question’s title, and the first paragraph of the question’s body have the most significant impact on question deletion.
QIAN Ju , WANG Yin , CHENG Hao , WEI Zheng-Xian
2022, 33(5):1711-1735. DOI: 10.13328/j.cnki.jos.006557 CSTR:
Abstract:Modern integrated naval mission systems (NMS) built on data-distribution service (DDS) have special characteristics in development, structure, and application which, in combination, make their testing challenging. Model-based testing (MBT) is considered a promising technique for testing such systems. However, for NMS-like systems under test, due to their high complexity and cooperative ways of development, traditional MBT techniques that require a complete model of the system internals are difficult to apply. This paper presents a scenario-based MBT approach for NMS-like systems. The approach builds scenario models to express the interaction scenarios in a DDS-based system from the external perspective. A scenario model uses an extended form of regular expression to model interaction sequences and uses basic data element restrictions (e.g., ranges and enumerations), constraints, and calculation functions to model interaction data. It can express the interaction processes in an abstract, convenient, and relatively comprehensive way. Based on the models, algorithms are proposed to generate directly executable test cases. Experiments on a real NMS show that the approach can be used to test 21 kinds of common risky scenarios identified from historical failures reported during the development of a family of NMS. This indicates that the approach might be helpful for testing NMS-like DDS-based industrial systems.
TANG Ze , LI Chuan-Yi , GE Ji-Dong , LUO Bin
2022, 33(5):1736-1757. DOI: 10.13328/j.cnki.jos.006559 CSTR:
Abstract:In recent years, with the continuous expansion and deepening of the application of software technology in various industries and fields, as well as the development of software architecture, services computing, etc., feature-rich and large-scale third-party APIs and libraries have emerged in the software industry. Software developers increasingly rely on these APIs when implementing software functions. However, learning how to use these APIs is difficult and time-consuming, for two main reasons: 1) missing or incorrect documentation; 2) few code samples of API usage. Therefore, designing automatic API completion methods that help developers use APIs correctly and quickly has great application value. However, most existing automatic API completion methods regard the code segments to be completed as plain text and ignore the impact of the object types of APIs. Therefore, this study explores the role of object types in API completion. In addition, inspired by the object state diagram, a concrete API completion method that uses the types of objects as a novel feature is designed and implemented. Specifically, the subsequences belonging to the same object type are first extracted from the API call sequence, and a deep learning model is used to encode the state of each object. Then, the objects’ states are used to generate a state representation of the entire method block. To evaluate the proposed method, comprehensive experiments are conducted on six popular Java projects. The experimental results show that the proposed API completion method achieves significantly higher prediction accuracy than the baseline approaches.
LENG Lin-Shan , LIU Shuang , TIAN Cheng-Lin , DOU Shu-Jie , WANG Zan , ZHANG Mei-Shan
2022, 33(5):1758-1773. DOI: 10.13328/j.cnki.jos.006560 CSTR:
Abstract:Code clone detection is an important task in the software engineering community. It is particularly difficult to detect type-IV code clones, which have similar semantics but a large syntactic gap. Deep learning-based approaches have achieved promising performance on the detection of type-IV code clones, yet at the high cost of using manually annotated code clone pairs for supervision. This study proposes two simple and effective pretraining strategies to enhance the representation learning of deep learning-based code clone detection models, aiming to alleviate the need for large-scale training datasets in supervised learning models. First, token embedding models are pretrained with n-gram subword enhancement, which helps the clone detection model better represent out-of-vocabulary (OOV) tokens. Second, function name prediction is adopted as an auxiliary task to pretrain the clone detection model parameters from tokens to code fragments. With the two enhancement strategies, a model with more accurate code representation capability is obtained, which is then used as the code representation model in clone detection and trained on the clone detection task with supervised learning. Experiments are conducted on the standard benchmark datasets BigCloneBench (BCB) and OJClone. The results show that the final model, with only a very small number of training instances (i.e., 100 clones and 100 non-clones for BCB, 200 clones and 200 non-clones for OJClone), achieves performance comparable to that of existing methods trained with over six million training instances.
WANG Lu , HUO Qi-En , LI Qing-Shan , WANG Zhan , JIANG Yu-Xuan
2022, 33(5):1774-1799. DOI: 10.13328/j.cnki.jos.006561 CSTR:
Abstract:The command and control information system (command and control system) runs in a dynamically changing and complex environment with constantly changing mission requirements. A self-adaptation decision-making method is urgently needed to dynamically generate the optimal strategy for adjusting the system, so as to adapt to changes in the environment or missions and ensure long-term stable operation. At present, as the command and control system and its operating environment continue to become more complex, self-adaptation decision-making methods need the ability to make online trade-off decisions in response to multiple unexpected changes, so as to avoid conflicting adjustment consequences or failure to respond to unknown situations in a timely manner. Nevertheless, current command and control systems mostly adopt self-adaptation decision-making methods that are based on prior knowledge and respond to single changes, which cannot fully meet this capability requirement. Therefore, this study proposes a self-adaptation decision-making method for the command and control system based on parallel search optimization. This method uses ideas from search-based software engineering to model the self-adaptation decision-making problem as a search optimization problem, and uses the genetic particle swarm algorithm to make online trade-offs among multiple changes that occur at the same time. In addition, to address the problems of guaranteeing search efficiency and selecting strategies when the method is applied in an actual command and control system, this study uses the parallel genetic algorithm and POST-optimization theory to parallelize the self-adaptation decision-making method and establish a multi-index strategy sorting method, ensuring the practicality of the method.
XU Xiao , DING Shi-Fei , DING Ling
2022, 33(5):1800-1816. DOI: 10.13328/j.cnki.jos.006122 CSTR:
Abstract:The density peaks clustering (DPC) algorithm is an emerging density-based clustering algorithm that draws a decision graph from the computed local density and relative distance of each point in order to identify cluster centers quickly. DPC is known for requiring only one input parameter, no prior knowledge, and no iteration. Since DPC was introduced in 2014, it has attracted great interest and undergone considerable development. This survey first analyzes the theory of DPC and its advantages by comparing it with classical clustering algorithms. Secondly, improvements to DPC are reviewed in terms of clustering accuracy and computational complexity, organized into local-density optimization, allocation-strategy optimization, multi-density peaks optimization, and computational complexity optimization. The main representative algorithms of each category are also presented. Finally, the survey introduces application research on DPC in different fields. This overview offers a comprehensive analysis of the advantages and disadvantages of DPC and a comprehensive description of its improvements and applications. It also attempts to identify further challenges in order to promote DPC research.
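The two quantities plotted in a DPC decision graph can be sketched as follows. The sketch assumes the cutoff-kernel definition of local density (the number of other points closer than a cutoff distance `d_c`, the single input parameter) and defines relative distance as the distance to the nearest point of higher density, with the densest points receiving their largest distance to any point. The function name `dpc_scores` is hypothetical, and the O(n²) distance matrix is chosen for clarity rather than efficiency.

```python
import math


def dpc_scores(points, d_c):
    """Return (rho, delta): local density and relative distance of each point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points strictly within the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Relative distance: distance to the nearest point with higher density.
    delta = []
    for i in range(n):
        to_denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        # Points with no denser neighbour get their largest distance instead.
        delta.append(min(to_denser) if to_denser else max(dist[i]))
    return rho, delta
```

Points for which both values are large stand out in the decision graph and are the candidates for cluster centers; ranking points by the product of the two values is one common way to pick them.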
HAO Shi-Lei , WANG Zhi-Hai , LIU Hai-Yang
2022, 33(5):1817-1832. DOI: 10.13328/j.cnki.jos.006208 CSTR:
Abstract:Time series classification is an important task in time series data mining and has attracted significant attention in recent years. An important part of this problem is the similarity measurement between time series. Among the many similarity measurement algorithms, dynamic time warping (DTW) is very effective and has been widely used in many fields such as video, audio, handwriting recognition, and biological information processing. DTW is essentially a point-to-point matching algorithm under boundary and time consistency constraints, which provides the globally optimal matching between two sequences. However, the algorithm has an obvious deficiency: it does not necessarily achieve reasonable local matching between sequences. Specifically, time points with completely different local structure information may be incorrectly matched by DTW. To solve this problem, an improved DTW algorithm based on local gradient and binary pattern (LGBDTW) is proposed. Although the proposed algorithm is essentially a dynamic time warping algorithm, it takes into account the local gradient and binary pattern values of sequence points to perform a weighted similarity measurement, effectively avoiding the matching of points with different local structures. For a comprehensive comparison, the algorithm is adopted as the similarity measure of the nearest neighbor classification algorithm and tested on multiple UCR time series datasets. Experimental results show that the proposed method can effectively improve the accuracy of time series classification. In addition, some examples are provided to verify the interpretability of the proposed algorithm.
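For reference, the classical DTW baseline that the abstract above builds on can be sketched as a dynamic program. This is plain DTW with an absolute-difference point cost, not the proposed LGBDTW; the local gradient and binary pattern weighting is not reproduced here, and the function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Classical DTW distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # boundary constraint: both sequences start together
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            point_cost = abs(a[i - 1] - b[j - 1])
            # Time consistency: the path may only move forward in each sequence.
            cost[i][j] = point_cost + min(cost[i - 1][j],      # repeat b[j-1]
                                          cost[i][j - 1],      # repeat a[i-1]
                                          cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

Because the point cost compares values only, two points with equal values but different local shapes (for example, one on a rising edge and one on a falling edge) are matched at zero cost, which is the deficiency the abstract describes.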
SHA Zi-Han , SHU Hui , WU Cheng-Gang , XIONG Xiao-Bing , KANG Fei
2022, 33(5):1833-1848. DOI: 10.13328/j.cnki.jos.006197 CSTR:
Abstract:Control flow is an abstract expression of the program execution process, and obfuscating the control flow is of critical significance for strengthening the code’s resistance to reverse engineering. This study proposes the idea of deep control flow: for loop structures, callback functions are utilized to construct an equivalent loop model, and basic blocks within a procedure are converted into inter-procedural function calls to counter reverse engineering techniques. Control flow analysis and data flow dependency analysis are comprehensively applied to establish a deep control flow obfuscation model based on callback functions, and a proof of functional consistency is given. To further enhance obfuscation, a function call fusion algorithm is designed and implemented to construct a more sophisticated function calling process. Finally, OpenSSL and the SPECint-2000 benchmark suite are used as the test set to verify the feasibility and effectiveness of the proposed model.
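The general idea of replacing a loop with an equivalent chain of callback invocations can be illustrated at source level. The sketch below is only a conceptual analogy in Python, assuming a simple summation loop; the paper's transformation presumably operates on compiled programs, and all names here are hypothetical.

```python
def sum_with_loop(values):
    """Original form: the loop is visible as a back edge in the control flow."""
    total = 0
    for value in values:
        total += value
    return total


def sum_with_callbacks(values):
    """Equivalent form: the loop is expressed as mutual calls between functions."""
    state = {"index": 0, "total": 0}

    def loop_body(invoke):
        # Loop condition and body live in a callback; there is no explicit loop.
        if state["index"] >= len(values):
            return state["total"]
        state["total"] += values[state["index"]]
        state["index"] += 1
        return invoke(loop_body)  # "next iteration" is another function call

    def invoke(callback):
        # Dispatcher that calls back into whatever function it is given.
        return callback(invoke)

    return invoke(loop_body)
```

Both functions return the same result for the same input. Note that this Python version consumes two stack frames per iteration, so it only suits short inputs under the default recursion limit.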
YU Qing-Yang , BAI Xiao-Ying , LI Ming-Jie , LI Qi-Yuan , LIU Tao , LIU Ze-Yin , PEI Dan
2022, 33(5):1849-1864. DOI: 10.13328/j.cnki.jos.006209 CSTR:
Abstract:In a large microservice system, there usually exist many services with complex dependencies among them. A failure in one component may propagate widely and cause large-scale service anomalies. To ensure system quality, it is critical to effectively identify abnormalities and locate root causes. Invocation-chain analysis is a commonly used method for service performance modeling and anomaly detection. Existing techniques are mostly data-driven and face many challenges of big data analysis, such as diversified chain structures, a vast number of instances, and imbalanced datasets in which many structures have only a small number of samples. To address these problems, the study proposes a model-based approach that builds high-level abstractions of method invocation models based on control-flow analysis. The instances of various invocation-chain structures are clustered into method invocation models, which greatly reduces the size of chain structures. Performance models are built for the method invocation models, and thresholds are defined based on the predicted execution time derived from the performance model. Outliers in the trace logs are thus identified as candidate anomalies. Experiments were conducted on real industry logs from the Baidu PhoenixNest Ads system. A one-day log with over 1.7 billion records was selected. The experimental results show that, compared with pure data-driven sequence analysis methods, the proposed model-based approach can greatly reduce the size of invocation-chain structures while effectively analyzing and detecting microservice performance anomalies and root causes.
WU Ke-Wei , GAO Tao , XIE Zhao , GUO Wen-Bin
2022, 33(5):1865-1879. DOI: 10.13328/j.cnki.jos.006202 CSTR:
Abstract:Action recognition is a crucial and very challenging task in computer vision. Most existing methods use the temporal structure of the whole video and ignore its temporal noise and ambiguous features, which leads to failures in action recognition. To address this problem, a novel temporal graph model with Grenander inference, namely TGM-GI, is proposed. First, a 3D CNN+LSTM module is constructed to learn deep features, in which the 3D CNN extracts the dynamic features of video clips and the LSTM optimizes the temporal dependence between the features of two clips. Second, a temporal graph model is constructed from these deep features using the generator space of Grenander theory. The original temporal pattern is modified using two operations: the combination operation removes redundant clips such as slow motion, and the denoise operation removes low-frequency clips such as abnormal motion. Third, an incremental Viterbi algorithm is proposed for temporal pattern learning with Grenander inference, in which a Grenander measure is designed with both a feature bond and a semantic bond. Finally, dynamic time warping is used to match the Grenander temporal pattern of the test video with the Grenander temporal patterns of the training set, and the label of the test video is predicted. The experimental results show that the proposed TGM-GI outperforms the state-of-the-art methods on two widely acknowledged datasets. TGM-GI is superior to the 3D CNN-LSTM baseline, improving accuracy by 6.41% on the UCF101 dataset and by 5.67% on the Olympic Sports dataset.
2022, 33(5):1880-1892. DOI: 10.13328/j.cnki.jos.006320 CSTR:
Abstract:Online discussion has become a main way for people to communicate opinions. Besides posting statements, users are also encouraged to reply to existing posts, revealing support or disapproval of others' viewpoints. Identifying argumentative relations between these interactive texts can benefit modeling the dialogue structure, detecting public opinions, and supporting business, marketing, and government decision-making. Existing studies detect argumentative relations by constructing overall semantic information or conditional semantic information, but ignore the contextual relevance information between interactive texts. This work proposes a co-attention contextual relevance network (CCRnet). With the co-attention mechanism, the model captures bi-directional attention between the post and the reply. Experimental results on the CreateDebate dataset show that the proposed model outperforms the state-of-the-art models. Furthermore, the visualization of the similarity matrix illustrates the effectiveness of the co-attention mechanism.
HOU Ze-Zhou , CHEN Shao-Zhen , REN Jiong-Jiong
2022, 33(5):1893-1906. DOI: 10.13328/j.cnki.jos.006195 CSTR:
Abstract:Differential cryptanalysis is an important method in the analysis of block ciphers. Its key point is to find a differential distinguisher that covers more rounds or has a higher probability. First, the method for generating the data set used to train a deep-learning-based differential distinguisher is described. Then, differential distinguishers for two lightweight block ciphers, SIMON32 and SPECK32, are trained based on convolutional neural networks (CNN) and residual neural networks (ResNet). The two kinds of distinguishers are compared, and it is found that, when both time and accuracy are considered, ResNet performs better on SIMON32 and CNN performs better on SPECK32. Next, the influence of the number of convolution operations in the network model on the accuracy of the neural distinguisher is studied, and it is found that increasing the number of convolutional layers of the CNN or the number of residual blocks of the ResNet decreases accuracy compared with the original networks. Finally, suggestions are given for selecting networks and parameters when constructing a deep-learning-based differential distinguisher: a CNN with few convolutional layers and a ResNet with few residual blocks should be considered as the first choice.
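One common way to build such a training set, sketched here for SPECK32 only, is to label ciphertext pairs by whether their plaintexts had a fixed input difference (label 1) or were unrelated (label 0). The sketch rests on several assumptions not stated in the abstract: the input difference `(0x0040, 0x0000)`, the use of independently random round keys instead of the cipher's key schedule, and the function names are all illustrative choices. The round function itself follows the published SPECK32 design (rotation amounts 7 and 2 on 16-bit words).

```python
import random

WORD = 16
MASK = (1 << WORD) - 1


def rol(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) & MASK) | (x >> (WORD - r))


def ror(x, r):
    """Rotate a 16-bit word right by r bits."""
    return (x >> r) | ((x << (WORD - r)) & MASK)


def speck_round(x, y, k):
    """One SPECK32 round on the word pair (x, y) with round key k."""
    x = (ror(x, 7) + y) & MASK
    x ^= k
    y = rol(y, 2) ^ x
    return x, y


def encrypt(x, y, round_keys):
    for k in round_keys:
        x, y = speck_round(x, y, k)
    return x, y


def make_samples(n, rounds, diff=(0x0040, 0x0000), seed=0):
    """Return n samples of the form ((c0x, c0y, c1x, c1y), label).

    Assumption: round keys are drawn independently at random,
    a simplification of the real key schedule.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        label = rng.getrandbits(1)
        keys = [rng.getrandbits(WORD) for _ in range(rounds)]
        x0, y0 = rng.getrandbits(WORD), rng.getrandbits(WORD)
        if label:
            # Real pair: second plaintext differs by the fixed input difference
            x1, y1 = x0 ^ diff[0], y0 ^ diff[1]
        else:
            # Random pair: second plaintext is unrelated
            x1, y1 = rng.getrandbits(WORD), rng.getrandbits(WORD)
        samples.append((encrypt(x0, y0, keys) + encrypt(x1, y1, keys), label))
    return samples
```

A classifier (CNN or ResNet) would then be trained to predict `label` from the four ciphertext words, typically after unpacking them into bits.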
DOU Jia-Wei , CHEN Ming-Yan , CHENG Wen
2022, 33(5):1907-1921. DOI: 10.13328/j.cnki.jos.006206 CSTR:
Abstract:With the rapid development of information technology, it has become increasingly common for multiple parties to perform cooperative computation on their private data while preserving their privacy. Secure multiparty computation is a key privacy-preserving technology for addressing such security issues. Secure vector computation is an active area of secure multiparty computation. At present, there are many studies on secure vector computation, such as private scalar product and private vector summation, but few on securely computing the number of equal components of private vectors. The existing studies focus on secure two-party computation in which all components of the vectors are drawn from a restricted range. This study focuses on privately computing the number of equal components of vectors and determining the relationship between this number and a threshold value. To this end, a component-matrix encoding is first proposed to encode a component of a vector. Then, based on the ElGamal cryptosystem, a simple and efficient secure multiparty protocol is designed to compute the number of equal components of vectors. Based on this protocol, an efficient secure multiparty protocol is designed to determine whether the number of equal components of vectors is larger than a threshold. The protocols do not restrict the data range of the components. The correctness of the protocols is analyzed, and they are proved secure in the semi-honest model. Theoretical efficiency analysis and experimental results show that these protocols are simple and efficient. Finally, these protocols are used as building blocks to solve some practical secure multiparty computation problems.
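To make the target of the protocols concrete, the following sketch computes the same outputs in the clear, with no privacy at all. It is only a reference for what a secure protocol should return, not the component-matrix encoding or the ElGamal-based construction, neither of which is detailed in the abstract. The function names are hypothetical, and the sketch assumes that a component counts as "equal" when all parties' vectors agree at that position.

```python
def count_equal_components(*vectors):
    """Number of positions at which all given vectors hold the same value.

    Plain (insecure) reference computation; assumes "equal component"
    means every party's vector agrees at that index.
    """
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all vectors must be non-empty in number and equal in length")
    return sum(1 for column in zip(*vectors) if len(set(column)) == 1)


def exceeds_threshold(threshold, *vectors):
    """True if the number of equal components is larger than `threshold`."""
    return count_equal_components(*vectors) > threshold
```

For two parties holding `[1, 2, 3, 4]` and `[1, 5, 3, 0]`, the count is 2, so the threshold test succeeds for a threshold of 1 and fails for a threshold of 2.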
ZOU Yao-Bin , ZHANG Jin-Yu , ZANG Zhao-Xiang , XIA Ping , WANG Jun-Ying , GONG Guo-Qiang , SUN Shui-Fa
2022, 33(5):1922-1946. DOI: 10.13328/j.cnki.jos.006196 CSTR:
Abstract:The existing threshold selection methods based on the maximum entropy criterion involve two or more random variables. They all ignore the constraint that the random variables involved in the overall entropy calculation of a random system should be independent of each other, which directly affects their segmentation accuracy and application scope. In this study, an automatic threshold selection method guided by maximizing a single Tsallis entropy under a bidirectional sparse probability distribution is proposed, which naturally circumvents the constraint that multiple random variables should be independent of each other. On two images derived from a multi-scale convolution transformation, the proposed method first constructs a two-dimensional random variable with a bidirectional sparse probability distribution, and then defines a two-dimensional Tsallis entropy on the basis of this random variable. After the calculation of the two-dimensional Tsallis entropy is simplified to involve only the marginal probability distributions of the two-dimensional random variable, the threshold at which the single Tsallis entropy reaches its maximum is selected as the final segmentation threshold. The proposed method is compared with an interactive thresholding method, 4 automatic thresholding methods, and an automatic clustering method on 44 synthetic images and 44 real-world images, whose gray level histograms are unimodal, bimodal, multimodal, or peakless. The experimental results show that the proposed method is not superior to these 5 automatic methods in computational efficiency, but it significantly improves the adaptability and accuracy of segmentation.
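For reference, the standard Tsallis entropy of a discrete distribution with entropic index q is S_q = (1 - sum_i p_i^q) / (q - 1), which tends to the Shannon entropy as q approaches 1. The sketch below implements only this textbook definition; the paper's two-dimensional variant, its bidirectional sparse probability distribution, and its choice of q are not given in the abstract and are not reproduced here. The function name is an assumption.

```python
import math


def tsallis_entropy(probs, q):
    """Tsallis entropy S_q = (1 - sum(p_i ** q)) / (q - 1).

    Falls back to Shannon entropy (natural log) in the limit q = 1.
    Zero-probability entries are skipped.
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)
```

A maximum-entropy threshold search would evaluate such an entropy for every candidate threshold and keep the threshold with the largest value; how the distribution depends on the threshold is specific to the method.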
BI Xiu-Li , LU Meng , XIAO Bin , LI Wei-Sheng
2022, 33(5):1947-1958. DOI: 10.13328/j.cnki.jos.006198 CSTR:
Abstract:Pancreas segmentation in computed tomography (CT) is one of the most challenging tasks in medical image analysis. Because the pancreas is small and variable in shape, traditional automatic segmentation methods cannot achieve acceptable segmentation accuracy. Using the idea of high-level semantic features guiding low-level features, this study proposes a single-stage pancreas segmentation model based on a dual-decoding U-net. The proposed architecture consists of one encoder and two decoders, and it effectively combines low-level spatial information with high-level semantic information by using features of different coding depths, improving the segmentation accuracy of CT slices without cropping or resolution reduction. The experimental results show that this method achieves better segmentation performance under full-size input. Moreover, the segmentation results of the proposed method are superior to those of single-stage methods on the open dataset for pancreas segmentation tasks.