LOU Yi-Ling , ZHANG Ling-Ming , HAO Dan , ZHANG Hao-Tian , ZHANG Lu
2022, 33(2):377-396. DOI: 10.13328/j.cnki.jos.006347 CSTR:
Abstract:Fault localization and program repair techniques have been extensively studied for over decades; their only connection is that fault localization serves as a supplier diagnosing potential buggy code for automated program repair. Recently, unified debugging has been proposed to unify fault localization and program repair in the other direction for the first time to boost both areas. ProFL, the first work on unified debugging, opens a new dimension that the large volume of patch execution information during program repair in turn can help boost state-of-the-art fault localization. Note that even state-of-the-art program repair techniques can only fix a small ratio of real bugs (e.g., < 20% for Defects4J) and simply abort for the vast majority of unfixed bugs. In contrast, unified debugging not only directly fixes bugs when possible, but also provides debugging hints for bugs that cannot be automatically fixed (e.g., the patch execution information from such unsuccessful repair can still help boost fault localization for manual repair). Although demonstrated to be a promising direction, unified debugging relies on massive test executions (i.e., million test executions) and can cost hours for execution. This work proposes AUDE to accelerate unified debugging by reducing test executions that provide little helpful feedback for improving fault localization. Specifically, AUDE first constructs an initial execution order of patches guided by Markov chain Monte Carlo sampling strategy, and then adaptively estimates the likelihood of each patch being informative during patch execution on-the-fly. The results on the widely-used Defejcts4J benchmark show that AUDE significantly accelerates ProFL by reducing 70.29% of test executions with negligible effectiveness drop in both fault localization and program repair, e.g., AUDE can localize the same number of bug methods at Top-1/Top-3/Top-5 as ProFL.
SUN Chang-Ai , GENG Ning , DAI He-Peng , GU You-Da
2022, 33(2):397-409. DOI: 10.13328/j.cnki.jos.006137 CSTR:
Abstract:Concurrent programs are composed of multiple concurrent execution flows, which usually share storage space in an explicit or implicit manner. Uncertainty in the execution order of flows poses challenges for concurrent program testing. Mutation testing is a fault-based testing technique that is widely adopted to evaluate the adequacy of test suites and the effectiveness of test techniques. A key issue to applying mutation testing to concurrent programs is how to efficiently derive a large number of mutants that simulate possible concurrency-specific faults. This study proposes a mutation testing framework for concurrent programs and presents an automated concurrent mutant generation system called CMuJava. An empirical study is conducted to evaluate the correctness and adequacy of mutant sets generated by CMuJava and the mutant generation efficiency of CMuJava. The experimental results show that CMuJava can not only generate correct and adequate mutants, but also significantly improve the efficiency of manual mutant generation.
TSAI Wei-Tek , WANG Rong , HE Juan , DENG En-Yan
2022, 33(2):410-433. DOI: 10.13328/j.cnki.jos.006329 CSTR:
Abstract:Recently, decentralized digital asset exchanges (DDAE) have received significant attention. This study proposes five criteria to evaluate DDAE based on the principles of financial market infrastructure (PFMI). The current DDAEs technologies are discussed and evaluated from two aspects of inter-blockchain communication (IBC) and exchange protocol technology, the implementation principles of several typical technical solutions are elaborated, and various technologies are summarized into different models for analysis. Then, the regulatory issues of current DDAEs are discussed, and a distributed supervision modelis proposed for the problems of incomplete supervision data and data tampering. This model consists of three parts: Blockchain system, supervision execution engine, and supervision regulation database, which can automatically executethe supervision rules in the supervision regulation database by reading the transaction data in the blockchain and automatically generate the supervision report for the transactions meeting the supervision rules. Finally, the development of DDAEs is summarized and anticipated.
QIAN Zhong-Sheng , ZHU Jie , ZHU Yi-Min , YU Qing-Yuan , LI Duan-Ming , SONG Jia
2022, 33(2):434-454. DOI: 10.13328/j.cnki.jos.006149 CSTR:
Abstract:Using multi-population genetic algorithm to solve the problem of multi-path coverage is an important research direction in the field of automatic generation of test data. In order to improve the efficiency of multi-path coverage test data automatic generation, a multi-path coverage strategy combining key point probability and path similarity is proposed. Firstly, the theoretical path is divided into easily-covered, difficultly-covered, and unreachable paths. Then, the key point probability is counted through the easily-covered paths, the contribution of the individual to the generated test data is calculated by using this probability, and the contribution isusedto improve the fitness function, at the same time, the target path is sorted according to the key point probability. Finally, the test data covering the target path is generated by using multi-population genetic algorithm. After the sub-population covers the current target path during the evolution process, it continues to try to cover similar paths of the target path. The experimental results show that the proposed method can improve the efficiency of multi-path coverage to generate test data.
SUN De-Quan , ZHOU Jing-Wen , ZHOU Hai-Fang
2022, 33(2):455-472. DOI: 10.13328/j.cnki.jos.006160 CSTR:
Abstract:The temporal invariants reflect the temporal logic relationship between events and have been widely used in anomaly detection, system behavior understanding, model reasoning, and other techniques. Generally, mining temporal invariants through analyzing the log data of software system is in actual use. Compared with totally ordered log, partially ordered log can provide a more accurate data source for mining algorithm. However, the existing temporal invariants mining methods based on partially ordered log have some problems such as its low efficiency. For this reason, this study uses the system execution path as the data source and proposes a temporal invariants mining method SOTIMiner based on set operations and studies an improved scheme. Compared with existing methods, it does not need to traverse the log data in reverse and for the reason that it has a higher efficiency. Experiments show that the method's average efficiency is 3.23 times of the Synoptic mining tool on the basis of guaranteeing the same result.
ZHONG Bing-Nan , DENG Liang , ZENG Qing-Kai
2022, 33(2):473-497. DOI: 10.13328/j.cnki.jos.006211 CSTR:
Abstract:In order to solve the problem caused by untrusted kernel, the trusted base architecture at the same privilege of the kernel has been proposed by a lot of works. It provides the only one protection domain to deploy security mechanism at the same hardware privilege level of the kernel. However, in practice, it is often faced with diversified security requirements. Moreover, it is high risk to make multiple corresponding security mechanisms concentrated into a single protection domain. All other security mechanisms in the same protection domain may be maliciously tampered or destructed, as long as any one of the security mechanisms is compromised by the attacker. To address this problem, a kernel-level multi-domain isolation model isproposed in this study, which constructs multiple protection domains at the same hardware privilege level with the kernel to achieve internal isolation of different security mechanisms, and it will alleviate the security risks of traditional method which bind all security mechanisms into a single protection domain. This study has implemented the decentralized-KPD prototype system of the kernel-level multi-domain isolation model, which uses hardware virtualization technology and address remapping technology to deploy different security mechanisms in multiple protection domains at the kernel privilege level and it will not cause a large performance overhead. Overall, the experimental results demonstrate the security and utility of the kernel-level multi-domain isolation model.
HU Bing-De , WANG Xin-Gen , WANG Xin-Yu , SONG Ming-Li , CHEN Chun
2022, 33(2):498-523. DOI: 10.13328/j.cnki.jos.006353 CSTR:
Abstract:With the rise of graph structured data mining, hypergraph, as a special type of graph structured data, is widely concerned in social network analysis, image processing, biological response analysis, and other fields. By analyzing the topological structure and node attributes of hypergraph, many problemscan be effectively solved such as recommendation, community detection, and so on. According to the characteristics of hypergraph learning algorithm, it can be divided into spectral analysis method, neural network method, and other method. According to the methods used to process hypergraphs, it can be further divided into expansion method and non-expansion method. If the expansion method is applied to the indecomposable hypergraph, it is likely to cause information loss. However, the existing hypergraph reviews do not discuss that hypergraph learning methods are applicable to which type of hypergraphs. So, this article discusses the expansion method and non-expansion method respectively from the aspects of spectral analysis method and neural network method, and further subdivides them according to their algorithm characteristics and application scenarios. Then, the ideas of different algorithms are analyzed and comparedin experiments. The advantages and disadvantages of different algorithms are concluded. Finally, some promising research directionsare proposed.
CHEN Si-Hong , SHEN Hao-Jing , WANG Ran , WANG Xi-Zhao
2022, 33(2):524-538. DOI: 10.13328/j.cnki.jos.006163 CSTR:
Abstract:Adversarial robustness describes the ability of the model to resist adversarial examples and adversarial training is a common method to improve the model's adversarial robustness. However, adversarial training will reduce the accuracy of the model on clean samples. This phenomenon is called accuracy-robustness problem. Due to the need to generate adversarial examples during the adversarial training, this process significantly increases the training time of the network. This work studies the relationship between prediction uncertainty and adversarial robustness, and draws the following conclusions: the greater the prediction uncertainty, the greater the adversarial robustness. The conclusion is explained as: the boundary of the model obtained by cross-entropy is not perfect. In order to minimize the cross-entropy, the classification surface of some classes may become narrow, which makes the samples of these classes vulnerable to adversarial attacks. And if the output's information entropy is maximized while training the model, the classification surface of the model could be more balanced, that is, the distance between boundary and data is as far as possible, which makes it more difficult for the attacker to attack the samples. Based on this finding, a new methodis proposed to improve the adversarial robustness of the model, by increasing the uncertainty of the model's prediction to improve the adversarial robustness of the model. While ensuring the accuracy of the model, the prediction's information entropy is larger. Extensive experiments and simplified model derivations on the MNIST, CIFAR-10, and CIFAR-100 datasets have confirmed the statistical relationship that the adversarial robustness increases with the increase of the model's prediction uncertainty. The method proposed in this study also can be combined with adversarial training to further improve the model's adversarial robustness.
LI Rui-Yu , ZHU Ji-Hua , LIU Xin-Yuan
2022, 33(2):539-554. DOI: 10.13328/j.cnki.jos.006139 CSTR:
Abstract:In last few years, as a new supervised learning paradigm, label distribution learning (LDL) has been applied to many fields and shown good results in these fields, such as face age estimation, head posture estimation, movie score prediction and crowd count in public video surveillance. Recently, the correlations between labels have been considered in some algorithms when solving the problem of label distribution learning. However, most of the existing methods take label correlations as a prior knowledge, which may not be able to correctly describe the real relationship between labels. In addition, label correlations are usually used to regularize the hypothesis space in the training phase, while the final label distribution prediction does not use these correlations explicitly. Therefore, this study proposes a new label distribution learning method, label distribution learning with collaboration among labels (LDLCL), which aims to explicitly consider the correlated predictions of labels while training the expected model. Specifically, the hypothesis is first proposed: for each label, the final prediction involves the cooperation between its own prediction and other labels' predictions. Based on this assumption, a new method is proposed to learn label correlations by sparse reconstruction in label space. Then, the learned label correlations are seamlessly integrated into model training, and finally the learned label correlations are used in label prediction. Sufficient experimental results show that the proposedapproach is superior to other similar methods.
ZHANG Ya-Wen , WANG Zhi-Hai , LIU Hai-Yang , ZENG Zhao-Bo
2022, 33(2):555-570. DOI: 10.13328/j.cnki.jos.006142 CSTR:
Abstract:Time series data widely exists in daily lives, attracting more and more scholars to conduct in-depth research on it. Time series classification is an important research field of time series, and hundreds of classification algorithms have been proposed. These methods are roughly divided into distance-based methods, feature-based methods, and deep learning-based methods. The first two types of methods require manual processing of features and artificial selection of classifiers, and most deep learning-based methods are end-to-end methods and show good classification results in time series classification problems. Nevertheless, the current deep learning-based methods are rarely able to improve the network for the problem of time scale selection in time series data, and rarely integrate the network in terms of network structure to better leverage their respective advantages. In order to solve these two kinds of problems, this study proposes a multi-scale residual full convolutional neural network (MRes-FCN) structure to deal with time series problems. The structure is mainly divided into the data preprocessing stage, the stage of combining the full convolutional network and the residual network. In order to evaluate the performance of this method, this study conducted experiments on 85 public data sets of UCR and compared them with distance-based methods, feature-based methods, and deep learning-based methods. Experiments show that the proposed method has better performance than other methods, and it is better than most methods on multiple data sets.
SU Chang , LI Jun-Chao , WANG Xiao-Mei , CHEN Yi-Jiang
2022, 33(2):571-584. DOI: 10.13328/j.cnki.jos.006144 CSTR:
Abstract:Chinese cognition and conceptual system is built on metaphors. Metaphor comprehension is an important task in natural language processing. Based on the interaction theory, the cooperative network model (CNM) is proposed, which better conforms to the character of metaphor, to perform metaphor comprehension. The CNM reveals the semantic relations between target and source domain, i.e., seeking common ground while reserving differences. Besides, the CNM addresses bidirectional semantic relations between concepts. By computing cooperative strength of relations, the CNM obtains the meaning of metaphorical context. The experimental results demonstrate the effectiveness of the CNM. The outputs of the CNM preliminarily reflect the dynamic and prominent mechanisms of metaphor comprehension.
CAO Rong-Wei , ZHU Ji-Hua , HAO Wen-Yu , ZHANG Chang-Qing , ZHANG Zhuo-Han , LI Zhong-Yu
2022, 33(2):585-597. DOI: 10.13328/j.cnki.jos.006148 CSTR:
Abstract:In order to solve the problem of clustering multi-view data, many multi-view subspace clustering methods have been proposed and achieved great success. However, they cannot well fix the following two problems. 1) How to leverage the difference between different views to learn a shared coefficient matrix with high quality. 2) How to further enforce the low rank property of the common coefficient matrix. To handle the above problems, an effective method dubbed dual weighted multi-view subspace clustering is proposed. In detail, the coefficient matricesare first learned for each view by self-representation model, and then they are fusedinto a common representation with a self-weighted strategy, finally weighted nuclear norm instead of nuclear norm is employed to approximate the rank of the common coefficient matrix, so that the performance of clustering can be improved. An augmented Lagrange multiplier based optimal algorithm is imposed to solve the established objective function. Experiments conducted on six real world datasets validate the superiority of the proposed method.
SHI Chuan , WANG Rui-Jia , WANG Xiao
2022, 33(2):598-621. DOI: 10.13328/j.cnki.jos.006357 CSTR:
Abstract:The real-world systems usually contain different types of components that interact with each other. Most existing work models these interaction systems as homogeneous information networks, which does not considerthe heterogeneous interaction relationships among objects, resulting in lots of information loss. In recent years, more researchers model these interaction data as heterogeneous information networks (HINs) and conduct knowledge discovery based on the comprehensive structural information and rich semantic information in HINs. Specifically, with the advent of the era of big data, HINs naturally merge heterogeneous data sources, which make it an important way to solve the variety of big data. Therefore, heterogeneous information network analysis has quickly become a hot spot in data mining research and industrial applications. This article provides a comprehensive overview of heterogeneous information network analysis and applications. In addition to the basic concepts in heterogeneous information networks, the focus of this article is on the latest research progress in meta-path based data mining, heterogeneous information networks representation learning, and practical applications of heterogeneous information networks. In the end, this article points out the possible directions of future development.
ZHAO Xiao-Qiang , CUI Yan-Peng , GUO Zheng , LIU Min , LI Xiong , WEN Qin
2022, 33(2):622-640. DOI: 10.13328/j.cnki.jos.006159 CSTR:
Abstract:As one of the key technologies of wireless sensor networks (WSNs), clustering routing protocol has gradually become a research hotspot of WSNs routing protocol due to its advantages of strong scalability and low energy consumption. How to select the optimal cluster head is the key to improve the performance of cluster routing protocol. In this study, by revealing the mapping relationship among cluster head number and the network energy consumption in different scenarios, with the goal of minimizing energy consumption, the calculation theory of optimal number of cluster heads is constructed. The conditions of using multi-hop strategy among clusters are discussed for different scale networks; the concept of virtual cluster head and its three virtual force models is proposed. Three virtual force models between virtual cluster head and boundaries, node and other virtual cluster heads are constructed, and the optimal distance thresholds for different virtual forces and the differences are discussed. In order to realize the minimization and equalization of network energy consumption, the fitness function of residual energy and distance factor is set up to form an energy efficient routing protocol based on virtual force. The experimental results show that in networks of various scales, compared with the fitness-value based improved gray wolf optimizer, the improved low-energy adaptive clustering hierarchy protocol and the modified distributed energy efficient clustering algorithm, the algorithm proposed in this studymakes the cluster head more uniform, the node energy consumption lower and more balanced, and the network life is effectively extended.
YANG Wei-Yong , LIU Wei , CUI Heng-Zhi , WEI Xing-Shen , HUANG Hao , LIAO Peng , QIAN Zhu-Zhong , WANG Yuan-Qiang
2022, 33(2):641-663. DOI: 10.13328/j.cnki.jos.006154 CSTR:
Abstract:With the gradual advancement of the State Grid power Internet of Things (IoT), the edge-computing framework as its core support has gradually become a research hot topic. This article firstly summarizes the existing research work on the IoT and edge-computing framework. Secondly, by analyzing the key requirements of the power IoT in business scenarios, edge computing and information security; a way to adapt to the power SG-Edge is proposed, which is a trusted edge-computing framework for the IoT. Then, key technical methods such as hardware trusted boot and dynamic measurement of software behavior are proposed to meet the key challenges of the trusted protection of the edge framework. Finally, the SG-Edge is fully evaluated from the aspects of business adaptability and security, and the possible future research challenges are prospected.
LIU Lin-Feng , XIANG Yang , WU Jia-Gao
2022, 33(2):664-682. DOI: 10.13328/j.cnki.jos.006152 CSTR:
Abstract:With the development of various mobile ad hoc networks, and in order to monitor and explore the underwater environments conveniently, underwater wireless sensor networks (UWSNs) have emerged and attracted the increasing attentions of researchers. UWSNs can be widely utilized in many underwater scenarios such as marine environment monitoring, resource exploitation, underwater biological research, shipwreck search, underwater rescue, and so on. A UWSN is significantly different from the traditional wireless sensor networks, due to the irregularity of underwater acoustic communications and the complexity of underwater environments. Moreover, a UWSN is usually composed of two types of nodes: Anchored nodes and mobile nodes. All these bring some new challenges to the technique of message dissemination in UWSNs, such as the complex movements of nodes and the uncertain future links. Therefore, a reasonable message dissemination algorithm for UWSNs will be helpful to improve the data transmission efficiency. According to the characteristics of UWSN topologies, this study applies a link prediction method for the message dissemination, and an index of spatial-temporal common neighbors is specially introduced to analyze the potential links between nodes. In addition, compared with the mobile nodes, each anchored node typically has a larger communication range and a stronger computing power, and thus each anchored node can play the role of an edge computing server to further improve the link prediction results. Finally, the next-hop relay nodes can be selected according to the obtained link prediction results. Simulation results show that the proposed algorithm can improve the delivery ratio and reduce the propagation delay of data messages while the number of forwarded message copies is confined.
HUANG Ke-Zhen , LIAN Yi-Feng , FENG Deng-Guo , ZHANG Hai-Xia , WU Di , MA Xiang-Liang
2022, 33(2):683-698. DOI: 10.13328/j.cnki.jos.006314 CSTR:
Abstract:With the rapid development of technologies such as computers and smart devices, cyber attack incidents happen frequently, which cause increasingly serious economic losses or reputation losses. In order to reduce losses and prevent future potential attacks, it is necessary to trace the source of cyber attack incidents to achieve accountability for the attackers. The attribution of cyber attackers is mainly a manual process by forensic analyst. Faced with increasing analysis data and analysis dimensions, semi-automated or automated cyber attackers mining analysis methods are urgently needed. This study proposes a graph model-based attacker mining analysis method for cyber attack incidents. This method first establishes an ontology model for cyber attack incident attribution, and then fuses clue data extracted from cyber attack incidents with various threat intelligence data to construct a cyber attack incidents attribution relationship graph. The graph embedding algorithm automatically learns the representation vector of cyber attack incidents, which embedded clue characteristics of cyber attack incidents, from the attribution relationship graph of cyber attack incidents. And then a classifier is trained with the historical cyber attack incidents representation vector, which classifies the cyber attack incident to one cyber attacker. Finally, the feasibility and effectiveness of the method are verified by experiments.
XU Chuan , DING Ying-Yi , LUO Li , LIU Shuai-Jun , LIU Li-Xiang , ZHAO Guo-Feng
2022, 33(2):699-716. DOI: 10.13328/j.cnki.jos.006157 CSTR:
Abstract:With the rapid development of vehicular networks, location privacy leakage is a key security issue when users enjoy location- based services (LBSs) provided by vehicular networks. This study proposes a personalized location privacy protection scheme based on differential privacy to address the issue of privacy leakage of location services in vehicular networks, which can meet the personalized privacy needs of users on the premise of protecting their privacy. Firstly, a normalized decision matrix is defined to describe the efficiency and privacy effects of navigation recommendations. Then, the utility model is established by introducing the multi-attribute theory, and the user's privacy preference is integrated into the model to select the best driving route for the user. Finally, considering the user's privacy preference, the distance proportion is used as the measurement index to allocate the appropriate privacy budget for the user, and the false location generation range is determined to generate the most effective service request location. Based on the real data set, the proposed scheme is compared with the existing scheme through simulation experiments. The experimental results show that the personalized location privacy protection scheme proposed in this study can meet the service requirements of users and provide higher quality of service (QoS) while reasonably protecting the privacy of them.
YOU Qi-Di , ZHANG Xi-Yong , ZHOU Xuan , WU Zhao-Yang , YUAN Ye
2022, 33(2):717-724. DOI: 10.13328/j.cnki.jos.006158 CSTR:
Abstract:Cryptographic functions have important applications in the research of cryptography. This paper describes a more suitable approach to prove the nonexistence of some cryptographic functions, and obtain some new results, which support the conjecture that there are no homogeneous rotation symmetric bent functions of algebraic degree > 2. Also, homogeneous degree 2 rotation symmetric bent functions are characterizedby using GCD of polynomials. The method presented in this paper can also be used to characterize the existence of other forms of bent functions.
2022, 33(2):725-737. DOI: 10.13328/j.cnki.jos.006143 CSTR:
Abstract:Joint photographic experts group (JPEG) is the most widely used image compression format in daily life. Using reversible data hiding (RDH) technology to authenticate its facticity and integrity is of great significance. This study proposes a new RDH method based on JPEG image. In the proposed algorithm, the small non-zero alternating current (AC) coefficientsin each 8×8 sub-block of the JPEG image are paired according to their amplitudes, and a new two-dimensional reversible mapping rule is designed for these paired AC coefficients to embed information, where the rest large non-zero AC coefficients are used for shifting to make room. Moreover, in order to further reduce the ineffective shifts of AC coefficients, a new adaptive frequency selection strategy is also proposed. In each sub-block, the coefficients belonging to different frequency bands are adaptively selected in the embedding process. The experimental results show that the proposed method is superior to the most advanced methods in terms of the visual quality and increase of file storage size.
JING Yu-Han , HE Bo , ZHANG Ling-Xin , LI Tian-Xing , WANG Jing-Yu , LIU Cong
2022, 33(2):738-750. DOI: 10.13328/j.cnki.jos.006212 CSTR:
Abstract:Additivity of multidimensional KPIs (key performance indicators) was used to achieve root cause location for large-scale Internet services. The anomaly caused by one or more root causes usually results in the change of a large number of relevant KPIs. A pruning search model based on anomaly similarity and effectiveness factor for root cause location (PASER) was proposed, which indicated the probability of candidate set becoming root cause using potential score based on the anomaly propagation model of multi- dimensional KPI. The pruning search algorithm used in PASER also managed to reduce the location time to about 5.3 seconds on average. In addition, the selection of time series prediction algorithm was also discussed. PASER had finally achieved a performance of 0.99 F-score on the experimental dataset.
QIU Jie-Fan , HUA Zong-Han , FAN Jing , LIU Lei
2022, 33(2):751-769. DOI: 10.13328/j.cnki.jos.006370 CSTR:
Abstract:This paper shows the technical evolution of memory partitioning on shared memory systems using the cases of page coloring. Multiple co-running applications incur memory contentions across the entire memory hierarchy in the age of multicore, negatively impacting the overall system performance and the QoS. Researchers are always seeking new solutions to aid these problems. The paper presents the well-known and well-developed technology—"Page coloring" and summarizes its evolution since 1990. This paper introduces how to leverage page coloring on Cache, DRAM/NVM Banks in main memory systems, and memory channels. On the Bank level, concurrent applications are mapped to a different group of Banks, thus eliminating inter-application memory conflicts. In terms of Cache and DRAM Banks, this paper introduces the "vertical partitioning" to mitigate memory conflicts across the multiple levels at the memory hierarchy at the same time. Finally, this study shows how to use page coloring on systems equipped with hybrid NVM systems. The experimental results show that the memory partitioning technologies through page coloring bring significant performance benefits and QoS.