JIANG Yan-Yan , XU Chang , MA Xiao-Xing , LÜ Jian
2017, 28(4):747-763. DOI: 10.13328/j.cnki.jos.005193 CSTR:
Abstract:Concurrent bugs are difficult to trigger, debug, and detect. Dynamic program analysis techniques have been proven useful in addressing such challenges. Due to non-deterministic nature of concurrent programs in which the major source of non-determinism is the shared memory, obtaining the order of shared memory accesses, i.e., shared memory dependences, is the basis of such dynamic analyses. This work proposes a survey framework to demonstrate the key issues in obtaining the shared memory dependences. The framework includes four performance metrics (immediacy, accuracy, efficiency and simplicity), two categories of approaches (online tracing and offline synthesis), and two categories of applications (trace analysis and concurrency control). Existing techniques as well as potential future work are studied in the paper.
HE Wang-Quan , LIU Yong , FANG Yan-Fei , WEI Di , QI Feng-Bin
2017, 28(4):764-785. DOI: 10.13328/j.cnki.jos.005197 CSTR:
Abstract:Heterogeneous many-core architecture, with ultra-high performance to power consumption ratio, has become an important trend of supercomputer architecture development. However, many-core systems always have more complex parallel hierarchy and memory hierarchy, hence posing a great challenge to programming and optimization. Therefore, the study of many-core-oriented parallel programming techniques is of great significance, since it can reduce the difficulty of parallel programming on domestic many-core systems and improve the performance of parallel programs. This work proposes a multi-model parallel programming model upon unified architecture, including heterogeneous-fused speedup programming model and isomorphic independent programming model. Based on this model, Parallel C programming language is designed to effectively describe heterogeneous parallelism of the domestic many-core system. Compared to MPI+X programming pattern, programming with Parallel C has a global perspective, as well as advantages in the hierarchy locality description, one-side message passing and multi-core applications compatibility. The Parallel C compiler system constructed with Open64 fully supports the heterogeneous-fused speedup programming model and isomorphic independent programming model. In addition, the design and implementation of data layout and automatic DMA optimization, compiler-directed thread proxy optimization and topology-aware collective communications optimization are presented. The performance of the proposed method is evaluated with the Miro Benchmark and practical applications on Sunway Taihu Light computer system. Experimental results show that Parallel C language and the compile system have good performance and scalability to effectively support large-scale applications.
LI Bin , TANG Zhen-Hao , ZHAI Juan , ZHAO Jian-Hua
2017, 28(4):786-803. DOI: 10.13328/j.cnki.jos.005195 CSTR:
Abstract:This paper presents an approach to prove that a concrete program correctly implements its corresponding abstract program. Here, an abstract program uses some abstract data types such as set, list and map, and abstract operations upon those data types. A concrete program uses the types in the C-like language. The approach presented in the paper requires to specify correspondences between the abstract program and the concrete program, including correspondences between program points and correspondences between variables. Based on the correspondences, the verification task can be divided into small subtasks that can be easily and mostly automatically verified.
2017, 28(4):804-818. DOI: 10.13328/j.cnki.jos.005191 CSTR:
Abstract:Since multi-stack models are generally Turing-complete, verification problems on general models for asynchronous communication programs are undecidable. This paper proposes a new model based Petri net, named communication Petri nets (C-Petri nets) to model asynchronous communication programs. Applying the k-shaped restriction on the input communications and the abstraction on each stack based on popping lemma of regular languages, the coverability problem of the restricted C-Petri net is decidable by encoding the model to data Petri nets.
2017, 28(4):819-826. DOI: 10.13328/j.cnki.jos.005196 CSTR:
Abstract:Formal verification is an effective approach to construct high confidence software. Verifying the functional correctness of complex system software by manually writing proof scripts in proof assistant tools is feasible with the low degree of automation, and the verification cost is relatively high. The automatic program verifiers verify programs by taking the annotated source code as their input to generate verification conditions automatically solved by SMT solvers. This approach has a high degree of automation, but it is impossible to verify the functional correctness of the entire system software. By combining the advantage of the above two methods, this paper implements a novel Coq tactic plug-in named "smt4coq", which allows calling the Z3 SMT solver in Coq to automatically prove mathematical propositions involved with 32-bit machine integers. The new tactic improves the degree of automation and reduces the cost of manual verification.
LI Xiao , ZHOU Yan , LI Meng-Chen , CHEN Yuan-Jun , XU Guo-Qing , WANG Lin-Zhang , LI Xuan-Dong
2017, 28(4):827-844. DOI: 10.13328/j.cnki.jos.005189 CSTR:
Abstract:Memory leak, which has perplexed software developers for a long time because of imperceptibility, is a very common bug for C/C++ programs and can do serious harm especially for long-running program or system software. Aiming at this problem, both static and dynamic program analysis techniques have been attempted. Dynamic program analysis technique detects memory leak by running the program, which has huge overhead and depends on the quality of test cases. Static analysis technology and automatic tools are widely used in the work of detecting memory leaks among academic community and industrial community. Since it uses conservative algorithm, Static analysis is able to detect a lot of defects but at the sometime increases the false positives, which needs manual confirmation. As manual confirmation is time-consuming and error prone, it limits the practicability of the technology. In this paper, a novel method to automatically validate static memory leak warnings is proposed based on concolic testing. First, drawing on the memory leak warnings given by static analysis report, the control flow of the target program is analyzed and the reachability of the target path is calculated. Then the path information is used to guide the concolic testing and execute program in the particular path. Finally, the static warnings is validated by tracking memory object during execution. Experimental results show that this method can effectively classify static warnings and significantly reduce the workload of manual validation.
LI Deng-Hui , ZHAO Jia-Cheng , CUI Hui-Min , FENG Xiao-Bing
2017, 28(4):845-859. DOI: 10.13328/j.cnki.jos.005194 CSTR:
Abstract:Datacenters are built to house massive internet services at an affordable price. Both Op-ex (long-time operational expenditure) and Cap-ex (one-time construction costs) are directly impacted by datacenter power consumption. Thus, DVFS (dynamic voltage and frequency scaling) is widely adopted to improve per node energy efficiency. However, it is well known but has not yet been fully explored that such schemes affect an application's power consumption and performance simultaneously. This paper focuses on the impact of DVFS on performance of an application and proposes an analytical model to quantitatively characterize the relationship between an application's performance and a processor's frequency, which can be leveraged to predict the performance of an application at any frequency. Specifically, according to different memory subsystem resources accessed, instructions of an application are divided into two parts:on-chip instructions and off-chip instructions, which can be modeled independently. On-Chip instructions refer to instructions which only access on-chip resources, and their execution time is frequency-relevant and can be modeled using a linear function. Off-chip instructions stand for instructions accessing the main memory, and their execution time is dominated by memory access latency and is frequency- irrelevant. By the division and modeling of the two parts, a quantitative model can be obtained between the execution time of an application and frequency of a processor. Evaluations using two different platforms and all benchmarks of SPEC 2006 show that the derived models are very precise, with average prediction error less than 1.34%.
WANG Lei , LI Feng , LI Lian , FENG Xiao-Bing
2017, 28(4):860-882. DOI: 10.13328/j.cnki.jos.005190 CSTR:
Abstract:Information flow analysis is a promising approach for protecting the confidentiality and integrity of information manipulated by computing systems. Taint analysis, as in practice, is widely used in the area of software security assurance. This survey summarizes the latest advances on taint analysis, especially the solutions applied in different platform applications. Firstly, the basic principle of taint analysis is introduced along with the general technology of taint propagation implemented by dynamic and static analyses. Then, the proposals applied in different platform frameworks, including techniques for protecting privacy leakage on Android and finding security vulnerabilities on Web, are analyzed. Lastly, further research directions and future work are discussed.
TONG Qing , ZHANG Zheng , ZHANG Wei-Hua , WU Jiang-Xing
2017, 28(4):883-897. DOI: 10.13328/j.cnki.jos.005192 CSTR:
Abstract:The Web server system, being the most important platform of supporting and providing network services, is facing serious security problem. The existing defending technologies mainly deal with the known attacking methods or the known vulnerabilities, and therefore are not effective in case of the unknown threats and do not provide overall defense. This paper first proposes an attacking model to analyze the shortcomings of existing defending technologies. Next, a dynamic heterogeneous redundancy structure based mimic defending model is proposed, and its defending principles and the characteristics are interpreted. Then, the mimic defending Web server is designed on the mimic defending model, and the structure and the implementation principles in the Web server design are introduced. The results of security and performance tests show that the presented mimic defending Web server can defend against all kinds of attacks in the tests with little performance loss, which verifies the effectiveness and the practicability of the mimic defending technology. Finally a perspective of the future work and challenges of mimic defending technology is discussed.
LIU Yao , DUAN Zhen-Hua , TIAN Cong
2017, 28(4):898-906. DOI: 10.13328/j.cnki.jos.005057 CSTR:
Abstract:Linear time μ-calculus (νTL) is a formalism which has a strong expressive power with a succinct syntax. It is useful for specifying and verifying various properties of concurrent programs. However, the nesting of fix point operators makes its decision problem difficult to solve. To tackle the issue, a tool called NuTL2PFG for checking the satisfiability of νTL formulas is developed in this paper. Based on present future form (PF form) of νTL formulas, the tool is able to construct the present future form graph (PFG) for a given formula to specify the models that satisfy the formula. Further, the tool checks the satisfiability of a given formula by searching for a ν-path in its PFG free of infinite unfoldings of least fixpoints. Experimental results show that NuTL2PFG is more efficient than the existing tools.
HE Xiao , LI Wen-Feng , ZHANG Tian , MA Zhi-Yi , SHAO Wei-Zhong , HU Chang-Jun
2017, 28(4):907-924. DOI: 10.13328/j.cnki.jos.005055 CSTR:
Abstract:Model transformation is the key to model-based software engineering. When the model transformation is applied to industrial developments, its scalability becomes an important issue. To test the performance of model transformations, developers must be able to generate a set of models, i.e. the test inputs, efficiently. This paper proposes a randomized approach to generating large models. This approach can produce a model randomly and correctly based on the definition of metamodel and user-defined constraints. And the evaluation result also shows that the proposed approach is more efficient than other approaches, and therefore is more suitable for supporting performance testing of transformations.
GAO Wei , HAN Lin , ZHAO Rong-Cai , XU Jin-Long , CHEN Chao-Ran
2017, 28(4):925-939. DOI: 10.13328/j.cnki.jos.005029 CSTR:
Abstract:SIMD extension is an acceleration component integrated into the general processor, aiming at exploiting data level parallelism in multimedia and scientific computation programs. Two of the mainstream vectorization methods are loop-based method oriented to inter-iteration and SLP method oriented to intra-iteration. Derived from SLP, loop-aware method transforms inter-iteration to intra-iteration through loop unrolling, so as to obtain enough isomorphic statements and then uses SLP to explore vectorization. However, when loop unrolling is illegal or SIMD parallelism is lower than the vector factor, loop-aware method cannot exploit SIMD parallelism of programs. To address this drawback, a vectorization method guided by SIMD parallelism for loops is proposed. Alternative scheme for loop vectorization is constructed in view of inter-iteration parallelism, intra-iteration parallelism and vector factor. Simultaneously, insufficient vectorization is proposed to vectorize loops whose parallelism is lower than the vector factor. Lastly, vectorized loop is unrolled according to SIMD parallelism. Test results by benchmarks show that vectorization method guided by SIMD parallelism outperforms loop-aware method by 107.5%. Moreover, the performance is improved by 12.1% compared with loop-aware method.
ZHANG Jun-Na , WANG Shang-Guang , SUN Qi-Bo , YANG Fang-Chun
2017, 28(4):940-958. DOI: 10.13328/j.cnki.jos.005051 CSTR:
Abstract:Traditional fault-tolerance approaches often result in low efficiency of service composition in integration networks. In this paper, a fast and reliable fault-tolerance approach is proposed for service composition in integration networks. This approach firstly adopts fuzzy logic to perform service retry when the transient faults of service occur. And then multi-attribute decision-making theory is employed to carry out service replicate when the permanent faults of service occur. Finally, an improved particle swarm optimization algorithm is used to implement service compensation when the permanent faults of service arise. The experimental results based on real data sets show that the proposed approach is superior to other approaches in terms of fault handling rate, fault handling time, and composition optimization.
GAO Qiang , ZHANG Feng-Li , WANG Rui-Jin , ZHOU Fan
2017, 28(4):959-992. DOI: 10.13328/j.cnki.jos.005143 CSTR:
Abstract:The development of mobile internet and the popularity of mobile terminals produce massive trajectory data of moving objects under the era of big data. Trajectory data has spatio-temporal characteristics and rich information. Trajectory data processing techniques can be used to mine the patterns of human activities and behaviors, the moving patterns of vehicles in the city and the changes of atmospheric environment. However, trajectory data also can be exploited to disclose moving objects' privacy information (e.g., behaviors, hobbies and social relationships). Accordingly, attackers can easily access moving objects' privacy information by digging into their trajectory data such as activities and check-in locations. In another front of research, quantum computation presents an important theoretical direction to mine big data due to its scalable and powerful storage and computing capacity. Applying quantum computing approaches to handle trajectory big data could make some complex problem solvable and achieve higher efficiency. This paper reviews the key technologies of processing trajectory data. First the concept and characteristics of trajectory data is introduced, and the pre-processing methods, including noise filtering and data compression, are summarized. Then, the trajectory indexing and querying techniques, and the current achievements of mining trajectory data, such as pattern mining and trajectory classification, are reviewed. Next, an overview of the basic theories and characteristics of privacy preserving with respect to trajectory data is provided. The supporting techniques of trajectory big data mining, such as processing framework and data visualization, are presented in detail. Some possible ways of applying quantum computation into trajectory data processing, as well as the implementation of some core trajectory mining algorithms by quantum computation are also described. Finally, the challenges of trajectory data processing and promising future research directions are discussed.
HUANG Jian-Bin , SUN Xiao-Jing , ZHOU Yu , LÜ Ze , SUN He-Li , JIA Xiao-Lin
2017, 28(4):993-1009. DOI: 10.13328/j.cnki.jos.005152 CSTR:
Abstract:Team formation problem has received extensive study in operational research (OR). Recently, along with the popularity of various social sites and the rapid development of Internet communication, team formation in social networks has attracted great study enthusiasm of scholars once again. Team formation problem in social networks differs much from the traditional team formation which takes no account of the social relations and effective communication in a team. Thus it cannot be solved simply by means of set-cover, task assignment or maximum matching any more. Based on sufficient research and analysis, this work surveys the state of the art of the social-based team formation problem. The variants of this problem and their optimization methods are reviews as well. In the meantime, the data-sets and evaluation criteria of experiments in this study are introduced. Finally, the prospect of future work is presented.
GONG Jian , ZANG Xiao-Dong , SU Qi , HU Xiao-Yan , XU Jie
2017, 28(4):1010-1026. DOI: 10.13328/j.cnki.jos.005142 CSTR:
Abstract:As the priority of cyber-security arises world-wide, network security situation awareness (NSSA) and its application help to draw more attentions of researchers. NSSA is able to identify network activities, understand their intentions and evaluate the impact of these activities on the managed network, as well as to support an optimal security response to the security threats. It is a means of quantitative analysis for network security, with which network security management system can have a global view of security states of the managed network, find the intention of attackers, and make a management decision based on these findings. In the paper, the coverage of NSSA is discussed to redefine the concept of NSSA. Then a survey is given on the state-of-art of NSSA's research in the aspects of network security situation perception, comprehension and projection. Finally the features and challenges of network security situation awareness are summarized.
HU Sen-Sen , JI Wei-Xing , WANG Yi-Zhuo , CHEN Xu , FU Wen-Fei , SHI Feng
2017, 28(4):1027-1047. DOI: 10.13328/j.cnki.jos.005245 CSTR:
Abstract:Modern-Day transistor technique enables the industry to integrate many cores on a single chip. As an increasing number of cores being integrated on a single chip, cache coherence has become an intractable issue as well as a bottleneck of performance. In this paper, the origin of cache coherence is carefully described. Further, the paper summarizes the key issue of cache coherence and reviews the study in this field a decade after entering the mulit-core era. From aspects of memory access, directory organization, coherence granularity, coherence traffic and scalability, the work on optimization of cache coherence in recent researches is also presented. Finally, the potential challenges in current coherence protocol and direction of future research are discussed.