PENG Chang-Gen , DING Hong-Fa , ZHU Yi-Jie , TIAN You-Liang , FU Zu-Feng
2016, 27(8):1891-1903. DOI: 10.13328/j.cnki.jos.005096
Abstract:The quantification of privacy plays an important role in privacy protection. Information entropy, as a quantitative measure of information, can be used to solve the problem of privacy measurement. To realize such privacy metrics, several information entropy models of privacy protection are proposed based on Shannon's information theory: a basic information entropy model of privacy protection, an information entropy model of privacy protection with an adversary, an information entropy model of privacy protection with subjective feelings, and a multi-source information entropy model of privacy protection. In these models, the information owner is assumed to be the sender, the privacy attacker is assumed to be the recipient, and the privacy disclosure process is regarded as a communication channel. Based on these assumptions, entropy, mutual information, conditional entropy, and conditional mutual information are introduced to measure privacy, privacy disclosure, and privacy disclosure with background knowledge in a privacy protection system. Furthermore, a quantitative evaluation of privacy protection strength and adversary ability is provided to support quantitative risk assessment of privacy disclosure. Finally, a specific information entropy model, together with the measurement and analysis of privacy protection algorithms and adversary ability, is presented for location privacy protection applications. The proposed models and privacy metrics can serve as a fundamental theory for privacy protection technology and privacy disclosure risk assessment.
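As a minimal illustration of the entropy-based metrics described in this abstract (a sketch of the standard Shannon quantities, not the paper's exact models), the following Python snippet computes H(X) as a measure of the attacker's uncertainty about a private attribute X, and the mutual information I(X;Y) = H(X) - H(X|Y) as a measure of how much privacy an observable output Y discloses; the joint distribution in the example is hypothetical.

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy H(X) in bits: the attacker's uncertainty
    about the private attribute X."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y) from a joint distribution given as
    {(x, y): p}; measures how much observing Y reduces the
    attacker's uncertainty about X, i.e. privacy disclosure."""
    px, py = Counter(), Counter()
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    h_x = entropy(px.values())
    # H(X|Y) = sum_{x,y} p(x,y) * log2( p(y) / p(x,y) )
    h_x_given_y = sum(p * math.log2(py[y] / p)
                      for (x, y), p in joint.items() if p > 0)
    return h_x - h_x_given_y

# Hypothetical example: X is a user's true location cell, Y the published cell.
joint = {("a", "a"): 0.4, ("a", "b"): 0.1, ("b", "a"): 0.1, ("b", "b"): 0.4}
print(mutual_information(joint))  # ~0.278 bits disclosed by Y
```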
LIU Xiang-Yu , LI Jia-Jia , AN Yun-Zhe , ZHOU Da-Hai , XIA Xiu-Feng
2016, 27(8):1904-1921. DOI: 10.13328/j.cnki.jos.005092
Abstract:As a proven effective solution to privacy preservation, graph anonymization has been studied extensively. The goal of graph anonymization is to avoid disclosure of privacy in social networks through graph modifications while preserving the data utility of the anonymized graph for social network analysis and graph queries. Reachability is an important graph data utility, as reachability queries are not only common on graph databases but also serve as fundamental operations for many other graph queries. However, because existing methods neglect that reachability is highly sensitive to edge modifications, the reachability of each vertex is severely distorted after anonymization. This work solves the problem by designing a reachability preserving anonymization (RPA) algorithm. The main idea of RPA is to organize vertices into groups and greedily anonymize each vertex with low impact on reachability. A number of techniques are designed to make RPA efficient. First, the reachable interval is proposed to efficiently measure the anonymization cost incurred by an edge addition. Second, an index structure, the CN-index, is adopted to accelerate anonymizing each vertex. Extensive experiments on real datasets demonstrate that RPA performs with high efficiency and that the generated anonymized social networks preserve high data utility on reachability queries.
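The reachable interval and CN-index are specific to the RPA paper; purely as a generic sketch of the underlying cost notion, the following Python code counts the vertex pairs that a candidate edge addition (u, v) would newly make reachable in a directed graph (function and parameter names are hypothetical, and the quadratic computation favors clarity over speed).

```python
from collections import deque

def closure(adj, src):
    """All vertices reachable from src (src included), via BFS.
    adj maps a vertex to an iterable of its successors."""
    seen = {src}
    queue = deque([src])
    while queue:
        for w in adj.get(queue.popleft(), ()):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def edge_addition_cost(adj, radj, u, v):
    """Count vertex pairs (a, b) that become newly reachable when
    edge (u, v) is added; radj is the reversed graph."""
    ancestors = closure(radj, u)      # every a that can reach u
    descendants = closure(adj, v)     # every b reachable from v
    cost = 0
    for a in ancestors:
        already = closure(adj, a)     # what a could reach before the addition
        cost += sum(1 for b in descendants if b not in already)
    return cost
```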
WANG Lu , MENG Xiao-Feng , GUO Sheng-Na
2016, 27(8):1922-1933. DOI: 10.13328/j.cnki.jos.005093
Abstract:In the emerging big data era, in addition to the explicit publication of users' locations on geo-social networks, positioning systems embedded in mobile phones implicitly record users' locations. Although such implicitly collected spatiotemporal data play an important role in a wide range of applications, from disease outbreak control in life science to route recommendation in the smart city, they cause serious new privacy issues when cross-referenced with the data users explicitly publish. Existing location privacy preservation techniques fail to preserve this implicit privacy because they ignore the cross-reference between implicitly and explicitly collected spatiotemporal data. To tackle this issue, this work for the first time investigates and defines implicit privacy and proposes a discover-and-eliminate framework. In particular, this paper proposes a prefix-filtering-based nested-loop algorithm and a frequent-moving-object-based algorithm to generate dummy data that preserve the proposed implicit privacy. Further, it constructs an improved reverse Apriori algorithm and a graph-based dummy data generation algorithm, respectively, to make the solution more practical. The results of extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of the proposed methods.
LIAO Xiang-Ke , LI Shan-Shan , DONG Wei , JIA Zhou-Yang , LIU Xiao-Dong , ZHOU Shu-Lin
2016, 27(8):1934-1947. DOI: 10.13328/j.cnki.jos.004936
Abstract:Standardized and sufficient logging is a necessary part of good code quality, and it plays an important role in failure diagnosis as well. Code quality management, however, is restricted by the high complexity of large-scale software, and it is currently difficult and inefficient to reproduce and diagnose system failures with logs. This paper surveys log-related work from three aspects: log characterization, failure diagnosis with logs, and log enhancement. Through a detailed study of several widely used open-source software systems, the paper reveals a number of log-related observations, along with problems that have not been well handled by existing tools. Finally, it proposes several directions for future log-related work and analyzes their potential challenges.
PENG Huan-Feng , HUANG Zhi-Qiu , FAN Da-Juan , ZHANG Yong-Long
2016, 27(8):1948-1963. DOI: 10.13328/j.cnki.jos.004945
Abstract:Users have different disclosure requirements for their private information when they submit private data to a service composition, and the composition should support the verification of users' privacy requirements. This paper puts forward a flexible method for users to produce privacy requirement specifications. Users can define the sensitivity of private data and its usage in different situations, and restrict the member services that may use private data with a sensitivity-reputation function. This method improves the simplicity and generality of privacy requirement specifications. The process first establishes relations among privacy data items with a privacy data item dependency graph (PDIDG), then models the service composition with a privacy open workflow net (POWFN), and finally determines whether the service composition meets the user's privacy requirements with a privacy requirement verification algorithm. An example is provided to illustrate the effectiveness of the method, and an experimental analysis of the performance of the verification algorithm is carried out at the end of the paper.
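As a toy illustration only (the sensitivity-reputation function is user-defined in this method; the linear threshold form below is an assumption, not the paper's), a member service would be allowed to use a private data item only if its reputation reaches the value the function assigns to that item's sensitivity:

```python
def min_reputation(sensitivity: float) -> float:
    """Hypothetical sensitivity-reputation function: the more
    sensitive a data item (sensitivity in [0, 1]), the higher the
    reputation a member service needs in order to use it."""
    return 0.5 + 0.5 * sensitivity

def may_use(service_reputation: float, sensitivity: float) -> bool:
    """Check one disclosure constraint of the user's specification."""
    return service_reputation >= min_reputation(sensitivity)

print(may_use(0.8, 0.4))  # True:  0.8 >= 0.7
print(may_use(0.8, 0.9))  # False: 0.8 <  0.95
```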
LU Yang , YUE Feng , ZHANG Guo-Fu , SU Zhao-Pin , WANG Yong-Qi
2016, 27(8):1964-1977. DOI: 10.13328/j.cnki.jos.004845
Abstract:Software testing is the most time- and resource-consuming stage of software development. For series-parallel software systems, the reliability of the system changes as testing time advances, so if testing resource is still allocated in accordance with the original plan, a vast amount of it may be wasted. To address this issue, this paper tackles a dynamic testing resource allocation problem for series-parallel software systems with bounded resource in the field of search-based software engineering. First, definitions of testing resource, system reliability, and testing cost are given. Based on these definitions, a multi-objective dynamic allocation model for testing resource is established, with the objective of allocating the testing resource among different modules so as to maximize reliability and minimize testing cost subject to the available testing resource. Then, a differential evolution algorithm with one-dimensional integer vector coding and an improved population initialization strategy is proposed for the dynamic model. Comparison results with existing models show that the proposed approach is effective and efficient for solving the testing resource allocation problem, thereby providing a way to reduce the consumption of testing resource and to improve the reliability and development efficiency of series-parallel software systems.
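A sketch of one generation of the kind of differential evolution step described above, using one-dimensional integer vectors and a simple repair against the total resource bound; the DE/rand/1/bin operators, parameter values, and repair rule are generic assumptions, and multi-objective selection against reliability and cost is omitted.

```python
import random

def de_step(pop, F=0.5, CR=0.9, budget=100):
    """One DE/rand/1/bin generation over integer allocation vectors.
    Each vector assigns testing resource units to modules; requires a
    population of at least 4 vectors."""
    new_pop = []
    for i, x in enumerate(pop):
        a, b, c = random.sample([p for j, p in enumerate(pop) if j != i], 3)
        # Mutation plus binomial crossover, rounded back to integers.
        j_rand = random.randrange(len(x))
        trial = [max(0, round(a[k] + F * (b[k] - c[k])))
                 if (random.random() < CR or k == j_rand) else x[k]
                 for k in range(len(x))]
        # Repair: scale the allocation down if it exceeds the budget.
        total = sum(trial)
        if total > budget:
            trial = [t * budget // total for t in trial]
        new_pop.append(trial)
    return new_pop
```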
ZHUANG Yuan , ZHANG Peng-Cheng , LI Wen-Rui , FENG Jun , ZHU Yue-Long
2016, 27(8):1978-1992. DOI: 10.13328/j.cnki.jos.004850
Abstract:The execution of a service-oriented system relies on third-party services. However, such reliance introduces uncertainty given the complex and changeable network environment; hence, runtime monitoring techniques are required for service-oriented systems. Effective monitoring of Web QoS, an important measure of third-party service quality, is necessary to ensure quality control of Web services. Several monitoring approaches have been proposed; however, none of them considers the influence of the environment, including server location, user usage, and runtime load. Ignoring these influences, which are present throughout the real-time monitoring process, may cause monitoring approaches to produce wrong results. To solve this problem, this paper proposes a new environment-sensitive Web QoS monitoring approach, called wBSRM (weighted Bayes runtime monitoring), based on weighted naive Bayes and TF-IDF (term frequency-inverse document frequency). The proposed approach is inspired by machine learning classification algorithms: it measures the influence of environment factors with the TF-IDF algorithm and constructs a weighted naive Bayes classifier by learning from part of the samples to classify monitoring results. Results that meet the QoS standard are classified as c0, and those that do not are classified as c1. The classifier outputs the ratio between the posterior probabilities of c0 and c1, whose analysis leads to one of three monitoring results: c0, c1, or inconclusive. Experiments are conducted on both a public network dataset and randomly generated datasets, and the results demonstrate that, by accurately calculating environment factor weights with the TF-IDF algorithm and the weighted naive Bayes classifier, this approach outperforms previous approaches.
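A minimal sketch of the classification step, assuming discrete environment features and per-feature TF-IDF-derived weights (the exact weighting and smoothing in wBSRM may differ): the sign and magnitude of the weighted log posterior ratio decide among c0, c1, and inconclusive.

```python
import math

def classify(x, priors, likelihoods, weights, margin=1.0):
    """Weighted naive Bayes decision for one monitoring sample x
    (a dict feature -> value). Returns 'c0' (meets the QoS standard),
    'c1' (violates it), or 'inconclusive' when the evidence is weak."""
    score = math.log(priors["c0"] / priors["c1"])
    for f, v in x.items():
        p0 = likelihoods["c0"][f].get(v, 1e-6)   # smoothed P(v | c0)
        p1 = likelihoods["c1"][f].get(v, 1e-6)   # smoothed P(v | c1)
        score += weights[f] * math.log(p0 / p1)  # per-feature TF-IDF weight
    if score > margin:
        return "c0"
    if score < -margin:
        return "c1"
    return "inconclusive"
```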
ZONG Fang-Fang , HUANG Hong-Yun , DING Zuo-Hua
2016, 27(8):1993-2007. DOI: 10.13328/j.cnki.jos.004858
Abstract:Fault localization is a laborious and time-consuming activity in the debugging process, especially for large and highly complex software. Existing techniques to locate faults can be classified into two categories: component-based and statement-based. The former is too coarse-grained to locate faults accurately, while the latter is too fine-grained to keep the computational complexity manageable. This paper proposes a new technique, called the double-times-locating (DTL) strategy, to locate software faults. In the first locating phase, it extracts the function call graph from the code, builds a program spectrum from function traces, and then uses model-based diagnosis (MBD) to rank candidate faulty functions by probability. In the second locating phase, it uses DStar to locate faults within those functions. Experimental results show that the proposed technique is more effective than existing statistics-based methods.
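DStar itself is a published suspiciousness formula; a common form, with the exponent set to 2, is sketched below for the second locating phase. Here ncf and ncs count the failed and passed test cases covering a statement, and nf is the total number of failed test cases; the coverage data in the usage example are made up.

```python
def dstar(ncf, ncs, nf, star=2):
    """DStar suspiciousness of one statement:
    ncf: failed test cases that execute the statement
    ncs: passed test cases that execute the statement
    nf:  total failed test cases
    Higher values mean more likely faulty."""
    denominator = ncs + (nf - ncf)
    if denominator == 0:
        return float("inf")   # covered by every failed test and no passed test
    return ncf ** star / denominator

# Rank the statements of a suspect function by suspiciousness.
coverage = {"s1": (3, 1), "s2": (1, 4), "s3": (3, 0)}   # stmt -> (ncf, ncs)
ranked = sorted(coverage, key=lambda s: dstar(*coverage[s], nf=3), reverse=True)
print(ranked)  # ['s3', 's1', 's2']
```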
GONG Dun-Wei , CHEN Yong-Wei , TIAN Tian
2016, 27(8):2008-2024. DOI: 10.13328/j.cnki.jos.004844
Abstract:A parallel program can yield nondeterministic executions, which increases the complexity and difficulty of program testing. This study investigates the mutation testing of message passing parallel programs and presents an approach to transforming their weak mutation testing, with the purpose of improving the efficiency of mutation testing. First, mutation condition statements are built based on the type of each statement and the changes resulting from mutating it. Then, a new program is formed by inserting all these mutation condition statements into the original program. As a result, the problem of weak mutation testing of the original program is transformed into that of covering the branches of the new program, which makes it possible to solve the mutation testing problem with existing branch coverage methods. The proposed approach is applied to test eight benchmark message passing parallel programs, and the empirical results demonstrate that the new approach is not only feasible but also necessary.
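To make the transformation concrete, consider a hypothetical example (not drawn from the benchmark programs): for a statement x = a + b and an arithmetic-operator mutant x = a - b, the inserted mutation condition statement introduces a branch that is covered exactly when the mutant would be weakly killed, i.e. when the original and mutated expressions differ at that point.

```python
def original(a, b):
    x = a + b
    # Mutation condition statement inserted for the mutant `x = a - b`:
    # the true branch is taken iff the mutant would change the program
    # state here, so weakly killing this mutant reduces to covering
    # this branch (any test input with b != 0 covers it).
    if a + b != a - b:
        pass
    return x
```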
WU Yao , ZENG Ju-Ru , PENG Hui , CHEN Hong , LI Cui-Ping
2016, 27(8):2025-2047. DOI: 10.13328/j.cnki.jos.005049
Abstract:In recent years, as a new method of sensing the environment, collecting data, and providing information, crowd sensing has gradually become a research highlight. The incentive mechanism is one of the most important research problems in crowd sensing: it refers to mechanism designs that encourage participants to join sensing tasks and provide high-quality, reliable sensing data. This paper reviews research on incentive design for crowd sensing in recent years. First of all, crowd sensing and crowd sensing incentive design are introduced. Then, starting with the key techniques, the main incentive methods and the core problems in incentive design are described. Finally, existing work, research challenges, and future directions are discussed. This work aims to provide a valuable reference for researchers in the field.
WANG He-Peng , WANG Hong-Zhi , LI Jia-Ning , KONG Xin-Xin , LI Jian-Zhong , GAO Hong
2016, 27(8):2048-2067. DOI: 10.13328/j.cnki.jos.005060
Abstract:In recent years, with increasing data volumes, data-intensive computing tasks have become increasingly critical, and how to implement data-intensive computing on large datasets efficiently and effectively has become a major research direction. Currently, researchers attempt to use new processors to accelerate data-intensive computing, and different acceleration approaches can be adopted according to the characteristics of each new processor. In this paper, the new processors, as well as the algorithms, for data-intensive computing are surveyed. First, the new features of the processors are reviewed. Then, the capability of each new processor and its performance on data-intensive computing are analyzed. Finally, future research directions are discussed.
LI Wei-Bang , LI Zhan-Huai , CHEN Qun , YANG Jing-Ying , JIANG Tao
2016, 27(8):2068-2085. DOI: 10.13328/j.cnki.jos.005052
Abstract:Data inconsistency may exist in relational databases, and one major data quality problem in relational databases is functional dependency violation. To find inconsistent data in a relational database, one needs to detect functional dependency violations. It is easy to detect such violations in centralized databases via SQL-based techniques, although the detection efficiency is not high. It is far more challenging in distributed databases, where not only data shipment must be considered but the distribution of detection tasks is also a conundrum; these problems are even more prominent with big data. This paper proposes a novel approach for detecting the violations of a single functional dependency over distributed big data, together with a cost model for inconsistency detection. To reduce data shipment and response time, the distributed data are pretreated based on equivalence classes. Since the inconsistency detection problem is NP-hard, making it impossible to find an optimal solution in polynomial time, this work provides a 3/2-approximate optimal solution. A detection approach for multiple functional dependencies over distributed big data is then developed based on minimal set cover theory. This approach detects the violations of multiple functional dependencies in parallel after a single data traversal, and it also incorporates load balancing in the detection process. Experiments on real-world and generated datasets demonstrate that, compared with previous detection methods and a naive method based on the Hadoop framework, the presented approach is more efficient and scales well on big data.
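As a minimal single-machine sketch of detecting violations of one functional dependency X -> Y (the distributed pretreatment, cost model, and 3/2-approximation are beyond this snippet), tuples are grouped into equivalence classes on the left-hand-side attributes; any class whose tuples disagree on the right-hand side witnesses a violation.

```python
from collections import defaultdict

def fd_violations(tuples, lhs, rhs):
    """Return the equivalence classes (on attributes `lhs`) whose
    tuples take more than one value on attributes `rhs`, i.e. the
    witnesses that the FD lhs -> rhs is violated."""
    classes = defaultdict(set)
    for t in tuples:                       # t is a dict attr -> value
        key = tuple(t[a] for a in lhs)
        classes[key].add(tuple(t[a] for a in rhs))
    return {k: v for k, v in classes.items() if len(v) > 1}

rows = [{"zip": "10001", "city": "NYC"},
        {"zip": "10001", "city": "Albany"},    # violates zip -> city
        {"zip": "60601", "city": "Chicago"}]
print(fd_violations(rows, ["zip"], ["city"]))
```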
ZHANG Bin , DIAO Xing-Chun , SUN Yan-Tao , DING Kun , YAN Hao
2016, 27(8):2086-2098. DOI: 10.13328/j.cnki.jos.004835
Abstract:Physical network topology discovery is a key issue for network management and planning, performance forecasting, network simulation, and security, and how to discover a physical network topology based on address forwarding tables (AFTs) is a hot topic in current studies. This paper defines the minimal constraints on AFTs under which the physical topology of a switched area spanning a single subnet or multiple subnets can be deduced, and proves the completeness of the basic reasoning rule (BRR) proposed in previous work. Furthermore, the paper analyzes the NP-hardness of AFT-based methods, thoroughly discusses all possible situations deduced by BRR for any local network, and further investigates the limits of topology discovery based solely on AFTs. This work provides important theoretical guidance for physical topology discovery based on AFTs, and at the same time lays a solid theoretical foundation for new topology discovery methods.
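For concreteness, the classical direct-connection condition for complete AFTs within a single subnet, a standard result in this line of work (not necessarily the paper's BRR), can be stated as a one-line test:

```python
def directly_connected(aft_a_i, aft_b_j, all_nodes):
    """Classical direct-connection test for complete AFTs in one
    subnet: port i of switch A and port j of switch B are directly
    linked iff the address sets seen on the two ports are disjoint
    and together cover every node.
    aft_a_i, aft_b_j, all_nodes: sets of MAC addresses."""
    return (aft_a_i.isdisjoint(aft_b_j)
            and aft_a_i | aft_b_j == all_nodes)
```

Much of the difficulty the paper addresses comes from relaxing exactly these assumptions, since real AFTs are typically incomplete.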
LIU Xiao-Wu , WANG Hui-Qiang , Lü Hong-Wu , YU Ji-Guo , ZHANG Shu-Wen
2016, 27(8):2099-2114. DOI: 10.13328/j.cnki.jos.004852
Abstract:To explore the evolution trend of network threats and to analyze autonomous awareness and control problems, this paper proposes a fusion-based cognitive awareness-control model for the network security situation. The model is characterized by a cross-layer architecture and a cognitive circle, which improve the interactive and cognitive ability between different network layers. Based on an analysis of the model's components and their functions, this paper uses a fusion algorithm to obtain accurate decisions on the security events reported by heterogeneous multi-sensors. Combined with reasoning about the relation between threat factors and threat levels, a hierarchical quantification method is put forward, encompassing the service, host, and network layers. This approach overcomes the difficulty of dealing with the complex membership relations among network components and improves the ability to express network threats. In addition, by establishing a bridge between discrete computation and continuous control, a closed-loop feedback structure is formed and the self-awareness and self-control problems are solved. Simulation experiments prove that the presented model and algorithms can fuse heterogeneous security data, dynamically perceive the evolution trend of network threats, and possess autonomous regulation and control ability. This study meets the research goal of cognitive awareness-control and provides a new method for monitoring and administering networks.
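The hierarchical quantification suggests a bottom-up aggregation from services to hosts to the network; the weighted roll-up below is only a hypothetical sketch of such a structure, with made-up weight names, not the paper's formulas.

```python
def network_threat(topology, svc_threat, w_svc, w_host):
    """Hypothetical bottom-up quantification: weighted service threat
    scores roll up to a host score, and weighted host scores roll up
    to a single network-level situation value.
    topology: host -> list of its services; w_svc, w_host: weights."""
    host_threat = {h: sum(w_svc[s] * svc_threat[s] for s in services)
                   for h, services in topology.items()}
    return sum(w_host[h] * t for h, t in host_threat.items())
```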
LIN Yi , LIU Yue , WANG Yong-Tian , HE Chang-Yu
2016, 27(8):2115-2134. DOI: 10.13328/j.cnki.jos.004846
Abstract:This paper presents a method of pushing context-aware services based on scene classification, to overcome the improper use of labels in augmented reality browsers that interferes with users' cognitive operations. The proposed method separates the user's scenes on the basis of the user's cognitive processes when searching for and retrieving points of interest. A function module is built for each of the three scenes within a four-layer context-aware service framework, and the overall construction of the augmented reality system is completed by associating the three scene modules. Experimental comparisons with existing popular browsers show that the classification accuracy of the system constructed with the proposed method is improved by 13% on average, and users' average satisfaction with the system's prediction results is increased by about 26%.
JIANG Xin-Lan , WANG Liang , LUO Xiao-Yue , WANG Sheng-Chun , LUO Si-Wei
2016, 27(8):2135-2146. DOI: 10.13328/j.cnki.jos.004864
Abstract:In this paper, a generalized motion blur model is constructed from the viewpoint of optical flow, and based on this model, the forward motion blur kernel is derived. The kernel provides a theoretical foundation for forward-motion deblurring of high-speed railway image sequences. A fast method is also designed on this theory to estimate the forward motion blur kernel. Three specific problems are solved in the process: first, an analytical solution under fast motion estimation is obtained; next, an analytical solution under fast motion estimation in the direction of a planar scene is achieved; lastly, a numerical algorithm for computing the forward motion blur kernel is developed. Experimental results validate the proposed method.
LUO Le , LIU Yi , QIAN De-Pei
2016, 27(8):2147-2167. DOI: 10.13328/j.cnki.jos.005103
Abstract:In the era of big data, systems need to process massive data efficiently to meet the performance requirements of applications. In-memory computing technology can significantly improve the performance of massive data processing by keeping data in memory and avoiding I/O operations. However, the technology faces a series of challenges that remain to be solved. This paper analyzes the characteristics of in-memory computing technology, presents a classification of it, and introduces the principles, related work, and hot topics of each category. In addition, several typical applications of in-memory technology are introduced. Finally, the challenges and opportunities for in-memory computing are elaborated.
LIU Ying , HUANG Lei , Lü Fang , CUI Hui-Min , WANG Lei , FENG Xiao-Bing
2016, 27(8):2168-2184. DOI: 10.13328/j.cnki.jos.005104
Abstract:With the rapid development of heterogeneous systems, it is important to enhance data locality and fully utilize the on-chip cache via the compiler. However, the classic reuse distance criterion is platform-sensitive in heterogeneous systems, so a unified reuse distance calculation framework is needed for the compiler to describe and optimize data locality. This paper proposes the relaxed reuse distance, with a unified calculation method for OpenCL programs, as the criterion for data layout optimization. The relaxed reuse distance is calculated with heterogeneous execution models and statistical approximation. Experiments are conducted on Intel Xeon Phi, AMD Opteron CPU, and Tilera Tile-GX36, and the results show that the optimization achieves a speedup of at least 1.23x on average.
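The relaxed reuse distance and its statistical approximation are specific to this paper; the classic trace-based reuse distance that it generalizes can be computed as follows, where the distance of an access is the number of distinct addresses touched since the previous access to the same address.

```python
def reuse_distances(trace):
    """Classic reuse distance per access: the number of distinct
    addresses referenced since the last access to the same address
    (inf for first accesses). O(n^2) stack simulation for clarity;
    production tools use tree structures for O(n log n)."""
    stack, dists = [], []
    for addr in trace:
        if addr in stack:
            # Depth from the top of the LRU stack = distinct addresses
            # touched since the previous access to addr.
            dists.append(len(stack) - 1 - stack.index(addr))
            stack.remove(addr)
        else:
            dists.append(float("inf"))
        stack.append(addr)        # addr becomes the most recent access
    return dists

print(reuse_distances(["a", "b", "c", "a", "b"]))  # [inf, inf, inf, 2, 2]
```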