ZHOU Jun-Ping , LI Rui-Zhi , ZENG Zhi-Yong , YIN Ming-Hao
2016, 27(9):2185-2198. DOI: 10.13328/j.cnki.jos.004856 CSTR:
Abstract:#SMT problem is an extension of SMT problem. It needs to compute the number of satisfiable solutions for a given first-order logic formula. Now the problem has been widely applied in the compiler optimization, hardware design, software verification, and automated reasoning. With widespread application of #SMT problem, the design of #SMT solver for large-scale instances is needed. This work presents a design of an approximate solver (VolComputeWithLocalSearch) for solving large-scale #SMT instances. It adds the differential evolution algorithm into the existing exact solution algorithm for #SMT, and gives the approximate solution by calling the volume calculation tool qhull. The algorithm reduces the number of volume calculations by bunch rule and enumerates all regions with solutions by differential evolution algorithm. This paper also proves in theory that the solution of new algorithm is the lower bound of the exact solution, thus it can be used in software testing and other fields which only need to know the lower bound. The experimental results show that VolComputeWithLocalSearch solver is stable, fast, and has a good performance in high dimension problems.
YAN Li-Ping , HU Wen-Bin , WANG Huan , QIU Zhen-Yu , DU Bo
2016, 27(9):2199-2217. DOI: 10.13328/j.cnki.jos.005063 CSTR:
Abstract:In order to alleviate traffic congestion for vehicles in urban traffic networks, many researchers have studied how to utilize the traffic resources such as roads effectively to supply effective route selection strategies for vehicles. Most of the current researches mainly focus on optimizing the signal cycle of traffic lights, supplying the optimized route selection for individual vehicles, and dispersing vehicles on the alternative routes based on their historical driving data or through the traffic game between the information center and the vehicles. However, the above methods have not considered the personalized traffic demands of each vehicle, the route selection conflicts between vehicles, or even the dynamic and uncertain traffic conditions in urban road networks. To solve these problems, this paper proposes a dynamic and real-time route selection model in urban traffic networks (DR2SM), which incorporates the preference for the alternative routes and the real-time traffic conditions. Through mutual information exchange, each vehicle uses a self-adaptive learning algorithm (SALA) to play the congestion game with each other to reach Nash equilibrium.
HE Kun , YANG Chen-Kai , HUANG Meng-Long , HUANG Wen-Qi
2016, 27(9):2218-2229. DOI: 10.13328/j.cnki.jos.004848 CSTR:
Abstract:This paper proposes a Quasi-physical algorithm based on action space (QPAS) for an NP-hard global optimization problem-the circle packing problem with equilibrium constraints (CPPEC). The algorithm has important applications for the layout design of the satellite modules. A key issue in designing a good basin hopping strategy for CPPEC is how to find the most vacant areas such that the searching procedure can jump from a local minimum basin to a promising area. By borrowing the concept of "action space" proposed for the rectangular packing problem, the new algorithm approximates each circle as a rectangle and the irregular vacant areas are viewed approximately as regular rectangular areas. Consequently the most vacant areas can be found efficiently and accurately. In addition, three quasi-human strategies, namely early termination, coarse-to-fine and adaptive step length, are combined with the quasi-physical approach to speed up the potential energy descending process. Experiments are performed on 13 benchmark instances, and computational results demonstrate the high efficiency of the proposed approach. QPAS achieves the first or the second best results on most instances compared with other algorithms, and in some configurations, it has smaller container radius than the current best results. Meanwhile, QPAS obtains very small equilibrium deviations.
QIU Tian-Yu , SHEN Fu-Rao , ZHAO Jin-Xi
2016, 27(9):2230-2247. DOI: 10.13328/j.cnki.jos.005068 CSTR:
Abstract:Self-organizing incremental neural network (SOINN) is a two layered, competitive learning based neural network which is able to represent the topology structure of input data and cluster online non-stationary data without prior knowledge, and also robust to noise. The incremental nature of SOINN enables it to learn novel patterns from data stream without affecting previously learned patterns. In this respect, it is appropriate to expect that SOINN could serve as a general approach to unsupervised learning problems. With some modifications, SOINN could handle other kinds of learning tasks such as supervised learning, associative memory, pattern based reasoning and manifold learning as well. SOINN has been used in many kinds of applications including robotics, computer vision, expert systems, and anomaly detection. This paper presents a survey of its basic ideas, improvements and applications.
ZHAO Xue-Hua , YANG Bo , CHEN He-Chang
2016, 27(9):2248-2264. DOI: 10.13328/j.cnki.jos.004855 CSTR:
Abstract:Stochastic blockmodel (SBM) has become a research focus in the domains of machine learning, network oriented data mining and social network analysis since it can effectively model networks without prior knowledge about their structures. It is a major challenge to develop a fast learning algorithm for stochastic blockmodel that has the capability of effective model selection for large-scale network. This paper presents a refined stochastic blockmodel, named RSBM, and its fast parallel learning method named RFLA. The learning method combines MML criteria with CEMM algorithm to achieve parallel execution in evaluating the model and estimating parameters. This strategy can significantly reduce time complexity of learning process. The accuracy and speed of the learning method are validated against both artificial networks and real networks, and the method is also compared with current representative SBM learning algorithms. The experimental results show that the proposed algorithm is able to greatly improve the efficiency without degenerating the precision of learning process, which indicates it achieves the best tradeoff between accuracy and speed. Furthermore, the proposed model and algorithm demonstrate the best generalization ability in terms of link prediction.
LI Ming-Peng , GAO Hong , ZOU Zhao-Nian
2016, 27(9):2265-2277. DOI: 10.13328/j.cnki.jos.005044 CSTR:
Abstract:This paper focuses on maximum Steiner connected k-core query processing based on graph compression, and proposes a maximum Steiner connected k-core query preserving graph compression algorithm, SC. The correctness of querying based on SC algorithm is proved. Since maximum Steiner connected k-core query only requires a connected component which satisfies certain properties, graph compression algorithm TC is proposed to further compact the compressed graph into a tree. It is proved that querying based on the compacted tree is correct. A novel linear query processing algorithm which is able to query on the compacted tree without decompression is also introduced. Experiments on both real and synthetic datasets demonstrate that the compression algorithm could compress the original graph by 88% in average, and for denser graphs, the compression algorithm achieves better compression ratio, reducing the original graph by nearly 90%. Comparing with the query processing on original graphs, the query performance on compressed graphs is better, and in average, it could be 1 to 2 orders of magnitude times better.
LI Chen , SHEN De-Rong , ZHU Ming-Dong , KOU Yue , NIE Tie-Zheng , YU Ge
2016, 27(9):2278-2289. DOI: 10.13328/j.cnki.jos.005046 CSTR:
Abstract:Large amounts of content with location and time tags are generated every day on webs such as microblog, news, and group-buying. Thus, it is important to find top-k results that satisfy users' temporal and spatial requirements from the contents. In this paper, a novel kNN query (called ST-kNN query) processing approach is proposed for content with location and time tags. First, location variables and time variables of data objects are transformed via temporal & spatial similarity in order to map data objects to a new three-dimensional space. Next, the spatial similarity between two objects in the three-dimensional space is used to approximate the actual temporal & spatial similarity. Then, a new index called ST-Rtree is designed in this three-dimensional space. The index combines location variables & time variables, and ensures every object is traversed no more than once. At last, an exact kNN query algorithm is proposed. The region is determined by computing only once to find top-k results, which guarantees high-efficiency in the query processing. Experiments on large datasets demonstrate that the presented query processing approach is very efficient.
TANG Na , YE Xiao-Ping , TANG Yong , PENG Peng , DU Meng-Yuan
2016, 27(9):2290-2302. DOI: 10.13328/j.cnki.jos.004946 CSTR:
Abstract:Temporal data management is an extended field of data management, and the research on this field has theoretical significance and practical application value. The index of temporal data is crucial to temporal data management and thus is one of the hot research topics. First, based on the partial-order mathematical relationship, this paper proposes a data structure which can convert a temporal query processing of two-dimension into one-dimension. This structure can process all the temporal relationship operations efficiently. Secondly, this paper presents an indexing scheme called TempPartialIndex which combines temporal partial-order data structure into non-temporal XML index. In TempPartialIndex schema, the processing of query begins with mapping and filtering with the semantic and temporal constraints so that a large number of nodes will be filtered out and only a few nodes will be left. Then a structural join algorithm is executed on the left nodes. Furthermore, the paper discusses the query based on set-at-a-time, the temporal variable query and modification. Simulations show that TempPartialIndex can process the temporal XML query and update efficiently.
SUN Chen-Chen , SHEN De-Rong , KOU Yue , NIE Tie-Zheng , YU Ge
2016, 27(9):2303-2319. DOI: 10.13328/j.cnki.jos.005043 CSTR:
Abstract:Entity resolution (ER) is a key aspect of data quality and is necessary for big data processing. Existing ER research focuses on data object similarity algorithms, blocking and supervised ER technologies, but pays little attention to matching decision problems in unsupervised ER. This paper proposes a clustering algorithm for ER to complement existing work. The algorithm builds a weighted similarity graph with data objects and their pairwise similarities. During clustering, the similarity between a cluster and a vertex is dynamically computed via random walk with restarts on the similarity graph. The basic logic behind clustering is that a cluster absorbs the nearest neighbor vertex iteratively. A data object ordering method is also proposed to optimize clustering order, promoting clustering accuracy. Further, an improved computation method of random walk's stationary probability distribution is proposed to reduce cost of the clustering algorithm. The evaluation on real datasets and synthetic datasets validates effectiveness of the proposed algorithm.
WANG Song , HUANG Hao , YU Guo , LIANG Nan , WANG Li-Wei , SUN Yue-Ming
2016, 27(9):2320-2331. DOI: 10.13328/j.cnki.jos.004872 CSTR:
Abstract:Rare category detection aims at finding at least one data example for each class in an unlabeled data set to prove the existence of these classes, especially the rare classes (a.k.a. rare categories) that have only a few data examples. It has various applications in the fields like financial fraud detection and network intrusion detection. Nevertheless, the existing approaches to this problem suffer either in terms of time complexity or the requirements for prior information about data sets (e.g., the proportion of data examples in each class). In this paper, a prior-free and efficient algorithm, called KRED is proposed for rare category detection. The algorithm explores the changes on local data distribution caused by the presence of the compact clusters of rare classes. To this end, it transforms a data set into a k-nearest neighbor graph, and investigates the variations in both edge lengths and in-degrees between the nodes. Finally, nodes with the maximal variations are selected as the candidate data examples of rare classes. Experimental results show that KRED effectively improves the efficiency of discovering new classes in data sets, and notably reduces the execution time.
MA Qian , GU Yu , LI Fang-Fang , YU Ge
2016, 27(9):2332-2347. DOI: 10.13328/j.cnki.jos.005045 CSTR:
Abstract:In recent years, it is recognized that sensing data is growing explosively with widespread use of sensing network. Due to the inherent hardware limitation, the randomness of distribution environment and unconscious errors during data processing, a deluge of missing values are mingled in original sensing data. Thus, imputing the missing values is essential because most of the existed analysis tools are not competent to the data sets containing missing values. So far, there have been many missing data imputation algorithms, however the accuracy of these algorithms is difficult to be guaranteed in the scenario of lumped missing data. Besides, these existing algorithms don't take the imputation order which influences the imputation accuracy into consideration. To address the above issues, this paper proposes an order-sensitive missing value imputation framework called OMSMVI for multi-source sensory data. OMSMVI takes advantages of multi-dimensions relevancy, such as temporal relevancy, spatial relevancy and attributive relevancy of sensing data adequately. The missing-sources-centered similarity graphs are constructed based on multi-dimensions relevancy. At the same time, in the process of missing data imputation, the imputed missing values are used as observations to impute subsequent missing values. Taking the whole distribution of missing sources into consideration, the framework performs order-sensitive missing value imputation, meaning that the order of imputation is ascertained before applying the specific MVI (missing value imputation) methods. Order-sensitive imputation can remit the decrease of imputed result accuracy caused by the lower similarity between missing source and its neighbors when the missing sources are dense. Finally, a new neighborhood-based missing values imputation algorithm NI, which modifies the KNN imputation algorithm, is introduced into the OMSMVI framework. NI uses the multi-dimension similarity to search the missing sources' neighbors which reflect the similarity from multiple dimensions. Such NI algorithm overcomes the shortcoming that parameter K of KNN is difficult to determine. Furthermore, NI algorithm can improve the imputation accuracy further compared to KNN. Two true sensor data sets are used to compare with the baseline MVI methods to verify the accuracy and effectiveness of OMSMVI.
ZHANG Wei-Wei , GONG Jian , LIU Qian , LIU Shang-Dong , HU Xiao-Yan
2016, 27(9):2348-2364. DOI: 10.13328/j.cnki.jos.004913 CSTR:
Abstract:Detecting malicious services via inspecting the content of DNS packets is a common way to network security monitoring. Such a work often requires quasi real time ability to find suspects among the huge collected domain names, which is costly in processing resources. This work proposes a lightweight algorithm based on the morpheme features (root, affix, Chinese spelling and special noun abbreviation) of domain names to quickly identify the suspects for targeted DPI detection. Compared with algorithms based on n-tuple frequency distribution measurement, the proposed one is proved to have stronger anti-interference ability and better detection accuracy by 35.2% higher while only 58.3% memory overhead increasing. While compared with the methods based on word features, this lightweight algorithm can cut 64.8% of computation complexity and 2.6% memory overhead down with only 2.5% accuracy reduction.
XIE Yong , LIANG Wei , LI Ren-Fa , WU Ke-Shou , HONG Chao-Qun
2016, 27(9):2365-2376. DOI: 10.13328/j.cnki.jos.005064 CSTR:
Abstract:The application of Internet-of-vehicles technologies in automobiles drives the modern automobiles developing towards electronization, networking and integration. The rapid development of Internet-of-vehicle technologies causes the problem of rapidly increasing data volume and constrained bandwidth for in-vehicle CAN (controller area network) networks. To solve these problems, this research addresses the signal packing problem in the design of CAN network. First, the signal set is divided into signal clusters according to signals' period. Next, the clusters are sorted in order of increasing period. Then, combined with two presented bandwidth slack evaluation metrics, a signal clustering-based signal packing (CSP) algorithm is proposed to realize the optimization of bandwidth utilization. Finally, by comparing with the state-of-art algorithms, the optimism of CSP in improving the bandwidth utilization is verified. Comparing with the algorithms proposed in reference 7, 8 and 11, the obtained average and maximal optimization ratio in bandwidth utilization for CSP is between[0.5%, 6.4%] and[2.4%, 22.65%], respectively.
TANG Xiao-Lan , HONG Dong-Hui , CHEN Wen-Long , PU Ju-Hua
2016, 27(9):2377-2388. DOI: 10.13328/j.cnki.jos.005065 CSTR:
Abstract:The vast majority of existing studies on data distribution in vehicular networks utilize the mobile vehicular nodes as content carriers, however the high mobility, the limited resources and the security risks of vehicular nodes prohibit the performance improvements. This paper proposes a distributed storage scheme named DSS using bipartite graph matching for vehicular networks where roadside units are regarded as storage devices. Considering the content replication at roadside units is NP-complete, DSS tackles the problem by transforming it into a maximal matching problem of bipartite graph in which the left vertices stand for the requests from vehicular nodes and the right vertices represent the storage cells of roadside units. Hence the original problem is solved by Hungarian algorithm in polynomial time. In addition, a redundant content deletion algorithm overcomes the possible duplications generated from the problem transformation by checking the redundancy in the order of the response factors of different replicas. Experiments show that the DSS scheme achieves a high access ratio and keeps a short access delay with a small cost of network resources.
WEI Fu-Shan , MA Jian-Feng , LI Guang-Song , MA Chuan-Gui
2016, 27(9):2389-2399. DOI: 10.13328/j.cnki.jos.004861 CSTR:
Abstract:Three-party password authenticated key exchange (3PAKE) protocols allow two clients to establish a common session key via the help of an authentication server. Each client only needs to share a password with the server. The derived session key can be later used to achieve end-to-end secure communications. Most of the existing 3PAKE protocols are proven secure in the random oracle model. However, these protocols may turn out to be insecure in real applications when the random oracle function is instantiated with a concrete hash function. In this paper, an efficient 3PAKE protocol is proposed using smooth projective hash function based on ElGamal public key encryption. The security of the proposed protocol is conducted in the standard model under the DDH assumption. Compared with other related protocols, this protocol is quite efficient in terms of computation and communication costs under the same security assumption, and as a result, it is more suitable for large-scale end-to-end communication environments.
XIAO Da , YANG Lü-Yin , SUN Bin , ZHENG Shi-Hui
2016, 27(9):2400-2413. DOI: 10.13328/j.cnki.jos.004862 CSTR:
Abstract:The methods for supporting dynamic data updates and third-party audit are key factors that affect the practicality of existing provable data possession (PDP) schemes. This article proposes a secure and efficient PDP system called IDPA-MF-PDP for realistic cloud storage environments. The cost of auditing multiple files is dramatically reduced by a multiple-file PDP scheme based on the data update pattern of cloud storage. The requirement for users being online is reduced to the maximum extent by the implicit third-party audit framework and tamper-evident audit logs. The tripartite interaction protocol between the user, the cloud server and the implicit auditor combines MF-PDP with the implicit third-party audit framework. Theoretical analysis and experimental results show that IDPA-MF-PDP has equivalent security property with single-file PDP schemes and the audit log provides a trustworthy history record of audit results; IDPA-MF-PDP reduces the computation and communication overhead of data possession auditing from linear in the number of files to near constant.
WANG Jing-Lian , GONG Bin , LIU Hong , LI Shao-Hui
2016, 27(9):2414-2425. DOI: 10.13328/j.cnki.jos.004849 CSTR:
Abstract:Designed to provide pervasive access to distributed resources in parallel ways, heterogeneous scheduling is extensively applied in large-scale computing system for its high performance. Conventional real-time scheduling algorithms, however, often overlook energy-efficiency while focusing on stringent timing constraints. To engage in green heterogeneous computing, a reusable energy-aware cloud model is first presented via mathematical formulation and quantization of the system parameters such as dynamic voltage and frequency scaling (DVFS), and dynamic power management (DPM). In addition, multidisciplinary context for multi-objective global optimization meta-heuristic is proposed and accomplished based on the supercomputer hybrid architecture. Furthermore, some technological breakthroughs are achieved with respect to boundary conditions for different heterogeneous computing and cloud scheduling, and descriptions of real-time variation of scheduling indexes (stringent timing constraints and energy-efficiency). Extensive simulation experiments highlight higher efficacy and better scalability for the proposed approaches compared with the other three meta-heuristics; the overall improvements achieve 8%, 12% and 14% for high-dimension instances.
WANG En-Dong , NI Fan , CHEN Ji-Cheng , WANG Hong-Wei , TANG Shi-Bin
2016, 27(9):2426-2442. DOI: 10.13328/j.cnki.jos.004859 CSTR:
Abstract:Instruction prefetching technologies proposed for general purpose computer systems cannot meet the requirements of real- time systems. One of the most important issues is that cache content pollution caused by useless prefetching loses real-time tasks' WCET estimates. And a loose on WCET analysis degrades the schedulability of the system and in turn brings down its efficiency. A basic-block based instruction prefetching method is proposed in this paper. The method performs instruction prefetching at the basic block level, avoids useless prefetching, simplifies the instruction hit/miss classifications in the worst-case execution, and reduces the WCET estimates of real-time tasks. Real-time benchmark tests show that, the method can reduce real-time tasks' WCET estimates by 20% and also improve instruction cache access performance by 10% on average.
CHEN Zhi-Feng , LI Qing-Bao , ZHANG Ping , WANG Wei
2016, 27(9):2443-2458. DOI: 10.13328/j.cnki.jos.004875 CSTR:
Abstract:Kernel-level attacks are serious threat to the integrity and security of operating systems. Existing kernel integrity measurement methods are one-sided when selecting the measurement objects, as most of these methods suffer from periodic detection shortcoming that makes themselves vulnerable to TOC-TOU attacks. Besides, hardware-based kernel integrity measurement methods are usually too expensive, while hypervisor-based kernel integrity measurement methods are always likely to degrade system performance due to the introduction of complex VMMs. To address these problems, this study proposes a kernel integrity measurement approach based on memory forensics technique (KIMBMF). First, the static and dynamic measurement objects are extracted with the memory forensics technique, and a time random algorithm is presented to degrade the impact caused by TOC-TOU attacks. At the same time, a novel algorithm is also introduced by combining the Hash operation with cryptographic operation, thereby ensuring the security of the measurement progress. Next, a kernel integrity measurement prototype is implemented according to the above techniques and algorithms, and its effectiveness and overhead are evaluated. Experimental results show that KIMBMF can measure the integrity of operating system effectively, and has a reasonable time overhead.