YANG Ning , TANG Chang-Jie , WANG Yue , CHEN Yu , ZHENG Jiao-Ling
Abstract:To solve the problem of mining evolutionary events from multi-streams, this paper proposes a spectral clustering algorithm, SCAM (spectral clustering algorithm of multi-streams), to generate the clustering models of Multi-Streams. The similarity matrix in the clustering models of Multi-Streams are based on Coupling Degree, which measures the dynamic similarity between two streams. In addition, this paper also proposes an algorithm, EEMA (evolutionary events mining algorithm), to discover the evolutionary event points based on the drift of clustering models. EEMA takes the index of Clustering Model Quality as the optimization objective in determing the number of clusters automatically. The Clustering Model Quality combines the matrix perturbation theory and the Clustering Cohesion, which has a sound upper bound and is used to measure the compactness of a clustering model. Finally, this paper presents O-EEMA (optimized-EEMA) as the optimization of EEMA with the temporal complexity of O(cn2/2), and the results of extensive experiments on the synthetic and real data set show that EEMA and O-EEMA are effective and practicable.
LI Zhi-Qing , SHI Zhi-Ping , LI Zhi-Xin , SHI Zhong-Zhi
Abstract:Current existing sparse coding models employ the mean square of the error between the actual image and the reconstructed image to measure how well the code describes the image. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene or a video, an alternative measure for information preservation assessment, based on the structural similarity, is introduced. After minimizing the cost function, the improved model attains a complete family of localized, oriented, and bandpass receptive fields, similar to those found in the primary visual cortex. The experimental results show that the improved sparse coding model is more consistent in human visual system.
PAN Xiao-Ying , LIU Fang , JIAO Li-Cheng
Abstract:By using the density sensitive distance as the similarity measurement, an algorithm of Density Sensitive based Multi-Agent Evolutionary Clustering (DSMAEC), based on multi-agent evolution, is proposed in this paper. DSMAEC designs a new connection based encoding, and the clustering results can be obtained by the process of decoding directly. It does not require the number of clusters to be known beforehand and overcomes the dependence of the domain knowledge. Aim at solving the clustering problem, three effective evolutionary operators are designed for competition, cooperation, and self-learning of an agent. Some experiments about artificial data, UCI data, and synthetic texture images are tested. These results show that DSMAEC can confirm the number of clusters automatically, tackle the data with different structures, and satisfy the diverse clustering request.
Abstract:The lack of an efficient Web organization and management mechanism has become the bottleneck of Web applications. To address the problem, this paper proposes an extended concept lattice model based on Web input and output messages for service relationship mining. Concept coverage functionality is created and introduced into normal concept lattice. Off-Line algorithms for Web relations including equivalence, replacement, and flow relations mining, as well as their on-line update algorithms, are given. Experiment on real Web registry center shows that the extended concept lattice model is a highly efficient organization mechanism for Web relationships. Real Web registry center automatically mines, is expected to provide intelligent support for service choices, optimizations and composition.
Abstract:Rich unlabeled data contains valuable information, which is useful for classification. Using information efficiently to improve the accuracy of classification is the major purpose of semi-supervised learning. This paper proposes a kind of semi-supervised classification approach called Semi-Supervised Discriminant Analysis that is based on Manifold Distance, SSDA. The intra-class neighbors, the inter-class neighbors, and the total neighbors of a selected point can be determined by the proposed manifold distance. The similarity between these neighbors and the point can be defined based on the manifold distance. The object function is defined using the similarity. As the experiments operated on the database ORL and YALE show, compared with the existing algorithms, the proposed algorithm can improve the accuracy of classified algorithms based on distance. When dealing with nonlinear dimensionality reduction problem, the Kernel SSDA (namely, kernel semi-supervised discriminant analysis based on manifold distance) is proposed. Also, the experimental results show the efficiency of this algorithm.
LIN Zi-Yu , YANG Dong-Qing , WANG Teng-Jiao , ZHANG Dong-Zhan
Abstract:First, the research background of keyword search over relational databases is presented and is followed by a detailed description of two solutions to this problem, i.e., data graph based and schema graph based methods, and a discussion of the principles, advantages and disadvantages of these methods is also mentioned. Finally, some future trends in this area are discussed.
LIU Yong , LI Jian-Zhong , GAO Hong
Abstract:Many algorithms on subgraph mining have been proposed. However, the number of frequent subgraphs generated by these algorithms may be too large to be effectively explored by users, especially when the support threshold is low. In this paper, a new problem of mining frequent jump patterns from graph databases is proposed. Mining frequent jump patterns can dramatically reduce the number of output graph patterns and still capture interesting graph patterns. Futhermore, jump patterns are robust against noise and dynamic changes in data. However, this problem is challenging due to the underlying complexity associated with frequent subgraph mining as well as the absence of Apriori property for jump patterns. By exploring the properties of jump patterns, two novel effective pruning techniques are proposed: Internal-Extension-Based pruning and external-extension-based pruning. Based on the proposed pruning techniques, an efficient algorithm GraphJP is presented for this new problem. It has been theoretically proven that the novel pruning techniques and the proposed algorithm are correct. Extensive experimental results demonstrate that the novel pruning techniques are effective in pruning the unpromising parts of search space, and GraphJP is efficient and scalable in mining frequent jump patterns.
ZHANG Yan-Song , XIAO Yan-Qin , WANG Shan , CHEN Hong
Abstract:A what-if analysis can provide a more meaningful information than classical OLAP (on-line analysis processing). Multi-Scenario hypothesis upon historical data needs efficient what-if data view support. Two novel algorithms of deltaMap and pre-merge, which can greatly improve the performance of delta table algorithm with set operations, are proposed. To analyze the performance of query re-writing algorithm and delta cube algorithm under different what-if update conditions, a global performance analysis and comparison are presented in the experiment section. This paper proposes a cost model for a what-if analysis processing engine, based on different algorithms with parameters such as application scenario, what-if update rate, complexity of what-if updates, memory storage policy, cardinality of query result set etc, that can be used as a practical framework in a what-if analysis system.
CHEN Li-Fei , GUO Gong-De , JIANG Qing-Shan
Abstract:Soft subspace clustering is a key for high-dimensional data analysis. The existing algorithms usually require users to estimate some key global parameters in advance, and ignore the optimization of subspaces. A novel objective function, to be optimized by the soft subspace clustering algorithms, is proposed in this paper by taking into account both minimization of the compact subspace clusters and maximization of the subspaces in which the clusters exist. Based on this, a new locally feature weighting scheme is derived, and an adaptive algorithm for k-means type soft subspace clustering is presented. In the new algorithm, the optimal values of parameter are automatically computed, according with the dataset and its partitions. Experimental results carried out on some real-world and synthesis datasets demonstrate that the proposed method significantly improves the accuracy as well as the stability of the clustering results.
TANG Ming-Dong , ZHANG Guo-Qing , YANG Jing , ZHANG Guo-Qiang
Abstract:The Internet routing system is facing a serious scaling challenge due to the rapid growth of the global routing table. For the purpose of reducing routing table size, many studies have developed a lot of new routing solutions. After the paper introduces the background of the Internet routing system, a classification of new routing solutions is presented. Then, a typical scalable routing algorithms and architectures become the focus, and their basic ideas and characteristics are deeply analyzed and compared. Finally, some key issues and ideas for future research are discussed.
TIAN Ke , ZHANG Bao-Xian , MA Jian , YAO Zheng
Abstract:Opportunistic routing can largely improve the performance of wireless multihop networks by taking advantage of the broadcasting nature of wireless medium. In this paper, the basic idea behind opportunistic routing is introduced, and then the paper looks to classify existing work in this area based on different criteria. A comprehensive survey on typical opportunistic routing protocols is given. This paper introduces, in details, how each of these protocols work, and then the paper discusses about their merits and drawbacks. Finally, this paper concludes with some issues that still need to be resolved in this area in hopes of stimulating future research on this topic.
ZHANG Long , ZHOU Xian-Wei , WANG Jian-Ping , DENG Yu , WU Qi-Wu
Abstract:As newly proposed end-to-end, store-and-forward networks, delay and disruption tolerant networks (DTN) are characterized by intermittent connectivity, frequent partitions, extremely high latency, asymmetric data rates, high error rates, heterogeneous interconnection, etc. Hence, traditional routing protocols for Internet, mobile ad hoc networks and wireless sensor networks, are difficult to be applied efficiently in DTN scenarios, and routing in DTN faces many new challenges. After briefly describing the fundamental characteristics of DTN and the major challenges in designing routing protocols, the metrics used to evaluate the DTN routing protocols are proposed. Next, research on routing protocols for DTN is comprehensively summarized and deeply analyzed in this paper, including unicast routing, multicast routing, and anycast routing. Finally, the main routing protocols are compared, and the open issues for the future research are also pointed out to motivate new research and development in the field of DTN.
ZHU Ying-Wu , YANG Jia-Hai , ZHANG Jin-Xiang
Abstract:Due to the fact that the nature of network traffic is not fully and understood, large-scale, high-speed network traffic anomaly detection in an idea is a difficult problem to solve. According to the analysis of the network traffic structure and traffic information structure, it is found that in a certain range, the IP address and port distributions exhibit heavy tail and self-similar characteristics. The normal network traffic has a relatively stable structure. This structure corresponds to a more stable value of information entropy. Abnormal traffic and sample traffic of information entropy fluctuates by using the normal traffic as the center, and forms the structure of spatial information of IP, port, and IP number of active dimensions. Based on this discovery, the paper proposes a novel traffic classification algorithm, based on support vector machine (SVM) method, that transforms the traffic anomaly detection issue to a SVM-based classification decision issue. The experimental results not only evaluate its accuracy and efficiency, but also show its ability to detect on sampled traffic, which is very important for the traffic data reduction and efficient anomaly detection of high speed networks.
LIU Xin , ZHU Pei-Dong , PENG Yu-Xing
Abstract:In today’s Internet, it is very difficult for network operators to discover prefix hijacks in time. Considering the autonomous characteristic of the Internet inter-domain routing system, this paper provides the idea of collaborative monitoring among multiple Autonomous Systems (ASes). This paper also examines the design of a new method, named Co-Monitor that detects prefix hijacks in real-time. In Co-Monitor, every participant AS exchanges self-defined prefix-to-origin mapping information with the others, and they monitor local BGP (border gateway protocol) updates respectively. Once some participant discovers that the origin of information of a BGP route is inconsistent with the learned prefix-to-origin mapping information, it notifies relative participants immediately; thereby, Co-Monitor can help participants detect prefix hijacks quickly and effectively. This paper presents the detailed design of Co-Monitor, evaluates its detecting capabilities, and also discusses several related problems. The experimental results show that Co-Monitor, with only selected 60 participants, is accurate with 0% false negative ratio and 0% false positive ratio.
WANG Jie , WANG Jian-Xin , CHEN Jian-Er
Abstract:A fast and accurate generation of worm signatures is essential in efficiently defending worm propagation. Most of the recent signature generation approaches do not generate accurate signatures for polymorphic worms in environments with noise. In this paper, a CCSF (color coding signature finding) algorithm is presented to solve the problem of a polymorphic worm signature generation with noise by using color coding. In the CCSF algorithm, n sequences are divided into m group, and signatures for every group sequence are generated by color coding. After filtering all signatures, an accurate worm signature is generated. CCSF’s range of polymorphic worms is evaluated. When comparing CCSF with other existing approaches, CCSF shows a distinct advantages in generating accurate signatures for polymorphic worms in the presence of noise. Signatures generated do not contain fragments and can be used conveniently to detect polymorphic worms in IDS (intrusion detection system).
ZHANG Cheng , LIAO Jian-Xin , ZHU Xiao-Min
Abstract:Fault localization has theoretically been proven to be NP-hard. This paper takes a weighted bipartite graph, as fault propagation model, and proposes a heuristic fault localization algorithm based on Bayesian suspected degree (BSD) to reduce the computational complexity. It introduces a metric of BSD, which needs only to be calculated once, and uses incremental coverage, which makes the algorithm a low computation complexity O(|F|×|S|). Simulation results show that the algorithm has a high fault detection rate as well as low false positive rate and performs well even in the presence of unobserved and suspicious alarms. The algorithm, which has a polynomial computational complexity, can be applied to a large-scale communication network.
HUANG Yong-Sheng , MENG Xiang-Wu , ZHANG Yu-Jie
Abstract:Enhancing the efficiency of the file location is important in the study of unstructured P2P network. Flooding and random walks are simple and easily implemented. However, the former will increase the load of P2P network to much and put bounds to the search depth, and the latter’s lower network load and deeper search comes at the cost of lower search breadth and more response time. This paper puts forward a strategy of content location and routing of a search request in an unstructured P2P network. By applying the rationale of social network and simulating the ability of the peers of social network, the strategy proposed in this paper, can make better use of the ability of the peers and locate files faster with lower search depth and breadth. Supported by the equivalent hardware environment, the experimental results demonstrate that the time spent on content location can be reduced by more than 50%.
JING Ji-Wu , WANG Jing , LIN Jing-Qiang , XIE Yong-Quan , GU Qing
Abstract:Byzantine quorum systems (BQSs) provide attack-resilient storage services over asynchronous channels and tolerate f servers’ Byzantine failures. COCA (Cornell online certification authority) and CODEX (COrnell Data EXchange) have designed a server protocol to implement BQSs threshold signature schemes (TSS-BQSs), which can execute proactive recovery, conveniently, and can simplify the key management and communication of clients. Under the same system model and channel assumptions, a new server protocol that satisfies the security requirements of TSS-BQS is proposed. The proposed protocol involves less rounds of communication and performs better than that of COCA/CODEX in the case of concurrent read-write operations.
ZHANG Jin , WU Jiang-Xing , NIU Xiao-Na
Abstract:Fair packet sampling can obtain a higher packet sampling ratio of short flows by sacrificing the packet sampling of long ones; thus, ensuring better fairness among all flows than uniform random sampling does. However, the previously proposed fair sampling algorithm of Sketch Guided Sampling (SGS) has the drawbacks of poor space efficiency and large estimation error for short flows. In this paper, a space-efficient fair packet sampling (SEFS) algorithm is proposed. The key innovation of SEFS is a multi-resolution d-left hashing schema for flow traffic estimation. The performance of SEFS is compared to that of SGS in contexts of both flow traffic measurements and a long flow identification process that uses real-world traffic traces collected from OC-48 and OC-192 backbone network. The experimental results show that the proposed SEFS is more accurate than SGS in both application contexts, while a reduction of 65 percent in space complexity can be achieved. The improvement of estimation accuracy of SEFS is remarkable, especially for short flows, which comprise as past of a large percentage of whole network traffic flows.
Abstract:Network lifetime is a bottleneck that restricts the development of wireless sensor networks. One approach to save energy effectively and prolong network lifetime is to schedule some nodes to work and put other nodes into a low-powered sleep mode, while monitoring performance of network. The object of scheduling nodes is to obtain a minimum node set that can cover a monitored region. This is a NP-hard problem. Performances of present approximation algorithms have not been good. An approximation algorithm of a minimum cover set problem based on methodology is proposed. During the process of constructing a cover set, effective node that extends to maximal areas are selected to join the cover set. Theoretical analyses show that the algorithm can construct a cover set that perform well and has a time complexity that is O(n), where n is initial node number. Experimental results show that performance of this new algorithm outperform that of present algorithms. The size of a cover set is decreased by 14.2%. Also, execution time is less than that of present algorithms. When initial nodes are deployed densely, the average degree of coverage obtained by the algorithm is below 1.75 and has an approximation ratio below 1.45.
ZHANG Ke-Wang , PAN Yu , ZHANG Qiong , ZHANG De-Yun
Abstract:Collision avoidance and spatial reuse are two important approaches to improving the throughput of ad hoc networks, and many MAC protocols are proposed to achieve these goals. In most MAC protocols, collisions are reduced by solving the hidden terminal problems, but the spatial reuse remains un-optimized in these protocols, which affects network throughput dramatically. Moreover, reception of exposed terminals is not allowed in current MAC protocols even if they can receive packets successfully, which leads to lower spatial reuse. In this paper, a high throughput MAC protocol named e-MAC is proposed. To improve the network throughput, two approaches are used in e-MAC. First, a power controlled busy tone is used to eliminate hidden terminals. The receiver adjusts the transmission power of busy tone, according to a received signal strength from the transmitter, so that the spatial reused is optimized while all hidden terminals are covered by a busy tone. Second, exposed terminals are allowed to receive when the ratio between signal strength of RTS (ready-to-send) and interference satisfies the SINR (signal to interference and noise ratio) requirements, which further improves the spatial reuse. Simulation results show that the average throughput of e-MAC outperforms that of DUCHA (dual channel access) by 87%.
Abstract:Accessing data in a intermittently connected mobile ad hoc networks is a challenging problem that is caused by frequent network partitions due to node mobility and to impairments of wireless communications. This partitioning pattern is studied by examining the statistics of network partitions for a number of mobility models. This paper then establishes the relationship between the network partitioning pattern and the effectiveness of the data replication scheme, and then gives an upper bound of data availability, achieved by an ideal replication scheme under standard operational conditions. The data availability of a totally random scheme can achieve is also given. Based on these results, a novel replication scheme, RICMAN (replication in intermittently connected mobile ad hoc networks), which takes into account the fact that the network is often partitioned into smaller and uses only intermittent connectivity, thanks to mobile nodes traveling across partition, is proposed. In RICMAN, data items are replicated with the nodes using rather stable neighboring topology and with enough resources. A semi-probabilistic data disseminating protocol is employed to distribute the replicas and propagate the updates and can identify the potential mobile nodes traveling across partitions to maximize data delivery. To maintain replica consistency, a weak, eventual consistency model is utilized to ensure all updates eventually propagate to all replicas in a finite delay. Simulation results demonstrate that RICMAN scheme can achieve high data availability with low overhead. With optimized parameters, the data availability is just 10%~15% lower than the upper bound under certain network conditions.
Abstract:A novel ID-based channel reservation (IDBCR) multiple access protocol is presented for efficient channel sharing in ad hoc networks. Its flexibly employs request-to-send (RTS) and clear-to-send (CTS) dialogue on a common channel and selects a conflict-free traffic channel to accomplish the transmission of a data packet, based on an ID-based channel selection scheme. The acknowledgment (ACK) packet for the data packet transmission is replied to the sender over another common channel, which effectively eliminates the influence of an exposed terminal problem. Finally, the comparison of the proposed protocol with the CAM-MAC (cooperative asynchronous multi-channel MAC) protocol is provided, and simulation results show that the proposed protocol outperforms the CAM-MAC protocol on a total channel utilization, average channel utilization, and average packet delay.