WU Hui-Jia , ZHANG Jia-Jun , ZONG Cheng-Qing
2016, 27(11):2691-2700. DOI: 10.13328/j.cnki.jos.004873 CSTR:
Abstract:As a sub-task for combinatory categorical grammar (CCG) based parsing, categorical tagging can improve parsing efficiency and accuracy. While traditional maximum entropy model solves this problem by designing meaningful feature templates, neural network can extract features automatically based on distributed representations. This paper proposes a neural categorical tagging model with two improvements. First, word embedding layer is extended with a part-of-speech embedding layer and a category embedding layer, which facilitates learning their distributed representations jointly by the back-propagation algorithm. Secondly, a beam search is used in the decoding to capture the dependencies among tags. These two improvements make the proposed model more accurate than the state-of-art maximum entropy based tagger (up to 1%).
LI Hong-Bo , LIANG Yan-Chun , LI Zhan-Shan
2016, 27(11):2701-2711. DOI: 10.13328/j.cnki.jos.004874 CSTR:
Abstract:Generalized arc consistency (GAC) plays a central role in solving the constraint satisfaction problem. Table constraints can theoretically represent all kinds of constraint relations, and many algorithms have been proposed to establish GAC on table constraints in the past decade. Among these methods, the simple tabular reduction algorithms (STR) are very efficient. However, the existing STR algorithms are suitable for only positive table constraints. They can not directly work on negative table constraints. In this paper, a STR algorithm, called STR-N, is first proposed to directly work on negative table constraints. Then, two improved versions of STR-N, STR-N2 and STR-NIC are presented. Experimental results show that the STR-N algorithms bring improvements over CPU time while solving the instances with negative table constraints.
2016, 27(11):2712-2724. DOI: 10.13328/j.cnki.jos.004877 CSTR:
Abstract:Belief propagation algorithm is very effective in finding satisfying assignments for RB(k,n,α,rc,p) model instances where hard region becomes narrower. However, belief propagation algorithm does not always converge for factor graphs with cycles. Unfortunately, rigorous theoretical proof of this phenomenon is still lacking. Belief propagation algorithm is the most basic message passing algorithms. This study derives the convergence conditions of the belief propagation algorithm for solving RB(k,n,α,rc,p) model instances. In the RB(k,n,α,rc,p) model with k=2, α>(1/k), rc>0 and which proves that BP will be converged with high probability if p∈(0,n-2α). Experimental results show that such convergence conditions of belief propagation algorithm are very effective in two different group data from the random RB(k,n,α,rc,p) model instances. In the RB(k,n,α,rc,p) model, when n increases, the experimental convergence interval is fixed range, while the theory convergence interval become narrower. It is because the number of incompatible assignment are determined by n and p in the RB(k,n,α,rc,p).
YANG Jin-Feng , GUAN Yi , HE Bin , QU Chun-Yan , YU Qiu-Bin , LIU Ya-Xin , ZHAO Yong-Jie
2016, 27(11):2725-2746. DOI: 10.13328/j.cnki.jos.004880 CSTR:
Abstract:An electronic medical record (EMR) is a patient's individual medical record written by health care providers and stored in digital format in which much medical knowledge and information about patient's personal health conditions are kept. The construction of annotated corpus for named entities and entity relations on EMR is a primary and fundamental task for information extraction which plays important role in clinical decision support, practice of evidence-based medicine, and other medical applications. Based on survey of current research on corpus construction for named entities and entity relations on EMR, this research proposes an annotation scheme for named entities and entity relations on Chinese electronic medical records (CEMR) according to characteristics of the records. Under the supervision of physicians, a complete and detailed annotation specification on CEMR is formulated, and an annotated corpus with high agreement is constructed. The corpus comprises 992 medical text documents, and inter-annotator agreement (IAA) of named entity annotations and entity relation annotations attain 0.922 and 0.895, respectively. The work presented in this paper builds substantial foundations for the subsequent research on information extraction in CEMR.
HU Wen-Bin , WANG Huan , YAN Li-Ping , QIU Zhen-Yu , XIAO Lei , DU Bo
2016, 27(11):2747-2762. DOI: 10.13328/j.cnki.jos.004910 CSTR:
Abstract:In complicated social networks, discovering or predicting important events is significant. The theoretical framework and evaluation methods of link prediction offer an effective solution for detecting events in social networks. Most of the current research focuses on proposing different similarity indexes to achieve higherlink prediction accuracy. However this type of approach has following problems:(1) Because different similarity indexes are designed for different networks, they are not universal; (2) The independent similarity index is difficult to reflect diversity and complexity of real network evolutions; (3) Without considering the fluctuation in the network evolution, the link prediction cannot detect events. To solve these problems, this paper proposes a swarm intelligence method based on mixed indexes (IndexEvent), which can evaluate fluctuations and detect events in social networks. The main work is as follow:(1) A proof is provided on the proposed mixed indexes that the link prediction algorithm based on mixed indexes can achieve a higher accuracy; (2) Based on the quantum-behaved particle swarm algorithm, an optimal weight algorithm (OWA) is developed to determine best mixed indexes for different networks efficiently; (3) A fluctuation detection algorithm (FDA) is designed to quantitatively estimates fluctuations in network evolutions at different periods. And micro factors are taken into account to improve FDA. The results of the experiments show that IndexEvent can effectively reflect evolution fluctuations and detect events.
2016, 27(11):2763-2776. DOI: 10.13328/j.cnki.jos.004911 CSTR:
Abstract:To tackle the problem that traditional clonal selection algorithm may suffer from premature convergence phenomenon and is lack of crossover operator problems, this paper proposes a new efficient clonal annealing optimization algorithm. The proposed algorithm combines simulated annealing algorithm with clonal selection mechanism of immune system, and maintains the balance of global and local search. The algorithm can effectively improve search efficiency, so as to speed up the convergence rate. Meanwhile, a quality factor model is used to analyze the dynamic performance of the algorithm, and an analysis of its convergence is performed using Markov chain theory. Finally, the proposed algorithm is applied to the association rule data mining, achieving satisfactory results.
LI Pu , JIANG Yun-Cheng , WANG Ju
2016, 27(11):2777-2795. DOI: 10.13328/j.cnki.jos.004920 CSTR:
Abstract:In this paper, the current research progresses of ontology reuse is reviewed and the issue that current ontology reuse algorithms can merely be applied to a single independent ontology is addressed. Focusing on the modular ontologies with ε-Connections language, the IKMO (integrity of knowledge about the module in an ontology) is presented based on the theory of conservative extension. The related properties of IKMO are proved. Further, an algorithm for the ontology reuse with the conservative extension ERMMO (extracting reused modules from modular ontologies) is provided. The features and conditions of two sub-algorithms of ERMMO are discussed. Lastly, the feasibility and soundness of ERMMO are analyzed and verified. ERMMO is a generalization of the current reuse algorithms based on conservative extension theory, and can be served as the guidelines for reuse of modular ontologies.
HANG Wen-Long , JIANG Yi-Zhang , LIU Jie-Fang , WANG Shi-Tong
2016, 27(11):2796-2813. DOI: 10.13328/j.cnki.jos.004921 CSTR:
Abstract:The main limitation of most traditional clustering methods is that they cannot effectively deal with the insufficient datasets in target domain. It is desirable to develop new cluster algorithms which can leverage useful information in the source domain to guide the clustering performance in the target domain so that appropriate number of clusters and high quality clustering result can be obtained in this situation. In this paper, a clustering algorithm called transfer affinity propagation (TAP) is proposed for the insufficient dataset scenarios. The new algorithm improves the clustering performance when the distribution of source and target domains are similar. The basic idea of TAP is to modify the update rules about two message propagations, used in affinity propagation (AP), through leveraging statistical property and geometric structure together. With the corresponding factor graph, TAP indeed can be applied to cluster in AP-like transfer learning, i.e., TAP can abstract the knowledge of source domains through the two tricks to enhance the learning of target domain even if the data in the current scene are not adequate. Extensive experiments demonstrate that the proposed algorithm outperforms traditional algorithms in situations of insufficient data.
ZHANG Hai-Feng , LIU Dang-Yi , LI Wen-Xin
2016, 27(11):2814-2827. DOI: 10.13328/j.cnki.jos.004928 CSTR:
Abstract:General game playing (GGP) is a research field for improving the general gaming intelligence of machines. It is different from specific game playing in that GGP players do not know game rules before the game begins, which makes them independent from human experience on specific games. Until now, GGP researchers have deeply explored many problems, such as game representation, search algorithm, and state evaluation. Also, some efforts have been made on knowledge transfer. The progress of GGP research represents the development of artificial general intelligence to some extent, which makes it remarkable.
GONG Qi-Yuan , YANG Ming , LUO Jun-Zhou
2016, 27(11):2828-2842. DOI: 10.13328/j.cnki.jos.005099 CSTR:
Abstract:When publishing datasets that contain relational and transaction attributes, referred to as RT-data for briefness, either type of data may suffer from linking attacks. Anonymizing both of them is essential. However, previous approaches suffer from huge information loss during anonymizing RT-data, and they fail to preserve the utility of datasets. To address this problem, an anonymization model, (k,l)-diversity is proposed to ensure privacy by guaranteeing l-diversity on each equivalence class and k-anonymity on transaction data. In addition, two heuristic algorithms named APA and PAA, which anonymize RT-data in different orders, are also provided to achieve (k,l)-diversity. Extensive experiments based on real-world dataset show that APA and PAA outperform existing approaches in terms of execution time and information loss.
2016, 27(11):2843-2854. DOI: 10.13328/j.cnki.jos.004870 CSTR:
Abstract:Unlike previous works such as explicit consumption intent recognition research, this paper presents a method that uses user behavior analysis to automatically recognize the implicit consumption intent. Specifically, the proposed method recasts implicit consumption intent recognition as a multi-label classification problem, which combines multiple features based on follower's behavior, intent behavior, retweets behavior, and user profiles. The paper proposes a method for the automatic extraction of a large user linkage across social media. With the proposed method, more 120000 user linkage pairs are extracted. Experimental results show that the multi-label classification-based method is effective for implicit intent recognition. Especially, the exploited features are all helpful for improving the recognition performance.
JIN Rui , ZHANG Hong-Li , ZHANG Yue , WANG Xing
2016, 27(11):2855-2869. DOI: 10.13328/j.cnki.jos.004932 CSTR:
Abstract:With the proliferation of the Chinese social network (especially the rise of weibo), the productivity and lifestyle of the country's society is more and more profoundly influenced by the Chinese internet public events. Due to the lack of the effective technical means, the efficiency of information processing is limited. This paper proposes a public event information entropy calculation method. First, a mathematical modeling of event information content is built. Then, multidimensional random variable information entropy of the public events is calculated based on Shannon information theory. Furthermore, a new technical index of quantitative analysis to the internet public events is put forward, laying out a foundation for further research work.
LU Gang , YU Xiang-Zhan , ZHANG Hong-Li , GUO Rong-Hua
2016, 27(11):2870-2883. DOI: 10.13328/j.cnki.jos.004885 CSTR:
Abstract:Traffic classification is the basis and key for optimizing network quality of service. Machine learning algorithms apply flow statistics in traffic classification, which are significant for identifying both encrypted and private traffic. However, the discriminator bias problem and the class imbalance problem are two main challenges in traffic classification. The discriminator bias problem denotes that some flow statistics can improve the accuracies for some applications but reduce the accuracies for other applications. The class imbalance problem denotes that machine learning based traffic classifier identifies the minority application with a low accuracy. To address the above two issues, traffic classification framework based on ensemble clustering (TCFEC) is proposed in this paper. TCFEC is composed of several base classifiers trained by clustering in different feature subspaces and an optimal decision component. It is able to improve accuracy in traffic classification. Specifically, compared with the traffic classifier based on traditional machine learning algorithms, TCFEC improves average flow accuracy by 5% as well as average byte accuracy by 6%.
CHEN Hu , HU Yu-Pu , LIAN Zhi-Zhu , JIA Hui-Wen
2016, 27(11):2884-2897. DOI: 10.13328/j.cnki.jos.004884 CSTR:
Abstract:A certificateless encryption scheme from lattices is put forward by using preimage sampleable algorithm to extract partial private keys and learning with errors to generate secret values and public keys. The new scheme is indistinguishably secure against adaptive chosen-identity attacks, even against quantum-computing attacks. This is achieved in the random oracle model by formally demonstrating that this construction can fight against two types of adversaries who can request secret values. Proper parameter setting for the scheme is obtained specifically by performing an analysis of its correctness, security, and efficiency. Two methods for further improving its efficiency are used by enlarging its plaintext space according to both distinct approaches, which also shows that the given scheme is flexible. Specially, an efficient method of successive padding with fixed bit is presented for obtaining multiple longer bit strings determined by a fixed-size bit string, which provides a valuable contribution towards building the multi-bit certificateless encryption scheme. Due to advantages inheriting from lattices and certificateless cryptosystem, the proposed schemes are flexible, efficient, resistant to quantum-computing attacks and free from certificate management.
ZHOU Yan-Wei , YANG Bo , WANG Qing-Long
2016, 27(11):2898-2911. DOI: 10.13328/j.cnki.jos.004941 CSTR:
Abstract:A hybrid signcryption scheme should withstand various leakage attacks when applied in practical applications. This paper presents a new leakage-resilient certificateless hybrid signcryption (LR-CLHS) scheme without bilinear pairing. The security of this scheme is based on the computational Diffie-Hellman (CDH) assumption and discrete logarithm (DL) problem. Considering the computational costs, the proposal is more efficient than traditional certificateless hybrid signcryption schemes and has a short ciphertext length and high security. In the random oracle model, it is also indistinguishability against adaptive posteriori key-leakage chosenciphertext attacks (IND-KL-CCA2) according to the hardness of the CDH assumption, existentially unforgeable against key-leakage chosen-message attacks (EUF-KL-CMA) according to the hardness of the DL problem, and maintains the original security under the condition that the adversary learns a small amount of leakage about the secret key by the leakage attacks (e.g., side-channel attacks, etc).
TAN Zhen-Hua , YANG Guang-Ming , WANG Xing-Wei , CHENG Wei , NING Jing-Yu
2016, 27(11):2912-2928. DOI: 10.13328/j.cnki.jos.004943 CSTR:
Abstract:Cloud storage is a model of data storage where the digital data is stored in logical pools to share "data as a service (DaaS)" for cloud users. However, users have no absolute control of cloud data, and as a result, they are more and more concerned about cloud data security especially for confidential data. This paper focuses on how to protect confidential data on cloud, and presents a (k,n) threshold secret sharing scheme based on m-sphere principle. Distribution algorithms are designed based on features of dealer's information and cloud storage containers' identifications. Secret is transformed into an m-sphere central coordinates, and then into n shadow coordinates which are placed on the m-sphere surface and distributed into n cloud storage containers. Secret reconstruction algorithms are also designed along with a proof that any k (k=m+1) linear irreverent m-coordinates can reconstruct a unique m-sphere center. Simulations and analysis validate the proposed scheme can tolerate fake shadow attacks and collusion attacks, and cloud users have absolute control on secret key which needs no more management cost from cloud services. Performance analysis proves that the scheme can improves cloud users' control on cloud data, and it is correct and efficient on computation performance and storageproperty.
GUI Zhen-Wen , LIU Yue , CHEN Jing , WANG Yong-Tian
2016, 27(11):2929-2945. DOI: 10.13328/j.cnki.jos.004865 CSTR:
Abstract:Registration is a fundamental technology for augmented reality. In this paper, a registration approach is proposed to accurately track the natural scenes. The matching method of SURF (speeded up robust features) descriptor is first improved to keep the initial registration matrix validity. Then, effective online learning of the scenes is used to improve the registration accuracy. Lastly, the registration matrix of the previous frame is utilized to rapidly restore the lost key points and accelerate the speed of registration. Experimental results show that the proposed method can keep smooth tracking for video frames and maintain high accuracy of registration.
SONG Chuan-Ming , ZHAO Chang-Wei , LIU Dan , WANG Xiang-Hai
2016, 27(11):2946-2960. DOI: 10.13328/j.cnki.jos.004886 CSTR:
Abstract:Motion estimation is a coding technique to eliminate the temporal redundancy of video. However, state-of-the-art translational motion model is not able to efficiently represent objects' local non-rigid complex motion. To address the issue, an elastic motion estimation algorithm is developed in this paper based on modified Gauss-Newton method. The effect of initial iteration point is first analyzed on the result of the Gauss-Newton method, and a two bit-depth pixel based uniform search is used to predict the initial iteration point. Subsequently, it is found that different step size has obvious influence on the performance of the elastic motion estimation by both theoretical and experimental analyses. The ratio of low-frequency energy of discrete cosine transform is employed to estimate the upper bound of the step size which is then refined by the golden ratio method. Experimental results show that the proposed algorithm is able to obtain stable performance for video sequences with various scene characteristics. It gains 1.73dB and 1.42dB higher average motion-compensated peak signal-to-noise ratio (PSNR) than those of the full search algorithm based on block-wise translational model and conventional elastic motion estimation, respectively. Furthermore, the proposed algorithm has faster convergence speed. Only 1~3 iterations are needed before the proposed algorithm achieves higher PSNR than conventional elastic motion estimation and block-wise translational full search method.
JIANG Wen-Tao , LIU Wan-Jun , YUAN Heng , ZHANG Hai-Tao
2016, 27(11):2961-2984. DOI: 10.13328/j.cnki.jos.004931 CSTR:
Abstract:An approach to object tracking based on vision quantum is proposed in this paper in order to solve the high loss-tracking rate in variable structure object tracking. First, the gray information is detected in an image from top to bottom with vision quantum, and the distribution area and gray levels of larger probability density are counted in the vision quantum. Then all the energy frequencies of the visual quantum are calculated such that the weaker energy frequency gradient is removed by filtration and the stronger frequency gradient of vision quantum that the distribution of high frequency information account for half quantum area is reserved. The quantum cluster is composed of vision quantum with the same frequency variation. Finally, taking quantum cluster as candidate object information, the state of moving object is predicted with maximum likelihood estimation and the forecast results are served as moving reference position of vision quantum in the next frame. Further verification of the visual quantum balance state is made to ensure the effectiveness of object tracking. This method catches the point that the variable structure moving object has the feature of the energy frequency step invariance at the juncture pixels of the foreground and background. It can effectively overcome the changes in shape, scale and other factors that influence the moving object tracking, achieving lower loss-tracking rate and lower computational complexity by using independent and continuous visual quantum to describe the step invariant feature. Experimental results show that the proposed approach has good adaptability to variable structure tracking with real-time and robust tracking performance.