HU Wen-Jun , WANG Shi-Tong , WANG Juan , YAN Qi-Sheng
2012, 23(12):3059-3073. DOI: 10.3724/SP.J.1001.2012.04209
Abstract:The L2-kernel classifier does not consider explicitly its classification margin when approximating the difference of densities (DoD) with the integrated squared error (ISE) criterion of probability densities, which is disadvantageous for improving the performance of classifiers to a certain extent. Its weights can simply be obtained by solving the corresponding QP problem which results in the comparatively slow training speed and is impractical especially for large datasets. With the aim of overcoming the above drawbacks, a new classification method is proposed in this paper, called the maximum margin logistic vector machine (MMLVM), which maximizes the DoDbased classification margin and finds the corresponding weight vector by solving a logistic optimization problem in gradient descent way. The theoretical analysis is provided in the globally optimal weights, the generalization error bound, and in the computational complexity of MMLVM. Experimental results on the artificial, UCI, PIE and USPS data sets demonstrate the effectiveness of the proposed approach in overcoming the drawbacks as above.
2012, 23(12):3074-3087. DOI: 10.3724/SP.J.1001.2012.04212
Abstract:The concept of n-valued modal model for multi-value modal logics is introduced in this paper, and the corresponding semantics are constructed. The study points out this kind of semantics and generalizes the semantics for classical modal logics. The definition of 〈W,R〉n-typed frame is presented, under which the localized mappings induced by modal formulae are constructed, and the concept of localized truth degree for modal formulae is introduced. It is obtained that the localized truth degree for any modal formula can be computed as the one for some modal formula without modalities in the same possible world. Based on these, the concept of global truth degree for modal formulae is introduced. It has been shown that whenever a modal formula contains no modalities, its global truth degree coincides with its truth degree in the common propositional logics.
LIU Le-Mao , ZHAO Tie-Jun , CAO Hai-Long , ZHU Cong-Hui , ZHANG Chun-Yue
2012, 23(12):3088-3100. DOI: 10.3724/SP.J.1001.2012.04207
Abstract:The partition ambiguity of translation derivations is an important problem suffered by the statistical machine translation, and it is much more important in a hierarchical phrase-based machine translation. In the paper, a hierarchical partition model is proposed to address the problem. The study applies markov random fields to construct the model, and integrate it into the hierarchical translation model to automatically select the more reasonable partition. In the NIST Chinese-English translation tasks, the optimization of the model is very efficient, and it improves the translation performance for hierarchical phrase-based translation on NIST05, NIST06 and NIST08 test sets.
LIU Shu-Jie , LI Chi-Ho , LI Mu , ZHOU Ming
2012, 23(12):3101-3114. DOI: 10.3724/SP.J.1001.2012.04208
Abstract:In this paper, based on the investigation of domain adaptation for feature weight, the study proposes to use a co-training framework to handle domain adaptation for feature weight, i.e. The study uses the translation results from another heterogeneous decoder as pseudo references and adds them to the development data set for minimum error rate training to bias the feature weight to the domain of test data set. Furthermore, the study uses a minimum Bayes- Risk combination for pseudo reference selection, which can pick proper translation results from the translation candidates from both decoders to smooth the training process. Experimental results show that this co-training method with a minimum Bayes-Risk combination can yield significant improvements in target domain.
Mairehaba·AILI , JIANG Wen-Bin , WANG Zhi-Yang , Tuergen·YIBULAYIN , LIU Qun
2012, 23(12):3115-3129. DOI: 10.3724/SP.J.1001.2012.04205
Abstract:Uyghur is a typical agglutinative language. It has a strong derivational ability with very a rich morphological structure and follows a harmonious rule. In the formation process, some phenomena may occur such as weakened, increased tone and fallen tone. The specific character of Uyghur language determines the difficulty of the Uyghur morphological analysis, including stemming and restoring the changed letter and POS tagging. This paper employs the hierarchical structure of Uyghur word, and proposes a directed graph model for Uyghur morphological analysis. In this model, words and tags are described as a directed graph. In this graph, nodes represent stems, affixes and their corresponding tags, while edges represent the transition, or general probabilities between nodes. Aimed at providing some light on the phenomenon of morphological sandhi in Uyghur language, this paper also proposes a restore model by changing the word to its original form. With the assumption that one letter can be changed to any letter, this model converts restoring problem into a sequence labeling problem, which could be solved by statistical methods. Experiment results on "Mega-words Corpus of Morphological Analysis of Uyghur", which is manually annotated by Xinjiang multilingual key laboratory shows that the accuracy of stemming reaches 94.7%, and the F score of stem and affix in line with tag reaches 92.6%.
SHAO Kun , LUO Fei , MEI Niao-Xiong , LIU Zong-Tian
2012, 23(12):3130-3148. DOI: 10.3724/SP.J.1001.2012.04204
Abstract:On the basis of a continuous process, the recommendation trust model is built for depicting indirect trust, the most complicated trust relationship. The model is also significant for the security and reliability of an open environment and open systems. After the factors influencing indirect trust are quantified, the filtrated recommendations are regarded as samples from the normal process, and the Bayesian estimation value of posteriori distribution expectation is obtained from calculation. Next, after an elaborate discussion of the trust evolution and the relationship between trust value and trustworthiness along with some propositions and proofs are proposed. Experimental data show that with the capacity of resisting malicious attacks improved, the model gives a more effective and precise result, which is also consistent with mathematical deduction.
ZHANG Yi-Chi , PANG Jian-Min , ZHAO Rong-Cai
2012, 23(12):3149-3160. DOI: 10.3724/SP.J.1001.2012.04221
Abstract:Considering the fact that the determination of the executable file maliciousness is hard to achieve, an approach based on the evidence theory is presented in this paper. First, a model for determining the maliciousness is established. Then characters compromising security are extracted to construct the set of program behaviors through decompiling the program. The model is trained using the BP neural network to gain the basic probability assignment functions (BPAF) of each behavior, and the weighted sum method is applied to combine the program behaviors, determining the executable file maliciousness. Experimental results demonstrate the validity of the approach which uses the evidence theory to determine the maliciousness of program.
QIAN Quan , XIAO Chao-Jie , ZHANG Rui
2012, 23(12):3161-3174. DOI: 10.3724/SP.J.1001.2012.04186
Abstract:Depending on the structured peer-to-peer networks, the P2P botnets are the main threats of the Internet in the future. In this paper, a formal mathematical P2P botnet propagation model is built based on a deep analysis of two typical structured P2P protocols, Chord and Kademlia. This model, it integrates different factors, such as structured P2P protocols, two-factor immunizations, and host online rates to describe the structured P2P botnet propagation mechanisms comprehensively. Meanwhile, in order to evaluate the model effectiveness, simulate the P2P botnets propagation, the difference between the theoretical model and the simulation is used to verify the model efficiency. The experiments prove the correctness of the theoretical model also verify the different influences of structured P2P protocols, immunization mechanisms, and the host online rates on P2P botnet propagation. Moreover, through simulating P2P network with millions of nodes, it can be shown that the propagation model is correct and valid in large scale network, which provides a theoretical basis for botnet detection and prevention.
RAO Jun , WU Bin , DONG Yu-Xiao
2012, 23(12):3175-3186. DOI: 10.3724/SP.J.1001.2012.04206
Abstract:To apply link prediction methods into large-scale complex network, this paper designs and implements a parallel link prediction algorithm based on MapReduce, which includes nine similarity Indices via local information. The parallel link prediction algorithm has a time complexity of O(N) in sparse networks. First, the paper verifies the validity of the algorithm on public datasets, increase in the extraction factor, recall ascends, and precision descends. The experimental results on ten large-scale datasets of variety network types show that the parallel link prediction algorithm is more effective than traditional ones, and its running time decreases with more compute units. The upper and lower bounds of AUC (area under a receiver operating characteristic curve) are proposed. The experimental results show the median of the upper and lower bounds are close to the real value of AUC, which focuses on whether prediction score is zero rather than the actual score value. The network average clustering coefficient has the greatest impact on AUC among most topological features and AUC rises as the network average clustering coefficient increases.
ZHOU Guo-Qiang , ZENG Qing-Kai
2012, 23(12):3187-3197. DOI: 10.3724/SP.J.1001.2012.04180
Abstract:To address the recommendation problems from malicious entities, a role separation based trust evaluation model (RSTrust) is proposed in this paper. In RSTrust, roles which entities act during trust evaluation are classified into transaction roles and recommendation roles. Trust on entity is therefore described as transaction trust and recommendation trust according to their associated roles, which leads to the separation of interference between different roles on different trusts. During the calculation of the global trust for an entity, the global recommendation trust of a recommender is used as a trust weight in RSTrust, and the disturbance of recommendation from malicious entities on global trust can be eliminated effectively. Analysis and simulation results demonstrate that RSTrust model has the fine feature of anti-malicious recommendation and good astringency.
NI Wei-Wei , ZHANG Yong , HUANG Mao-Feng , CHONG Zhi-Hong , HE Yu-Zhi
2012, 23(12):3198-3208. DOI: 10.3724/SP.J.1001.2012.04286
Abstract:Privacy-Preserving data publishing has attracted considerable research interest over the past few years. The principle difference of clustering and obfuscating burdens the trade-off between clustering utility maintaining and privacy protection. Most of existing methods such as adopting strategies of distance-preservation, or distribution-preservation, cannot accommodate both clustering utility and privacy security of the data. As a trade-off, a neighborhood-preservation based perturbing algorithm VecREP (vector equivalent replacing based perturbing method) is proposed, which realizes good clustering utility by maintaining the nearest neighborhood for each data point. The definition of a safe neighborhood is introduced to stabilize the composition of the nearest neighborhood. The equivalent replacing arc is generated to realize distribution stability of nearest neighborhood leveraging vector offset and composition. For each data point, VecREP randomly chooses a point on its equivalent replacing arc inside corresponding safe neighborhood to make substitution. The algorithm is compared with existing methods such as RBT, TDR, Camp-crest and NeNDS. Experimental results demonstrate that VecREP competes in performance with RBT on maintaining clustering quality and, outperforms the other. It can avoid a reversible attack effectively and compared to the existing solution, ARMM has a shorter handover delay and a smaller location update and delivery cost.
2012, 23(12):3209-3220. DOI: 10.3724/SP.J.1001.2012.04199
Abstract:In this paper, a sub-image method based on feature sampling and feature fusion (called as RS_SpCCA) is proposed. RS_SpCCA first performs a random subspace method in sub-images which are partitioned in a deterministic way. Then, the method obtains correlation features by fusing sampled features and global feature extracted by certain feature extraction method and finally, constructs component classifiers on corrleation features. In this method, the purpose of sampling feature is to construct more diverse component classifiers, and the purpose of the fusing feature is to make good use of the global information. The experimental results on AR, Yale and ORL three face image databases show that sub-image method based on feature sampling and feature fusion (RS_SpCCA) is superior to both SpCCA and Semi-RS which only use feature sampling or feature fusion.
ZHENG Yun-Ping , CHEN Chuan-Bo , LI Zu-Jia
2012, 23(12):3221-3232. DOI: 10.3724/SP.J.1001.2012.04236
Abstract:The idea of an overlapping rectangular region coding of binary images inspired the overlapping rectangular non-symmetry, the anti-packing pattern representation model (RNAM), and the extended Gouraud shading approach. A novel lossy gray image representation algorithm based on the overlapping RNAM, which is called ORNAMC representation algorithm, is proposed. Also, the four principles for anti-packing the overlapping homogenous blocks are presented in this paper. In the proposed ORNAMC representation algorithm, the wrong decoding problem of the matrix R for the overlapping RNAM representation of gray images is solved separately by using the horizontal, vertical, and isolated matrices, i.e.,H, V and I. These are used to identify the vertex types instead of using a single hybrid matrix, i.e., R. In addition, by redefining the codeword set for the three vertices symbols, this paper proposed a new coordinate data compression algorithm for coding the coordinates of all non-zone elements in the three matrices H, V and I. By taking some idiomatic standard gray images in the field of image processing as typical test objects, and by comparing the proposed ORNAMC representation algorithm with the latest non-overlapping RNAMC, the experimental results show that the former has a higher compression ratio and a fewer number of blocks and yet, maintains image quality. Therefore, a better method is to represent the gray image.