ZHANG Hong-Yi, WANG Li-Wei, CHEN Yu-Xi
2013, 24(11):2476-2497. DOI: 10.3724/SP.J.1001.2013.04486
Abstract: Probabilistic graphical models are powerful tools for compactly representing complex probability distributions, efficiently computing (approximate) marginal and conditional distributions, and conveniently learning parameters and hyperparameters in probabilistic models. As a result, they have been widely used in applications that require some sort of automated probabilistic reasoning, such as computer vision and natural language processing, as a formal approach to deal with uncertainty. This paper surveys the basic concepts and key results of representation, inference and learning in probabilistic graphical models, and demonstrates their uses in two important probabilistic models. It also reviews some recent advances in speeding up classic approximate inference algorithms, followed by a discussion of promising research directions.
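As a minimal illustration of the kind of marginal computation such models make efficient (a sketch for orientation, not material from the survey), consider variable elimination on a three-node chain A -> B -> C with hypothetical conditional probability tables:

```python
import numpy as np

# Hypothetical CPTs for a binary chain A -> B -> C.
p_a = np.array([0.6, 0.4])            # P(A)
p_b_a = np.array([[0.7, 0.3],         # P(B | A): rows index A
                  [0.2, 0.8]])
p_c_b = np.array([[0.9, 0.1],         # P(C | B): rows index B
                  [0.4, 0.6]])

# Variable elimination: sum out A, then B, to obtain P(C).
p_b = p_a @ p_b_a                     # P(B) = sum_a P(A) P(B|A)
p_c = p_b @ p_c_b                     # P(C) = sum_b P(B) P(C|B)
print(p_c)                            # marginal distribution of C
```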
TAO Qing, GAO Qian-Kun, JIANG Ji-Yuan, CHU De-Jun
2013, 24(11):2498-2507. DOI: 10.3724/SP.J.1001.2013.04479
Abstract: Machine learning is facing a great challenge arising from the increasing scale of data. How to cope with large-scale, even huge-scale, data is a key problem in the emerging area of statistical learning. Usually, there exist redundancy and sparsity in the training sets of large-scale learning problems, and there are structural implications in the regularizer and loss function of a learning problem. If gradient-type black-box methods are employed directly in batch settings, not only can large-scale problems not be solved, but the structural information implicit in the learning problem cannot be exploited. Recently, state-of-the-art scalable methods such as coordinate descent, online, and stochastic algorithms, which are driven by the characteristics of machine learning, have become the dominant paradigms for large-scale problems. This paper focuses on L1-regularized problems and reviews some significant advances of these scalable algorithms.
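For readers new to the area: the structural information of the L1 regularizer is exploited, for example, through the soft-thresholding operator at the heart of coordinate descent for the Lasso. A minimal sketch (a generic illustration, not the specific algorithms reviewed):

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |.|, the key primitive behind
    # coordinate descent for L1-regularized problems.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iters=100):
    # Cyclic coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1,
    # assuming X has no all-zero columns.
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)           # per-coordinate curvature
    for _ in range(n_iters):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]  # residual excluding coordinate j
            w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
    return w
```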
WANG Xing, FANG Bin-Xing, ZHANG Hong-Li, HE Hui, ZHAO Lei
2013, 24(11):2508-2521. DOI: 10.3724/SP.J.1001.2013.04468
Abstract: Currently, learning bounds for relational classification models are not well understood. In this study, several learning bounds for relational classification are proposed. First, two bounds are deduced for the finite and infinite hypothesis spaces of the relational classification model, respectively. Further, a complexity metric called the relational dimension is proposed to measure the linking ability of the relational classification model. The relation between this complexity and the growth function is proved, and a learning bound for finite VC dimension and relational dimension is obtained. Afterwards, the conditions for learnability and non-triviality, as well as the feasibility of the bound, are analyzed. Finally, the learning process of a relational classification model based on Markov logic networks is analyzed with some examples. Experimental results on a real dataset demonstrate that the proposed bounds are useful in some practical problems.
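For context, bounds of this kind extend the classical finite-hypothesis-space result, which reads:

```latex
% Classical generalization bound for a finite hypothesis space H:
% with probability at least 1 - \delta over an i.i.d. sample of size m,
% for every h \in H,
R(h) \le \hat{R}(h) + \sqrt{\frac{\ln|H| + \ln(1/\delta)}{2m}}
```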
HU Qing-Hui, DING Li-Xin, HE Jin-Rong
2013, 24(11):2522-2534. DOI: 10.3724/SP.J.1001.2013.04483
Abstract: The kernel method is an effective approach to solving nonlinear pattern recognition problems in the field of machine learning. At present, the multiple kernel method has become a new research focus. Compared with the traditional single kernel method, the multiple kernel method is more flexible, more interpretable, and has better generalization performance when dealing with heterogeneous, irregular, and non-flat distributed samples. A multi-kernel S3VM optimization model based on an Lp norm constraint is presented in this paper within the kernel-method framework of semi-supervised learning. The model has two sets of parameters, decision functions fm in reproducing kernel Hilbert spaces and weighted kernel combination coefficients, and it inherits the non-smooth and non-convex properties of the single-kernel S3VM. A two-layer optimization procedure is adopted to optimize these two groups of parameters; an improved quasi-Newton method named subBFGS and a local search algorithm based on pairwise label switching are used to solve the non-smooth and non-convex subproblems with respect to fm, respectively. Base kernels and manifold kernels are added to the multi-kernel framework to exploit the geometric properties of the data. Experimental results show that the proposed algorithm is effective and has excellent generalization performance.
2013, 24(11):2535-2547. DOI: 10.3724/SP.J.1001.2013.04472
Abstract: Although granular support vector machines (GSVM) can improve learning speed, generalization performance may decrease because the original data distribution is inevitably changed for two reasons: (1) a granule is usually replaced by a single representative datum; (2) granulation and learning are carried out in different spaces. To address this problem, this study presents a granular support vector regression (SVR) model based on dynamical granulation, namely DGSVR, using a dynamical hierarchical granulation method. With DGSVR, the original data are mapped into a high-dimensional space by a Mercer kernel to reveal the distribution features implicit in the original sample space, and the data are initially divided into granules. Then, granules carrying important regression information are identified by measuring the distances between granules and the regression hyperplane. By computing the radius and density of granules, the dynamical granulation process proceeds in depth until no informative granules remain to be granulated. Finally, the granules at different granulation levels are extracted and trained by SVR. Experimental results on benchmark function datasets and UCI regression datasets demonstrate that the DGSVR model finishes the dynamical granulation process quickly and is convergent, improving generalization performance while achieving high learning efficiency.
DING Shi-Fei, HUANG Hua-Juan, SHI Zhong-Zhi
2013, 24(11):2548-2557. DOI: 10.3724/SP.J.1001.2013.04475
Abstract: Smooth twin support vector machines (STWSVM) use the Sigmoid function to transform the unsmooth twin support vector machines (TWSVM) into smooth ones. However, because of the low approximation ability of the Sigmoid function, the classification accuracy of STWSVM is unsatisfactory. Furthermore, similar to TWSVM, STWSVM is sensitive to abnormal samples. To address these problems, this paper introduces the CHKS function and proposes smooth CHKS twin support vector machines (SCTWSVM). To reduce the influence of abnormal samples on SCTWSVM, each training sample is given a different importance according to its position, resulting in weighted smooth CHKS twin support vector machines (WSCTWSVM). The study proves that the SCTWSVM objective is not only strictly convex but also smooth to arbitrary order. Meanwhile, experimental results show that SCTWSVM performs better than STWSVM, and further that WSCTWSVM is effective and feasible relative to SCTWSVM.
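The CHKS (Chen-Harker-Kanzow-Smale) function referenced here is a standard smooth approximation of the plus function max(0, x); a minimal sketch:

```python
import numpy as np

def chks(x, mu=0.1):
    # CHKS smoothing function: (x + sqrt(x^2 + 4*mu^2)) / 2.
    # Infinitely differentiable for mu > 0 and converging to
    # the plus function max(0, x) as mu -> 0.
    return 0.5 * (x + np.sqrt(x * x + 4.0 * mu * mu))

x = np.linspace(-2, 2, 5)
print(chks(x, mu=0.1))       # close to np.maximum(x, 0)
```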
ZHAO Jia-Cheng, CUI Hui-Min, FENG Xiao-Bing
2013, 24(11):2558-2570. DOI: 10.3724/SP.J.1001.2013.04482
Abstract: Cloud computing and multi-core processors are coming to dominate the landscape of computing today. However, in terms of computing resources, the utilization of modern datacenters is rather low because of potential negative and unpredictable cross-core performance interference. To provide QoS guarantees for key applications, co-location of such applications is disabled, causing computing resource overprovisioning. Therefore, precise analysis of cross-core interference is a key challenge in improving resource utilization in datacenters. This study is motivated by the observation that the performance degradation an application suffers from cross-core interference can be represented as a piecewise function of the aggregate pressure on the memory subsystem from all cores, regardless of which applications are co-running and what their individual pressures are. The study results in a statistical learning-based method for predicting cross-core performance interference, along with predictor models using PCA linear regression, which can quantitatively and precisely predict the performance degradation caused by memory subsystem contention for any application. Experimental results show that the average prediction error of the proposed method is 1.1%.
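The predictor shape described (PCA followed by linear regression over aggregate memory-pressure features) can be sketched as below; the feature matrix and targets here are random hypothetical stand-ins, not the paper's measurements:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: rows are co-running workload mixes,
# columns are aggregate memory-subsystem pressure features
# (e.g. bandwidth, cache-miss rates summed over all cores);
# y is the measured degradation of the target application.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = rng.random(200)

model = make_pipeline(PCA(n_components=3), LinearRegression())
model.fit(X, y)
pred = model.predict(X[:5])   # predicted degradation for new pressure vectors
```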
LIU Lu, PENG Tao, ZUO Wan-Li, DAI Yao-Kang
2013, 24(11):2571-2583. DOI: 10.3724/SP.J.1001.2013.04467
Abstract: Text classification is a key technology in information retrieval. Collecting more reliable negative examples and building effective and efficient classifiers are two important problems in automatic text classification. However, existing methods mostly collect only a small number of reliable negative examples, keeping the classifiers from reaching high accuracy. In this paper, a clustering-based method for automatic PU (positive and unlabeled) text classification enhanced by SVM active learning is proposed. In contrast to traditional methods, this approach is based on a clustering technique which exploits the characteristic that positive and negative examples should share as few words as possible. It finds more reliable negative examples by removing as many probable positive examples from the unlabeled set as possible. In building the classifier, a term weighting scheme TFIPNDF (term frequency inverse positive-negative document frequency, an improved TFIDF) is adopted. An additional improved Rocchio, in conjunction with SVM active learning, significantly improves classification performance. Experimental results on three different datasets (RCV1, Reuters-21578, 20 Newsgroups) show that the proposed clustering-based method extracts more reliable negative examples than the baseline algorithms with very low error rates, and that implementing SVM active learning also significantly improves classification accuracy.
CAO Ying, MIAO Qi-Guang, LIU Jia-Chen, GAO Lin
2013, 24(11):2584-2596. DOI: 10.3724/SP.J.1001.2013.04485
Abstract: AdaBoost is a meta ensemble learning algorithm. The most important theoretical property behind it is "Boosting", which also plays an important role in cost-sensitive learning. However, existing cost-sensitive Boosting algorithms, such as AdaCost, AdaC1, AdaC2, AdaC3, CSB0, CSB1 and CSB2, are merely heuristic. They add cost parameters to the voting weight calculation formula or the sample weight updating strategy of AdaBoost, so that the algorithms are forced to focus on samples with higher misclassification costs. These heuristic modifications have no theoretical foundation, and worse, they break the most important theoretical property of AdaBoost, namely "Boosting". Whereas AdaBoost converges to the optimal Bayes decision rule, these cost-sensitive algorithms do not converge to the cost-sensitive decision rule. This paper studies the problem of designing cost-sensitive Boosting algorithms strictly under Boosting theory. First, two new loss functions are constructed by making the exponential loss and logit loss cost-sensitive. It can be proved that the new loss functions are Fisher consistent in the cost-sensitive setting; therefore optimizing them finally leads to the cost-sensitive Bayes decision rule. Performing gradient descent in function space to optimize these two loss functions then yields new cost-sensitive Boosting algorithms: AsyB and AsyBL. Experimental results on synthetic Gaussian data show that, in comparison with other cost-sensitive Boosting algorithms, AsyB and AsyBL always better approximate the cost-sensitive Bayes decision rule. Experimental results on UCI datasets further show that AsyB and AsyBL generate better cost-sensitive classifiers with lower misclassification costs, and that the misclassification costs decrease exponentially with iterations.
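One standard construction of a Fisher-consistent cost-sensitive exponential loss, illustrating the flavor of loss used here (not necessarily the paper's exact form), is:

```latex
% Cost-weighted exponential loss with costs C_{+}, C_{-} for
% misclassifying positives and negatives respectively:
L(y, f(x)) = C_{+}\,\mathbb{1}[y = +1]\,e^{-f(x)} + C_{-}\,\mathbb{1}[y = -1]\,e^{f(x)}
% Its pointwise minimizer is
f^{*}(x) = \tfrac{1}{2}\ln\frac{C_{+}\,P(y=+1 \mid x)}{C_{-}\,P(y=-1 \mid x)},
% whose sign implements the cost-sensitive Bayes decision rule.
```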
2013, 24(11):2597-2609. DOI: 10.3724/SP.J.1001.2013.04473
Abstract: Conventional dimensionality reduction algorithms aim to attain low recognition errors, assuming the same loss from different misclassifications. In some real-world applications, however, this assumption may not hold. For example, in a door-locker system based on face recognition, there are impostors and gallery persons. The loss of misclassifying an impostor as a gallery person is larger than that of misclassifying a gallery person as an impostor, while the loss of misclassifying a gallery person as an impostor can be larger than that of misclassifying a gallery person as another gallery person. This paper formulates the face-recognition-based door-locker system as a cost-sensitive subclass learning problem, incorporates subclass information and misclassification costs into the framework of discriminant analysis simultaneously, and proposes a dimensionality reduction algorithm that approximates the pairwise Bayes risk. Experimental results on the face datasets Extended Yale B and ORL demonstrate the superiority of the proposed algorithm.
ZHU Lin, LEI Jing-Sheng, BI Zhong-Qin, YANG Jie
2013, 24(11):2610-2627. DOI: 10.3724/SP.J.1001.2013.04469
Abstract: A key challenge for most conventional clustering algorithms in handling many real-life problems is that data points in different clusters are often correlated with different subsets of features. To address this problem, subspace clustering has attracted increasing attention in recent years. However, existing subspace clustering methods cannot be effectively applied to large-scale high-dimensional data and data streams. In this study, scalable clustering techniques are extended to subspace clustering, forming a soft subspace clustering method for streaming data, and an entropy-weighting streaming subspace clustering algorithm, EWSSC, is proposed. The method leverages the effectiveness of fuzzy scalable clustering for streaming data by revealing the important local subspace characteristics of high-dimensional data. Substantial experimental results on both artificial and real-world datasets demonstrate that EWSSC is generally effective in clustering high-dimensional streaming data.
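The entropy-weighting idea assigns each cluster a soft distribution of feature weights with a closed-form update; a minimal sketch under the usual entropy-regularized soft-subspace objective (the dispersion matrix D and smoothness parameter gamma are assumed names, not the paper's notation):

```python
import numpy as np

def entropy_subspace_weights(D, gamma):
    # D: (k, d) matrix of per-cluster, per-feature dispersions.
    # Entropy-regularized closed-form update: features with lower
    # dispersion within a cluster receive exponentially larger weight.
    W = np.exp(-D / gamma)
    return W / W.sum(axis=1, keepdims=True)

D = np.array([[0.2, 1.5, 0.9],
              [1.1, 0.3, 0.8]])
print(entropy_subspace_weights(D, gamma=0.5))   # each row sums to 1
```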
2013, 24(11):2628-2641. DOI: 10.3724/SP.J.1001.2013.04470
Abstract: While categorical data are widely used in many applications such as bioinformatics, clustering categorical data is a difficult task in the field of statistical machine learning because such data can only take discrete values. Typically, mainstream methods depend on the mode of the categorical attributes to optimize the clusters and weight the relevant attributes. A non-mode approach is proposed for statistical clustering of categorical data in this paper. First, based on a newly defined dissimilarity measure, an objective function with attribute weighting is derived for categorical data clustering. The objective function is defined based on the average distance between the objects and the clusters, thereby overcoming the problems of existing methods based on the mode category. Then, a soft-subspace clustering algorithm is proposed for clustering categorical data. In this algorithm, each attribute is assigned weights measuring its degree of relevance to the clusters in terms of the overall distribution of categories instead of the mode category, enabling automatic feature selection during the clustering process. Experimental results on synthetic and real-world datasets demonstrate that the proposed method significantly improves clustering quality.
ZOU Peng-Cheng, WANG Jian-Dong, YANG Guo-Qing, ZHANG Xia, WANG Li-Na
2013, 24(11):2642-2655. DOI: 10.3724/SP.J.1001.2013.04464
Abstract: An effective distance metric is essential for time series clustering. To improve the performance of time series clustering, various metric learning methods can be applied to learn a proper distance metric from the data. However, existing metric learning methods overlook the characteristics of time series, and for time series it is difficult to obtain side information, such as pairwise constraints, for metric learning. In this paper, a distance metric learning method based on side information autogeneration for time series (SIADML) is proposed. In this method, the dynamic time warping (DTW) distance is used to measure the similarity between two time series and to generate pairwise constraints automatically. The metric learned from these pairwise constraints preserves the neighbor relationships of the time series as much as possible. Experimental results on benchmark datasets demonstrate that the proposed method effectively improves the performance of time series clustering.
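For reference, the DTW distance used to auto-generate the pairwise constraints is computed by the standard dynamic program; a minimal sketch for 1-D series:

```python
import numpy as np

def dtw(a, b):
    # Standard dynamic-programming DTW distance between two 1-D series.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

print(dtw([0, 1, 2, 3], [0, 0, 1, 2, 3]))  # small: same shape, shifted
```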
2013, 24(11):2656-2666. DOI: 10.3724/SP.J.1001.2013.04465
Abstract: Manifold learning based on spectral methods has recently been widely used for discovering low-dimensional representations of high-dimensional vector spaces. Isospectral manifold learning is one of the main topics in spectral methods; it stems from the conclusion that as long as the spectra of manifolds are the same, so are their intrinsic structures. However, the difficulty in computing the spectrum lies in selecting the optimal neighborhood size and constructing reasonable neighbor weights. In this paper, a supervised technique called the isospectral manifold learning algorithm (IMLA) is proposed. By directly modifying the sparse reconstruction weights, IMLA takes into account both within-neighborhood and between-neighborhood information. Thus, it not only preserves the sparse reconstructive relationships but also makes full use of discriminant information. Compared with PCA and other algorithms, IMLA has obvious advantages. Experimental results on face databases (Yale, ORL and Extended Yale B) show the effectiveness of the IMLA method.
QIAN Yu, YU Yang, ZHOU Zhi-Hua
2013, 24(11):2667-2675. DOI: 10.3724/SP.J.1001.2013.04471
Abstract: Reinforcement learning (RL) deals with long-term reward maximization problems by learning correct short-term decisions from previous experience. It has been shown that reward shaping, which provides simpler and easier reward functions to replace the actual environmental reward, is an effective way to guide and accelerate reinforcement learning. However, building a shaping reward requires either domain knowledge or demonstrations from an optimal policy, both of which involve costly participation of human experts. This work investigates whether it is possible to automatically learn a better shaping reward along with the RL process. RL algorithms commonly sample many trajectories throughout the learning process. These passive samples, though containing many failed attempts, may provide useful information for building a shaping reward function. A policy-invariance condition for reward shaping is introduced as a more effective way to handle noisy examples, followed by the RFPotential approach for learning a shaping reward from massive examples efficiently. Empirical studies on various RL algorithms and domains show that RFPotential can accelerate the RL process.
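The classical policy-invariance result for reward shaping (Ng et al.), the starting point for conditions like the one introduced here, restricts the shaping term to a potential difference:

```latex
% Potential-based shaping: for any potential function \Phi over states,
% replacing the reward r(s, a, s') by
r'(s, a, s') = r(s, a, s') + \gamma\,\Phi(s') - \Phi(s)
% leaves the set of optimal policies of the MDP unchanged.
```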
FU Qi-Ming, LIU Quan, FU Yu-Chen, ZHOU Yi-Cheng, YU Jun
2013, 24(11):2676-2686. DOI: 10.3724/SP.J.1001.2013.04466
Abstract: In machine learning with large or continuous state spaces, combining function approximation with reinforcement learning is a hot topic. Such study also faces the very difficult problem of balancing exploration and exploitation in reinforcement learning. To address the exploration-exploitation dilemma in large or continuous state spaces, this paper presents a novel policy iteration algorithm based on Gaussian processes in deterministic environments. The algorithm uses a Gaussian process to model the action-value function and, in conjunction with a generative model, obtains the posterior distribution of the parameter vector of the action-value function by Bayesian inference. During the learning process, it computes the value of perfect information according to the posterior distribution and then selects the appropriate action with respect to the expected value of the action-value function. The algorithm balances exploration and exploitation to a certain extent and therefore accelerates convergence. Experimental results on the Mountain Car problem show that the algorithm has a faster convergence rate and better convergence performance.
WANG Li-Jin, YIN Yi-Long, ZHONG Yi-Wen
2013, 24(11):2687-2698. DOI: 10.3724/SP.J.1001.2013.04476
Abstract: Cuckoo search (CS) is a new nature-inspired intelligent algorithm which updates and evaluates solutions as a whole. For multi-dimensional function optimization problems, this strategy may degrade the convergence speed and solution quality due to interference among dimensions. To overcome this shortcoming, a cuckoo search algorithm based on dimension-by-dimension improvement is proposed. During the iterations of the improved algorithm, a dimension-by-dimension update and evaluation strategy is used: the strategy combines the updated value of one dimension with the values of the other dimensions into a new solution, and greedily accepts any updated value that improves the solution. Simulation experiments show that the proposed strategy effectively improves convergence speed and solution quality. The results also show that the proposed algorithm is competitive with other improved cuckoo search algorithms and other evolutionary algorithms for continuous function optimization problems.
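A minimal sketch of the dimension-by-dimension update-and-evaluation strategy just described; the objective f and the proposal step (standing in for the Lévy flight of CS) are hypothetical:

```python
import numpy as np

def dim_by_dim_update(x, propose, f):
    # Update one dimension at a time and greedily keep any change
    # that improves the objective, so a good update in one dimension
    # is not discarded because of bad updates in other dimensions.
    best, best_val = x.copy(), f(x)
    trial = propose(x)                  # whole-solution proposal
    for j in range(len(x)):
        cand = best.copy()
        cand[j] = trial[j]              # splice in a single dimension
        val = f(cand)
        if val < best_val:              # minimization assumed
            best, best_val = cand, val
    return best, best_val

# Usage with a hypothetical sphere objective:
f = lambda v: float(np.sum(v ** 2))
x = np.array([1.0, -2.0, 0.5])
propose = lambda v: v + 0.3 * np.random.randn(len(v))
print(dim_by_dim_update(x, propose, f))
```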
CHAI Bian-Fang, YU Jian, JIA Cai-Yan, WANG Jing-Hong
2013, 24(11):2699-2709. DOI: 10.3724/SP.J.1001.2013.04474
Abstract: A stochastic block model can produce a wide variety of networks with different structures (termed general communities, including traditional communities, bipartite structures, hierarchical structures, etc.); it can also detect general communities in networks according to the rules of stochastic equivalence. However, the simple stochastic block model has problems in modeling network generation and in learning the model, and shows poor results in fitting real networks. The GSB (general stochastic block) model is an extension of the stochastic block model based on the idea of link communities, designed to detect general communities, but its complexity limits its application to medium and large networks. In order to explore the latent structures of networks of different scales without prior knowledge, a fast algorithm based on the GSB model (FGSB) is designed to find general communities in networks more quickly. FGSB dynamically learns the parameters related to network structure during the iterations. It reduces memory usage by reorganizing parameters to cut down unnecessary ones, and saves running time by pruning the parameters of converged nodes and edges to decrease the computation of each iteration. FGSB has the same structure detection ability as the GSB model, but lower time and storage complexity. Tests on synthetic benchmarks and real-world networks demonstrate that FGSB not only runs faster than the GSB model's algorithm at similar accuracy, but can also detect general communities in large networks.
HU Yun, WANG Chong-Jun, XIE Jun-Yuan, WU Jun, ZHOU Zuo-Jian
2013, 24(11):2710-2720. DOI: 10.3724/SP.J.1001.2013.04477
Abstract: Community evolutionary pattern analysis in temporal datasets is a key issue in social network dynamics research and applications. Identifying objects that deviate from the main community evolution trends is not only meaningful in itself for finding novel evolution behaviors, but also helpful for better understanding the mainstream of community evolution. Given the belonging matrix of community members, this study defines a transition matrix to characterize the evolutionary dynamics between two consecutive belonging snapshots. A set of properties of the transition matrix is discussed, revealing its close quantitative relation to gradual changes in community structure. The transition matrix is further optimized using M-estimator robust regression methods by minimizing the disturbance incurred by outliers, and the abnormality of the outlier objects can then be computed at the same time. Considering that a large proportion of trivial but nomadic objects may exist in large datasets such as complex social networks, the focus is placed only on community evolutionary outliers that differ remarkably from the main bodies of their communities and show sharp changes in their membership roles within the communities. A definition of such local and global outliers is given, and an algorithm for detecting them is proposed. Experimental results on both synthetic and real datasets show that the proposed approach is highly effective in discovering interesting evolutionary community outliers.
SUN Guang-Fu, WU Le, LIU Qi, ZHU Chen, CHEN En-Hong
2013, 24(11):2721-2733. DOI: 10.3724/SP.J.1001.2013.04478
Abstract: Collaborative filtering, which makes personalized predictions by learning the historical behaviors of users, is widely used in recommender systems. The key to enhancing the performance of collaborative filtering is to precisely learn the interests of the active users by exploiting the relationships among users and items. Though various works have targeted this goal, few have noticed the sequential correlations among users and items. In this paper, a method is proposed to capture the sequential behaviors of users and items, which helps find the set of neighbors that are most influential for given users (items). Furthermore, these influential neighbors are applied to the recommendation process based on probabilistic matrix factorization. Extensive experiments on two real-world datasets demonstrate that the proposed SequentialMF algorithm achieves more accurate rating predictions than conventional methods using either social relations or tagging information.
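For reference, the probabilistic matrix factorization backbone into which the influential neighbors are incorporated can be sketched as plain SGD on regularized squared error (without the sequential-neighbor terms, which the paper adds):

```python
import numpy as np

def pmf_sgd(ratings, n_users, n_items, k=8, lr=0.01, reg=0.05, epochs=20):
    # ratings: list of (user, item, rating) triples.
    # Minimizes sum (r - p_u . q_i)^2 + reg * (||p_u||^2 + ||q_i||^2).
    P = 0.1 * np.random.randn(n_users, k)   # user latent factors
    Q = 0.1 * np.random.randn(n_items, k)   # item latent factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# Usage on a toy hypothetical rating list:
ratings = [(0, 0, 4.0), (0, 1, 3.0), (1, 0, 5.0)]
P, Q = pmf_sgd(ratings, n_users=2, n_items=2)
print(P[1] @ Q[1])   # predicted rating of user 1 for item 1
```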
MAO Cun-Li, YU Zheng-Tao, WU Ze-Jian, GUO Jian-Yi, XIAN Yan-Tuan
2013, 24(11):2734-2746. DOI: 10.3724/SP.J.1001.2013.04480
Abstract: Expert evidence document recognition is a key step in expert search. Combining the independent page features of expert candidate documents with correlations among pages, this paper proposes an expert evidence document recognition method based on an undirected graph model. First, independent page features such as words, URL links and expert metadata in various kinds of expert evidence documents, as well as link and content correlations among candidate expert evidence documents, are analyzed. Then, the independent page features and the correlations among pages are integrated into an undirected graph to construct an undirected graph model for expert evidence document recognition. Finally, feature weights are learned in the model using the gradient descent method, and expert evidence document recognition is achieved using Gibbs sampling. The effectiveness of the proposed method is verified by comparative experiments, whose results show that it achieves better results than the baselines.
ZHANG Shu, CAI Yong, XIE Mei
2013, 24(11):2747-2757. DOI: 10.3724/SP.J.1001.2013.04484
Abstract: In this paper, a face detection method based on local region sparse coding is proposed. First, local face regions are extracted as training samples. Next, a discriminative dictionary whose atoms have explicit relations with local regions is learned. Then, for each detection window, the presence of a particular local region is determined based on the response of its sparse coding. Finally, the face location is obtained using position constraints and the detection results of the local regions. The innovation of the proposed method lies in combining sparse coding with a part-based model for face detection. Experimental results on the Caltech and BioID databases show that the proposed method is suitable for small-sample-size problems and gives good detection results in cases of occlusion, rotation, and complex expressions.
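A minimal sketch of scoring a local region by its sparse coding response against a dictionary; the dictionary and patch below are random stand-ins coded with scikit-learn's generic SparseCoder, not the paper's learned discriminative dictionary:

```python
import numpy as np
from sklearn.decomposition import SparseCoder

# Hypothetical dictionary: 50 atoms for 12x12 local-region patches.
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 144))
D /= np.linalg.norm(D, axis=1, keepdims=True)    # unit-norm atoms

coder = SparseCoder(dictionary=D, transform_algorithm='omp',
                    transform_n_nonzero_coefs=5)
patch = rng.standard_normal((1, 144))            # one candidate region
code = coder.transform(patch)                    # sparse response (1, 50)
# A region would be scored by the response of the atoms tied to it.
```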
HAN Bing, YANG Chen, GAO Xin-Bo
2013, 24(11):2758-2766. DOI: 10.3724/SP.J.1001.2013.04481
Abstract: Auroras of different shapes appear in the sky around the Arctic and Antarctic poles, and different auroras carry different physical meanings. Therefore, research on the classification of aurora images has significant scientific value. In this paper, an aurora image classification method based on LDA with saliency information (SI-LDA) is proposed. First, the saliency information of aurora images is used to generate a visual dictionary, which enhances the semantic information of the images. Next, the aurora images are represented by SI-LDA. Finally, SVM is applied to classify the aurora images. Experimental results show that the proposed method outperforms existing algorithms.