ZHU Xiao-Hui, TAO Qing, SHAO Yan-Jian, CHU De-Jun
2015, 26(11):2752-2761. DOI: 10.13328/j.cnki.jos.004890
Abstract:Stochastic optimization is one of the most efficient methods for solving large-scale machine learning problems. In stochastic learning, the full gradient is replaced by the gradient of the loss function with respect to a randomly selected single sample to avoid high computational cost; however, this typically introduces large variance. Recent studies have shown that the convergence rate of stochastic gradient methods can be effectively improved by reducing this variance. In this paper, the variance reduction issue in COMID (composite objective mirror descent) is addressed when solving non-smooth optimization problems. First, a proof is provided to show that COMID has a convergence rate of O(1/√T+σ²/√T) with respect to the variance, where T is the iteration number and σ² is the variance. This convergence rate ensures the effectiveness of reducing the variance. Then, a stochastic optimization algorithm, α-MDVR (mirror descent with variance reduction), is obtained by incorporating the variance reduction strategy into COMID. Unlike Prox-SVRG (proximal stochastic variance reduced gradient), α-MDVR does not depend on the number of samples and only uses a small portion of samples to correct the gradient. Comparative experiments demonstrate that α-MDVR not only reduces the variance but also decreases the computational time.
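A minimal sketch of the general idea described above (not the authors' α-MDVR implementation): a composite mirror-descent step with an L1 regularizer under a Euclidean Bregman divergence (i.e., soft-thresholding) combined with an SVRG-style corrected gradient whose anchor is estimated on a small random subset rather than the full data set. The objective (L1-regularized least squares), step size, batch size and refresh period are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal/mirror step for the L1 regularizer under a Euclidean Bregman divergence.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def grad_i(w, X, y, i):
    # Gradient of the squared loss on a single sample i.
    return (X[i] @ w - y[i]) * X[i]

def comid_vr(X, y, lam=0.01, eta=0.1, T=1000, batch=20, refresh=50, seed=0):
    """Composite mirror descent with an SVRG-style variance-corrected gradient.

    The anchor gradient is recomputed on a small random subset ('batch') rather
    than on the full data set, mimicking the idea of using only a small portion
    of samples to correct the stochastic gradient."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_anchor, g_anchor = w.copy(), np.zeros(d)
    for t in range(T):
        if t % refresh == 0:                      # periodically refresh the anchor
            w_anchor = w.copy()
            S = rng.choice(n, size=batch, replace=False)
            g_anchor = np.mean([grad_i(w_anchor, X, y, i) for i in S], axis=0)
        i = rng.integers(n)
        g = grad_i(w, X, y, i) - grad_i(w_anchor, X, y, i) + g_anchor
        w = soft_threshold(w - eta * g, eta * lam)   # composite (proximal) step
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 10))
    w_true = np.zeros(10); w_true[:3] = [1.0, -2.0, 0.5]
    y = X @ w_true + 0.1 * rng.normal(size=200)
    print(np.round(comid_vr(X, y), 2))
```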
YANG Liu, JING Li-Ping, YU Jian
2015, 26(11):2762-2780. DOI: 10.13328/j.cnki.jos.004892
Abstract:The lack of labeled data limits learning performance in the target domain. Fortunately, there are often ample labeled data in related source domains. Transfer learning allows knowledge to be transferred from source domains to the target domain. In real applications, such as text-image and cross-language transfer learning, the feature spaces of the source and target domains are different; this is heterogeneous transfer learning. This paper focuses on heterogeneous transductive transfer learning (HTTL), an approach that improves prediction on unlabeled data in the target domain by using labeled data from heterogeneous source domains. Since the feature spaces of the source domains and the target domain are different, the key problem is to learn the mapping functions between them. This paper proposes to learn the mapping functions by unsupervised matching of the different feature spaces. With these mapping functions, the data in the source domains can be re-represented and transferred to the target domain, so that the target domain obtains labeled data originating from the source domains. Standard machine learning methods such as support vector machines can then be used to train classifiers for predicting the labels of unlabeled data in the target domain. Moreover, a probabilistic interpretation is derived to verify the robustness of the presented method against certain noise in the utility matrices. A sample complexity bound is given to indicate how many instances are needed to adequately find the mapping functions. The effectiveness of the proposed approach is verified by experiments on four real-world data sets.
2015, 26(11):2781-2794. DOI: 10.13328/j.cnki.jos.004901
Abstract:This paper focuses on the problem of intersecting connected paths in multiply connected Lie group covering learning, building on the authors' previous studies of covering learning based on the connectivity of Lie groups. It discusses geodesic curves for optimal path mapping that minimize the correlation between paths from different connected spaces and maximize the correlation between paths within the same connected space. A review of relevant notions from Lie group connectivity theory is provided, followed by a brief introduction to the multiply connected covering learning algorithm. New path optimization algorithms are then proposed. Numerical experiments comparing the proposed optimization algorithms with classical covering learning methods, Lie group means learning algorithms, and the authors' previous algorithm illustrate their better classification performance.
ZHANG Ming-Wei, ZHU Zhi-Liang, LIU Ying, ZHANG Bin
2015, 26(11):2795-2810. DOI: 10.13328/j.cnki.jos.004897
Abstract:In many practical classification applications, the class label of new data must ultimately be confirmed by a domain expert, and the result of the classifier only plays an assistant role. In addition, as the implicit value of big data attracts increasing attention, classifier training is going through a transition from single data sets to data sets distributed across space, and assistant classification in the big data environment will become an important branch of future classification applications. However, existing classification research pays little attention to this kind of application. Assistant classification in the big data environment faces the following three problems: 1) the training set is a distributed big data set; 2) in space, the class distributions of the local data sets contained in the training set commonly differ; and 3) in time, the training set is dynamic and its class distribution may change. To address these problems, this paper proposes a distributed assistant associative classification approach for the big data environment. First, a distributed associative classifier construction algorithm for the big data environment is presented. With the new algorithm, the class distribution differences across space are handled by horizontal weighting, and the performance deficiency of associative classification algorithms on imbalanced data sets is mitigated by a measure framework of "antecedent space support-correlation coefficient". Next, an adaptive-factor-based dynamic adjustment method for the assistant associative classifier is proposed. This method makes full use of domain experts' real-time feedback to adjust the classifier dynamically while it is being applied, improving its performance on dynamic data sets and reducing its retraining frequency. Experimental results demonstrate that the presented approach can train associative classifiers with higher classification accuracy relatively quickly for distributed data sets, and can maintain their performance when data sets continually expand and change. Thus it is an effective approach for assistant classification applications in the big data environment.
2015, 26(11):2811-2819. DOI: 10.13328/j.cnki.jos.004908
Abstract:Exploiting label relationships to improve learning performance is important for multi-label learning. The classifier chain method and its variants are a powerful solution to this problem. However, the learning process requires an ordering of the labels, which is hard to obtain in real-world situations, and an incorrect label ordering may lead to suboptimal performance. To overcome this drawback, this paper presents a classifier circle method for multi-label learning. It initializes the label ordering randomly, and then iteratively updates the classifier for each label by connecting the labels into a circle. Experimental results on a number of data sets show that the proposal outperforms the classifier chain method as well as many state-of-the-art multi-label methods.
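A hedged sketch of the circular-update idea described above (illustrative, not the authors' exact implementation): each label's classifier is trained on the original features augmented with the current predictions of all other labels, and the circle is traversed several times so every label can use up-to-date predictions. The base learner (scikit-learn logistic regression), number of rounds and function names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_classifier_circle(X, Y, n_rounds=3, seed=0):
    """Illustrative classifier-circle training.

    Y is an (n_samples, n_labels) 0/1 matrix. Labels are placed on a circle in a
    random order; the circle is traversed n_rounds times."""
    rng = np.random.default_rng(seed)
    n, q = Y.shape
    order = rng.permutation(q)
    preds = np.zeros((n, q))            # current predictions used as extra features
    models = [None] * q
    for _ in range(n_rounds):
        for j in order:                 # go around the circle
            others = [k for k in range(q) if k != j]
            X_aug = np.hstack([X, preds[:, others]])
            clf = LogisticRegression(max_iter=1000)
            clf.fit(X_aug, Y[:, j])
            models[j] = clf
            preds[:, j] = clf.predict(X_aug)
    return models, order

def predict_classifier_circle(models, order, X, n_rounds=3):
    n, q = X.shape[0], len(models)
    preds = np.zeros((n, q))
    for _ in range(n_rounds):           # iterate predictions toward a fixed point
        for j in order:
            others = [k for k in range(q) if k != j]
            preds[:, j] = models[j].predict(np.hstack([X, preds[:, others]]))
    return preds
```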
LIU Bei-Bei, MA Ru-Ning, DING Jun-Di
2015, 26(11):2820-2835. DOI: 10.13328/j.cnki.jos.004902
Abstract:To tackle the failure of traditional clustering algorithms in dealing with large-scale data, this paper proposes a density-based statistical merging algorithm for large data sets (DSML). The algorithm treats each feature of a data point as an independent random variable and derives a statistical merging criterion from the independent bounded differences inequality. First, DSML improves the Leaders algorithm by using the statistical merging criterion, and uses the improved algorithm as the sampling procedure to obtain representative points. Second, combining the density and neighborhood information of the representative points, the algorithm applies the statistical merging criterion again to complete the clustering of the whole data set. Theoretical analysis and experimental results show that DSML has nearly linear time complexity, can handle arbitrary data sets, and is insensitive to noise, which demonstrates the validity of the DSML algorithm for large data sets.
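A minimal sketch of the Leaders-style sampling pass mentioned above. The specific merging criterion DSML derives from the bounded differences inequality is replaced here by a McDiarmid-style per-feature test of the same flavor; the scale constant g, the confidence delta and all names are assumptions of this sketch.

```python
import numpy as np

def mergeable(sum1, n1, sum2, n2, g=0.1, delta=0.05):
    """Stand-in statistical merging test: merge two groups when every
    feature-mean difference is within the fluctuation allowed by a
    bounded-differences (McDiarmid-style) bound."""
    thr = g * np.sqrt(0.5 * np.log(2.0 / delta) * (1.0 / n1 + 1.0 / n2))
    return np.all(np.abs(sum1 / n1 - sum2 / n2) <= thr)

def leaders_sampling(X, g=0.1, delta=0.05):
    """One-pass Leaders-style scan: each point joins the first leader group it is
    statistically mergeable with, otherwise it starts a new group."""
    sums, counts = [], []
    for x in X:
        for k in range(len(sums)):
            if mergeable(sums[k], counts[k], x, 1, g, delta):
                sums[k] += x
                counts[k] += 1
                break
        else:
            sums.append(x.astype(float).copy())
            counts.append(1)
    reps = np.array([s / c for s, c in zip(sums, counts)])
    return reps, np.array(counts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.2, 0.02, (500, 2)), rng.normal(0.8, 0.02, (500, 2))])
    reps, sizes = leaders_sampling(X)
    print(len(reps), "representative points; group sizes:", sizes)
```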
JIA Hong-Jie, DING Shi-Fei, SHI Zhong-Zhi
2015, 26(11):2836-2846. DOI: 10.13328/j.cnki.jos.004888
Abstract:Spectral clustering is based on algebraic graph theory. It turns the clustering problem into a graph partitioning problem. To solve the graph cut objective function, the properties of the Rayleigh quotient are usually utilized to map the original data points into a lower-dimensional eigenspace by computing the eigenvectors of the Laplacian matrix, and the clustering is then conducted in the new space. However, during spectral clustering, the space complexity of storing the similarity matrix is O(n²), and the time complexity of the eigen-decomposition of the Laplacian matrix is usually O(n³). Such complexity is unacceptable when dealing with large-scale data sets. It can be proved that both normalized-cut graph clustering and weighted kernel k-means are equivalent to a matrix trace maximization problem, which suggests that the weighted kernel k-means algorithm can be used to optimize the objective function of the normalized cut without the eigen-decomposition of the Laplacian matrix. Nonetheless, the weighted kernel k-means algorithm needs to calculate the kernel matrix, and its space complexity is still O(n²). To address this challenge, this study proposes an approximate weighted kernel k-means algorithm in which only part of the kernel matrix is used to solve the big data spectral clustering problem. Theoretical analysis and experimental comparison show that approximate weighted kernel k-means achieves clustering performance similar to the weighted kernel k-means algorithm, while its time and space complexity is greatly reduced.
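For reference, a minimal weighted kernel k-means iteration, the trace-maximization counterpart of normalized cut mentioned above; the approximation that uses only part of the kernel matrix is not shown, and the kernel, weights and demo data are illustrative.

```python
import numpy as np

def weighted_kernel_kmeans(K, w, k, n_iter=50, seed=0):
    """Weighted kernel k-means on a precomputed kernel matrix K (n x n)
    with sample weights w (n,). Returns cluster labels.

    Point-to-cluster distance in feature space:
        K_ii - 2 * sum_{j in C} w_j K_ij / s_C + sum_{j,l in C} w_j w_l K_jl / s_C^2,
    where s_C is the total weight of cluster C."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    labels = rng.integers(k, size=n)
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = np.where(labels == c)[0]
            if idx.size == 0:
                dist[:, c] = np.inf
                continue
            wc = w[idx]
            s = wc.sum()
            second = K[:, idx] @ wc / s                      # sum_j w_j K_ij / s
            third = wc @ K[np.ix_(idx, idx)] @ wc / (s * s)  # cluster self-similarity
            dist[:, c] = np.diag(K) - 2.0 * second + third
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-sq / 2.0)                    # RBF kernel
    labels = weighted_kernel_kmeans(K, np.ones(len(X)), 2)
    print(np.bincount(labels))
```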
ZHOU Guo-Bing, WU Jian-Xin, ZHOU Song
2015, 26(11):2847-2855. DOI: 10.13328/j.cnki.jos.004895
Abstract:With the rapid expansion of information, the scale and dimensionality of data are constantly increasing, and traditional clustering methods are ill-suited to this trend. In particular, with the fast development of mobile computing platforms, the limited memory of such platforms restricts the data scale that algorithms can handle, so many algorithms cannot run on them without modification. This paper proposes a clustering method based on a nearest-neighbor representation. The method uses the idea of nearest neighbors to construct a new representation of the data. This new representation is compressible, thus effectively reducing the storage cost required for clustering. An algorithm called Bit k-means is implemented to perform clustering directly on the compressed nearest-neighbor representation. Experimental results show that the new method achieves higher accuracy while substantially reducing the storage cost.
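A hedged sketch of one possible reading of such a representation (not the authors' Bit k-means): each point is encoded as a bit vector marking its nearest anchor points, the binary codes are packed for compact storage, and clustering is run on the codes. Anchor selection by random sampling, the number of anchors and the use of scikit-learn k-means are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def nn_bit_codes(X, anchors, t=4):
    """Encode each point as a binary vector that marks its t nearest anchors.

    The resulting (n, m) 0/1 matrix has only t ones per row and can be packed
    with np.packbits for compressed storage."""
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(-1)   # squared distances (n, m)
    nearest = np.argsort(d2, axis=1)[:, :t]
    codes = np.zeros((X.shape[0], anchors.shape[0]), dtype=np.uint8)
    rows = np.repeat(np.arange(X.shape[0]), t)
    codes[rows, nearest.ravel()] = 1
    return codes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (200, 8)), rng.normal(2, 0.3, (200, 8))])
    anchors = X[rng.choice(len(X), size=32, replace=False)]     # random anchor points
    codes = nn_bit_codes(X, anchors, t=4)
    packed = np.packbits(codes, axis=1)                          # compressed storage
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(codes)
    print(packed.shape, np.bincount(labels))
```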
2015, 26(11):2856-2868. DOI: 10.13328/j.cnki.jos.004905
Abstract:Kernel methods are common machine learning techniques used in classification, clustering, regression and feature selection. Kernel selection and kernel parameter optimization are crucial problems that impact the effectiveness of kernel methods, and they motivate the research on kernel evaluation measures, especially universal kernel evaluation measures. Five widely used universal kernel evaluation measures, including KTA, EKTA, CKTA, FSM and KCSM, are analyzed and compared. It is found that what these five measures actually evaluate is the average margin of a linear hypothesis in the feature space, which is biased with respect to the SVM optimization criterion of maximizing the minimum margin. This study then uses synthetic data to analyze their sensitivity to class distribution, linear translation, and heteroscedastic data. It also concludes that the measures are only a sufficient, not necessary, condition for kernel evaluation: a good kernel may still receive a low evaluation value. Finally, a comparison of the evaluation results of these measures on 9 UCI data sets and the 20 Newsgroups data set suggests that CKTA is the best universal kernel evaluation measure.
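For context, the standard definitions of kernel target alignment and its centered variant, which can be used to score candidate kernels or kernel parameters without training a classifier; whether these match every variant compared in the paper (EKTA, FSM, KCSM) is not shown here, and the demo kernel parameters are illustrative.

```python
import numpy as np

def kta(K, y):
    """Kernel target alignment: <K, yy^T>_F / (||K||_F * ||yy^T||_F), y in {-1,+1}^n."""
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))

def centered_kta(K, y):
    """Centered kernel target alignment (CKTA): alignment after centering K and yy^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    Yc = H @ np.outer(y, y) @ H
    return (Kc * Yc).sum() / (np.linalg.norm(Kc) * np.linalg.norm(Yc))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 0.5, (30, 2)), rng.normal(1, 0.5, (30, 2))])
    y = np.array([-1] * 30 + [1] * 30)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    for gamma in (0.01, 0.5, 10.0):          # compare candidate kernel parameters
        K = np.exp(-gamma * sq)
        print(gamma, round(kta(K, y), 3), round(centered_kta(K, y), 3))
```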
QIAO Shao-Jie, LI Tian-Rui, HAN Nan, GAO Yun-Jun, YUAN Chang-An, WANG Xiao-Teng, TANG Chang-Jie
2015, 26(11):2869-2883. DOI: 10.13328/j.cnki.jos.004889
Abstract:Existing trajectory prediction algorithms focus on the mobility patterns of objects and simulate traffic flow via mathematical models, which are inaccurate at describing network-constrained objects. To cope with this problem, a self-adaptive parameter selection trajectory prediction model based on hidden Markov models (SATP) is proposed. The new model can efficiently cluster and partition location big data, and it extracts the hidden and observable states using a density-based clustering approach in order to reduce the number of states in the HMM. SATP automatically selects the parameters for the input trajectories and avoids the problems of discontinuous hidden states and state retention. Experimental results demonstrate that the SATP model achieves high prediction accuracy with low time overhead. The average prediction accuracy of SATP is 84.1% when the moving objects change speed randomly, exceeding the Naïve algorithm by an average of 46.7%.
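For illustration, a generic HMM forward-filtering step and one-step-ahead prediction over grid-cell observations, the basic machinery behind HMM trajectory prediction; SATP's density-based state extraction and self-adaptive parameter selection are not reproduced, and the transition/emission matrices below are toy values.

```python
import numpy as np

def forward_filter(A, B, pi, obs):
    """Return P(hidden state at time t | observations up to t) for an HMM with
    transition matrix A (S x S), emission matrix B (S x V) and prior pi (S,)."""
    alpha = pi * B[:, obs[0]]
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        alpha /= alpha.sum()
    return alpha

def predict_next_cell(A, B, pi, obs):
    """One-step-ahead distribution over observable grid cells, and its argmax."""
    alpha = forward_filter(A, B, pi, obs)
    next_obs = (alpha @ A) @ B            # P(next observation | past observations)
    return next_obs, int(next_obs.argmax())

if __name__ == "__main__":
    # Toy example: 2 hidden "movement modes", 3 observable grid cells.
    A = np.array([[0.8, 0.2],
                  [0.3, 0.7]])
    B = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
    pi = np.array([0.5, 0.5])
    dist, cell = predict_next_cell(A, B, pi, obs=[0, 0, 1, 2])
    print(np.round(dist, 3), "-> predicted cell", cell)
```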
CAI Rui-Chu, XIE Wei-Hao, HAO Zhi-Feng, WANG Li-Juan, WEN Wen
2015, 26(11):2884-2896. DOI: 10.13328/j.cnki.jos.004893
Abstract:Because of the great variations in crowd density and crowd dynamics, as well as the many occlusions present in scenes, abnormal crowd event detection and localization remain challenging problems and hot topics in crowd scene analysis. Based on spatio-temporal modeling of the crowd scene, this paper proposes an abnormal crowd event detection and localization approach based on a multi-scale recurrent neural network. First, the crowd scene is split into grids and represented using multi-scale histograms of optical flow (MHOF). Then, the grids are connected to obtain a global time series model of the crowd scene. Finally, a multi-scale recurrent neural network is devised to detect and localize abnormal events on this time series model. In the multi-scale recurrent neural network, the multi-scale hidden layers model the spatial relations among neighbors at different scales, and the feedback loops capture the temporal relations. Extensive experiments demonstrate the effectiveness of the presented approach.
YU Qian, GAO Yang, HUO Jing, ZHUANG Yun-Kai
2015, 26(11):2897-2911. DOI: 10.13328/j.cnki.jos.004894
Abstract:In this paper, video-based face recognition (VFR) is cast as an image set recognition problem. Two types of manifolds are proposed to represent each gallery set: an inter-class manifold, which represents the mean face information of the set, and an intra-class manifold, which corresponds to the original image information of the set. The inter-class manifold abstracts discriminative information of the whole image set so as to select a few candidate gallery sets relevant to the query set. The intra-class manifold then chooses the most similar candidate by considering the relationships among all original images of each gallery set. Existing nonlinear manifold methods project each image into a low-dimensional space as a point, and thus suffer from cluster alignment issues and insufficient sampling. To avoid these flaws and make the margin between manifolds clearer, the projection matrices in the new method are obtained by partitioning each image into non-overlapping patches, so that the extracted features are more discriminative. In addition, a similarity measure between manifolds is proposed in accordance with the patching scheme. Finally, extensive experiments are conducted on several widely studied databases. The results demonstrate that the new method achieves better performance than state-of-the-art VFR methods, and it works especially well on uncontrolled videos without being affected by video length.
LI Zhao-Kui, DING Li-Xin, WANG Yan, HE Jin-Rong, DING Guo-Hui
2015, 26(11):2912-2929. DOI: 10.13328/j.cnki.jos.004896
Abstract:A face feature representation method based on the difference local directional pattern (DLDP) is proposed. First, for each pixel of every facial image sub-block, eight edge response values are obtained by convolving the local neighborhood with eight Kirsch masks. Then, the difference of each pair of neighboring edge response values is calculated to form eight new difference directions. The top k difference response values are selected and the corresponding directional bits are set to 1, while the remaining (8-k) bits are set to 0, forming the binary code of a difference local directional pattern. In addition, because high-resolution Kirsch masks consider only directions and ignore the weight of each pixel location, DLDP introduces a method for designing these weights. Finally, a sub-histogram is computed by accumulating the occurrences of the different DLDP codes in each image block, and all sub-histograms of an image are concatenated into a new face descriptor. Experimental results show that DLDP achieves state-of-the-art performance on difficult problems such as expression-, illumination- and occlusion-robust face recognition in most cases. In particular, DLDP obtains better results on the occlusion problem.
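A simplified sketch of the per-pixel coding step described above (Kirsch responses, neighboring-direction differences, top-k bits); the per-location weighting scheme and the block histograms of DLDP are omitted, and parameters such as k are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve

def kirsch_masks():
    """The eight 3x3 Kirsch edge masks, generated by rotating the outer ring."""
    ring = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    base = np.array([5, 5, 5, -3, -3, -3, -3, -3])
    masks = []
    for r in range(8):
        m = np.zeros((3, 3))
        for (i, j), v in zip(ring, np.roll(base, r)):
            m[i, j] = v
        masks.append(m)
    return masks

def dldp_codes(img, k=3):
    """Per-pixel DLDP-style code: differences of neighboring Kirsch responses,
    with the k largest differences encoded as 1-bits of an 8-bit pattern."""
    responses = np.stack([convolve(img.astype(float), m) for m in kirsch_masks()])
    diffs = responses - np.roll(responses, -1, axis=0)   # neighboring-direction differences
    topk = np.argsort(diffs, axis=0)[-k:]                # indices of the k largest diffs
    codes = np.zeros(img.shape, dtype=np.uint8)
    for bit in topk:                                     # set the corresponding bits
        codes |= (1 << bit).astype(np.uint8)
    return codes

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(32, 32))
    print(np.unique(dldp_codes(img)).size, "distinct DLDP codes")
```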
XIE Bo-Jun, ZHU Jie, YU Jian
2015, 26(11):2930-2938. DOI: 10.13328/j.cnki.jos.004898
Abstract:Designing patch-level features is essential for achieving good performance in computer vision tasks such as image classification and object recognition. SIFT (scale-invariant feature transform) and HOG (histogram of oriented gradients) are typical examples of pre-defined patch-level descriptors, but such hand-crafted patch-level features are often not expressive enough to reflect the similarities between images. The kernel descriptor (KD) method offers a new way to generate features from a match kernel defined over image patch pairs using KPCA (kernel principal component analysis) and yields impressive results. However, all joint basis vectors are involved in the kernel descriptor computation, which is both expensive and unnecessary. To address this problem, this paper presents an efficient patch-level descriptor (EPLd) built upon incomplete Cholesky decomposition, which automatically selects a small number of image patches as pivots to achieve better computational efficiency. Based on EPLd, the MMD (maximum mean discrepancy) distance is used for computing similarities between images. In experiments, the EPLd approach achieves competitive results on several image/scene classification data sets compared with state-of-the-art methods.
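For reference, the standard (biased) estimate of squared MMD between two sets of patch descriptors under an RBF kernel, which illustrates the image-similarity step mentioned above; the incomplete-Cholesky pivot selection of EPLd is not shown, and the descriptors and bandwidth below are synthetic placeholders.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def mmd2(X, Y, gamma=0.5):
    """Biased estimate of squared maximum mean discrepancy between two
    sets of patch descriptors X (n x d) and Y (m x d)."""
    return (rbf_kernel(X, X, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img_a = rng.normal(0.0, 1.0, (40, 16))    # 40 patch descriptors per "image"
    img_b = rng.normal(0.0, 1.0, (40, 16))    # same distribution as img_a
    img_c = rng.normal(2.0, 1.0, (40, 16))    # different distribution
    print(round(mmd2(img_a, img_b), 4), "<", round(mmd2(img_a, img_c), 4))
```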
TANG Chao, WANG Wen-Jian, LI Wei, LI Guo-Bin, CAO Feng
2015, 26(11):2939-2950. DOI: 10.13328/j.cnki.jos.004899
Abstract:Human action recognition is a hot topic in computer vision. Most existing work uses action models based on supervised learning algorithms. To achieve good recognition performance, a large number of labeled samples are required to train sophisticated action models; however, collecting labeled samples is labor-intensive. This paper presents a novel semi-supervised learning algorithm named the multi-learner co-training model (MCM) for recognizing human actions. Two key issues are addressed. First, the base classifiers in co-training are selected by a Q-statistic-based classifier selection algorithm (QSCSA). Second, MCM is employed for semi-supervised model construction, where the confidence score of an unlabeled sample is estimated from the accuracy of the classifier companion committee (CCC) on labeling the neighborhood of that unlabeled example. To evaluate the proposed algorithm, mixed descriptors are used to represent actions so that the recognition algorithm can quickly complete the recognition process from a single frame of a visual image. Experimental results show that the proposed semi-supervised learning system can recognize simple human actions effectively.
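A minimal two-view co-training loop with confidence-based pseudo-labeling, shown only to illustrate the semi-supervised mechanism; MCM's Q-statistic classifier selection and neighborhood-based confidence estimate are not reproduced, and the base learner, per-round quota and names are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_training(X1, X2, y, n_rounds=10, per_round=5):
    """Two-view co-training: y holds class labels, with -1 marking unlabeled
    samples. Each round, each view's classifier pseudo-labels the unlabeled
    samples it is most confident about, enlarging the shared training pool."""
    y = y.copy()
    clf1 = LogisticRegression(max_iter=1000)
    clf2 = LogisticRegression(max_iter=1000)
    for _ in range(n_rounds):
        labeled = y != -1
        if labeled.all():
            break
        clf1.fit(X1[labeled], y[labeled])
        clf2.fit(X2[labeled], y[labeled])
        unlabeled = np.where(~labeled)[0]
        for clf, X in ((clf1, X1), (clf2, X2)):
            proba = clf.predict_proba(X[unlabeled])
            conf = proba.max(axis=1)
            picks = unlabeled[np.argsort(conf)[-per_round:]]   # most confident samples
            y[picks] = clf.predict(X[picks])
            unlabeled = np.setdiff1d(unlabeled, picks)
            if unlabeled.size == 0:
                break
    return clf1, clf2, y
```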
WANG Kai, YU Wei, YANG Sha, WU Min, HU Ya-Hui, LI Shi-Jun
2015, 26(11):2951-2963. DOI: 10.13328/j.cnki.jos.004907
Abstract:As a high-quality source in social media big data, geographic location has been widely adopted in fields such as disease control, population mobility analysis and targeted ad delivery, driven by the rapid development of online social media and the prevalence of localizable mobile devices. However, location data are quite sparse because users often do not specify their locations accurately. To overcome this sparsity problem, this paper proposes UGC-LI, a user-generated-content-driven location inference method that infers the locations of users and of the social texts they create, providing supporting data for location-based personalized information services. A probability model is constructed that comprehensively considers the distribution of location words and the users' social graph, using local words extracted from user-generated texts, to locate users at multiple granularities. Furthermore, a parameterized, location-based language model is presented to identify the city where a tweet is published. Experiments on a real-world data set demonstrate that the new method outperforms existing algorithms: 64.2% of users are located within a 15 km displacement distance, and 81.3% of home cities and 32.7% of the cities where tweets were posted are correctly identified.
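A toy sketch of the kind of evidence combination such a model can use: a naive-Bayes-style word score per candidate city plus a social-graph term from friends' cities. This is not the UGC-LI model itself; the smoothing constant, the weight lam and the word/city statistics in the demo are made-up placeholders for illustration only.

```python
import math
from collections import Counter

def score_city(words, friend_cities, word_given_city, city_prior, lam=1.0, alpha=0.1):
    """Log-score of each candidate city given a user's local words and the
    cities of the user's friends; returns the best city and all scores."""
    counts = Counter(words)
    friends = Counter(friend_cities)
    total_friends = sum(friends.values()) or 1
    scores = {}
    for c in city_prior:
        s = math.log(city_prior[c])
        for w, n in counts.items():
            p = word_given_city.get(c, {}).get(w, alpha)   # smoothed word likelihood
            s += n * math.log(p)
        s += lam * friends[c] / total_friends              # social-graph evidence
        scores[c] = s
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Illustrative (made-up) statistics for two candidate cities.
    word_given_city = {"Beijing": {"hutong": 0.4, "subway": 0.2},
                       "Chengdu": {"hotpot": 0.5, "panda": 0.3}}
    city_prior = {"Beijing": 0.5, "Chengdu": 0.5}
    best, _ = score_city(["hotpot", "panda", "subway"],
                         ["Chengdu", "Chengdu", "Beijing"],
                         word_given_city, city_prior)
    print(best)
```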
ZHANG Li-Xia, WANG Wei-Ping, GAO Jian-Liang, WANG Jian-Xin
2015, 26(11):2964-2980. DOI: 10.13328/j.cnki.jos.004891
Abstract:In the age of big data, with the scale of data graphs growing rapidly, incremental graph pattern matching has become a research focus, since it avoids re-computing matches over the whole data graph and reduces the response time when the data graph or the pattern graph is updated. Considering the practical scenario where the data graph is constant while the pattern graph changes, a pattern-graph-change-oriented incremental graph pattern matching algorithm named PGC_IncGPM is proposed. Appropriate intermediate results generated during graph pattern matching are recorded as indexes for subsequent matching. An enhanced graph pattern matching algorithm named GPMS is first presented for matching over the whole data graph: on one hand, it builds the indexes needed for subsequent incremental matching; on the other hand, it reduces the execution time of matching on the data graph. Two core sub-algorithms for adding and removing edges in the pattern graph are designed and implemented; no matter how the pattern graph changes, incremental matching can be carried out by combining these two sub-algorithms. Experiments on real and synthetic data sets show that PGC_IncGPM effectively reduces graph pattern matching time: compared with the ReComputing algorithm, which re-computes matches from scratch over the whole data graph, the proposed algorithm reduces execution time whenever the number of changed edges does not exceed the number of unchanged edges. As the scale of the data graph increases, the reduction in execution time of PGC_IncGPM becomes more pronounced, demonstrating the algorithm's applicability to large-scale data graphs.
LIU Hai-Yang, WANG Zhi-Hai, HUANG Dan, SUN Yan-Ge
2015, 26(11):2981-2993. DOI: 10.13328/j.cnki.jos.004904
Abstract:Collaborative filtering (CF) is at the core of most of today's recommender systems. Conventional CF models focus on the accuracy of predicted ratings, while the actual output of a recommender system is a ranked list of items. In response to this problem, this research introduces techniques from the field of learning to rank into recommendation algorithms and proposes a list-wise collaborative ranking algorithm based on the assumption that the rating matrix is locally low-rank. It directly uses a list-wise ranking loss function to optimize the matrix factorization model, achieving a significant improvement in running speed that is verified by experiment. Experiments on three real-world recommender system data sets show that the proposed algorithm is a viable approach compared with existing recommendation algorithms.
YANG Hao, DUAN Lei, HU Bin, DENG Song, WANG Wen-Tao, QIN Pan
2015, 26(11):2994-3009. DOI: 10.13328/j.cnki.jos.004906
Abstract:Distinguishing sequential patterns can be used to characterize the differences between data sets, and thus have wide applications, such as commodity recommendation, user behavior analysis and power supply prediction. Previous algorithms for mining distinguishing sequential patterns require users to set both positive and negative support thresholds. Without sufficient prior knowledge of the data sets, it is difficult for users to set appropriate support thresholds, and significant contrast patterns may be missed. To deal with this problem, an algorithm called kDSP-miner (top-k distinguishing sequential patterns with gap constraint miner) is proposed for mining the top-k distinguishing sequential patterns satisfying a gap constraint. Instead of setting the contrast thresholds directly, a user-friendly parameter indicating the expected number of top distinguishing patterns to be discovered is introduced, which makes kDSP-miner easy to use and its mining results more comprehensible. To improve the efficiency of kDSP-miner, several pruning strategies and a heuristic strategy are designed. Furthermore, a multi-threaded version of kDSP-miner is designed to enhance its applicability to sequences whose elements are drawn from a high-dimensional set. Experiments on real-world data sets demonstrate that the proposed algorithm is effective and efficient.
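A small sketch of the two ingredients of this mining task: gap-constrained subsequence matching and ranking candidate patterns by contrast (support in the positive set minus support in the negative set). kDSP-miner's pattern enumeration, pruning strategies and multi-threading are not shown; the candidate patterns, gap bound and toy sequences below are illustrative.

```python
import heapq

def contains_with_gap(seq, pattern, max_gap):
    """True if 'pattern' occurs in 'seq' as a subsequence with at most
    'max_gap' elements between consecutive matched positions."""
    def search(p, start, first):
        if p == len(pattern):
            return True
        end = len(seq) if first else min(len(seq), start + max_gap + 1)
        for i in range(start, end):
            if seq[i] == pattern[p] and search(p + 1, i + 1, False):
                return True
        return False
    return search(0, 0, True)

def support(db, pattern, max_gap):
    return sum(contains_with_gap(s, pattern, max_gap) for s in db) / len(db)

def top_k_contrast(pos_db, neg_db, candidates, k=3, max_gap=1):
    """Rank candidate patterns by contrast = supp(pos) - supp(neg)."""
    scored = [(support(pos_db, p, max_gap) - support(neg_db, p, max_gap), p)
              for p in candidates]
    return heapq.nlargest(k, scored)

if __name__ == "__main__":
    pos = [list("abcab"), list("acbab"), list("abcb")]
    neg = [list("bacba"), list("ccba"), list("bbaac")]
    candidates = [list("ab"), list("ba"), list("cb"), list("abc")]
    for score, pat in top_k_contrast(pos, neg, candidates, k=2):
        print("".join(pat), round(score, 2))
```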
GUO Ping, WANG Ke, LUO A-Li, XUE Ming-Zhi
2015, 26(11):3010-3025. DOI: 10.13328/j.cnki.jos.004900
Abstract:Big data and its real-world applications have attracted a lot of attention, not only in academia but also in industry, with the explosive growth of data volumes. Big data analysis, which aims at mining the potential value of big data, has become a popular research topic. Computational intelligence (CI), an important research direction within artificial intelligence and information science, has been shown to be promising for solving complex problems in scientific research and engineering, and CI techniques are expected to provide powerful tools for addressing the challenges of big data analytics. This paper surveys the related CI techniques, analyzes the grand challenges posed by big data from the perspective of big data analysis, and discusses possible future research directions in the big data era. Further, it proposes conducting big data analysis research on scientific big data, such as astronomical big data.
ZHOU Qing, MOU Chao, YANG Dan
2015, 26(11):3026-3042. DOI: 10.13328/j.cnki.jos.004887
Abstract:Educational data mining (EDM) focuses on solving theoretical and practical problems in education by applying principles and techniques from educational science, computer science, psychology, and statistics. It is believed that EDM will become more mature and promising in the age of big data. This paper aims to help readers understand or engage in EDM research. First, the basic concepts, characteristics and research history of EDM are introduced. Then some of the latest EDM results are presented and analyzed; most were published in 2013 or later, including studies on several educational technologies that were rarely investigated before. These results are analyzed via classification, statistics and comparison, based on which the strengths and weaknesses of EDM are discussed. Finally, the opportunities and challenges facing EDM are discussed.