ZHU Peng-Fei, ZHANG Wan-Ying, WANG Yu, HU Qing-Hua
2022, 33(4):1156-1169. DOI: 10.13328/j.cnki.jos.006468
Abstract:In recent years, deep neural networks have continuously achieved breakthroughs in classification tasks, but they mistakenly assign unknown samples encountered in the testing phase to one of the known classes. Open set recognition is a possible way to solve this problem: it requires the model not only to classify the known classes, but also to distinguish unknown samples accurately. Most existing methods are designed heuristically based on certain assumptions. Although they keep improving performance, they do not analyze the key factors that affect the task. This study analyzes the commonalities of existing methods by designing a new decision-variable experiment and finds that the model's ability to learn representations of the known classes is an important factor. An open set recognition method is then proposed based on enhancing the model's representation learning ability. Firstly, given the powerful representation learning capability demonstrated by contrastive learning and the label information contained in the open set recognition task, supervised contrastive learning is introduced to improve the model's ability to model the known classes. Secondly, considering that inter-class correlation is representation learning at the category level, and that categories often exhibit a hierarchical structure, a multi-granularity inter-class correlation loss is designed by building a hierarchical structure in the label semantic space and measuring the multi-granularity correlation among classes. This loss constrains the model to learn the correlation among different known classes, further improving its representation learning ability. Finally, experimental results on multiple standard datasets verify the effectiveness of the proposed method on open set recognition tasks.
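The supervised contrastive learning the abstract introduces is, in its standard form (Khosla et al., 2020), a loss that pulls same-class embeddings together and pushes different-class embeddings apart. Below is a minimal sketch of that standard loss; the tensor shapes and temperature value are illustrative assumptions, not the paper's exact settings.

```python
# Minimal sketch of the standard supervised contrastive (SupCon) loss;
# shapes and the temperature tau are illustrative assumptions.
import torch
import torch.nn.functional as F

def sup_con_loss(features: torch.Tensor, labels: torch.Tensor, tau: float = 0.1):
    """features: (N, d) embeddings; labels: (N,) class ids."""
    features = F.normalize(features, dim=1)
    sim = features @ features.T / tau                       # pairwise similarities
    logits_mask = ~torch.eye(len(labels), dtype=torch.bool) # drop self-similarity
    pos_mask = (labels[:, None] == labels[None, :]) & logits_mask
    # log-softmax over all other samples, numerically stable
    log_prob = sim - torch.logsumexp(sim.masked_fill(~logits_mask, -1e9),
                                     dim=1, keepdim=True)
    # average log-probability of positives for each anchor that has positives
    pos_counts = pos_mask.sum(1).clamp(min=1)
    loss = -(log_prob * pos_mask).sum(1) / pos_counts
    return loss[pos_mask.sum(1) > 0].mean()
```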
WANG Yun-Yun, SUN Gu-Wei, ZHAO Guo-Xiang, XUE Hui
2022, 33(4):1170-1182. DOI: 10.13328/j.cnki.jos.006478
Abstract:Unsupervised domain adaptation (UDA) uses a source domain with large amounts of labeled data to help the learning of a target domain without any label information. In UDA, the source and target domains usually have different data distributions but share the same label space. In real open scenarios, however, the label spaces of the two domains can also differ. In the extreme case, there is no shared class between the domains, i.e., all classes in the target domain are new classes. In this case, directly transferring the discriminant knowledge of the source domain would harm the performance on the target domain and lead to negative transfer. As a result, this study proposes SUNDA, an unsupervised new-set domain adaptation method with self-supervised knowledge. Firstly, self-supervised learning is adopted to learn initial features on the source and target domains, with the first few layers frozen in order to preserve target-domain information. Then, class contrastive knowledge from the source domain is transferred to help learn discriminant features for the target domain. Moreover, a graph-based self-supervised classification loss is adopted to handle the classification problem in the target domain, which shares no labels with the source. Experiments are conducted on digit and face recognition tasks without shared classes, and the empirical results show that SUNDA is competitive with UDA and unsupervised clustering methods, as well as new-category discovery methods.
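The "first few layers frozen" step is a generic deep-learning operation; a PyTorch sketch is below. The backbone and the choice of which blocks count as "early" are illustrative assumptions, not the paper's architecture.

```python
# Generic sketch of freezing the first few layers after self-supervised
# pre-training; the ResNet-18 backbone and block choice are assumptions.
import torch
from torchvision.models import resnet18

backbone = resnet18(num_classes=10)          # stand-in for the pre-trained encoder
for module in (backbone.conv1, backbone.bn1, backbone.layer1):
    for p in module.parameters():
        p.requires_grad = False              # keep the learned low-level features

# only the unfrozen parameters are handed to the optimizer
optimizer = torch.optim.SGD(
    (p for p in backbone.parameters() if p.requires_grad), lr=1e-2, momentum=0.9)
```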
WANG Fan, HAN Zhong-Yi, YIN Yi-Long
2022, 33(4):1183-1199. DOI: 10.13328/j.cnki.jos.006467
Abstract:Unsupervised domain adaptation is one of the effective ways to handle inconsistent distributions between the training set (source domain) and the test set (target domain). Existing unsupervised domain adaptation theories and methods have achieved some success in relatively closed and static environments. However, in open dynamic task environments, where source domain data are often not directly accessible under the constraints of privacy protection and data silos, the robustness of existing unsupervised domain adaptation methods faces serious challenges. In view of this, this study investigates a more challenging yet under-explored problem: source-free unsupervised domain adaptation, with the goal of achieving positive transfer from the source domain to the target domain based only on a pre-trained source domain model and unlabeled target domain data. This study proposes a method called PLUE-SFRDA (pseudo label uncertainty estimation for source free robust domain adaptation). The core idea of PLUE-SFRDA is to combine information entropy and an energy function to fully explore the information implicit in the target domain data given the predictions of the source domain model, to mine class prototypes and class anchors to accurately estimate the pseudo labels of the target domain data, and then to fine-tune the domain adaptation model to achieve source-free robust domain adaptation. PLUE-SFRDA contains a proposed binary soft-constraint information entropy, which solves the problem that the standard information entropy cannot effectively estimate the pseudo-label uncertainty of samples at the decision boundary, enhances the confidence of the mined class prototypes, and thus improves the accuracy of pseudo-label estimation in the target domain. PLUE-SFRDA also contains a proposed weighted comparison filtering method: by comparing the weighted distances of each sample to the class anchors of the other classes, samples with ambiguous class information at the decision boundary are filtered out, which further improves the security of the pseudo-label uncertainty estimation. PLUE-SFRDA further contains an information maximization loss that iteratively optimizes the source domain classifier and the pseudo-label estimator, gradually migrating the source domain knowledge embedded in the source model to the target domain and improving the robustness of the pseudo-label uncertainty estimation. Extensive experiments on three publicly available datasets, Office-31, Office-Home, and VisDA-C, show that PLUE-SFRDA not only outperforms the state-of-the-art source-free domain adaptation methods, but also significantly outperforms standard domain adaptation methods that depend on source domain data.
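The two ingredients the abstract combines, information entropy and an energy score, have textbook definitions from model logits, shown in the sketch below. How PLUE-SFRDA actually weighs them (and its binary soft-constraint entropy variant) is specific to the paper, so the final confidence filter here is only an illustrative assumption.

```python
# Textbook per-sample entropy and energy from logits; the combination rule
# at the end is an illustrative assumption, not the paper's criterion.
import torch
import torch.nn.functional as F

def entropy_and_energy(logits: torch.Tensor, T: float = 1.0):
    probs = F.softmax(logits, dim=1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
    energy = -T * torch.logsumexp(logits / T, dim=1)  # lower = more confident
    return entropy, energy

logits = torch.randn(8, 31)                  # e.g., Office-31 has 31 classes
H, E = entropy_and_energy(logits)
confident = (H < H.median()) & (E < E.median())  # one plausible confidence filter
```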
ZHANG Shi, LAI Hui-Xia, XIAO Ru-Liang, PAN Miao-Xin, ZHANG Lu-Lu, CHEN Wei-Lin
2022, 33(4):1200-1217. DOI: 10.13328/j.cnki.jos.006463
Abstract:Retrieval methods based on locality-sensitive hashing (LSH) provide a feasible solution to the problem of approximate nearest neighbor (ANN) search over high-dimensional, massive data with multiple distribution characteristics. However, some problems remain unresolved in open environments, such as poor adaptability to data with multiple distribution characteristics. Based on the fact that the Laplacian operator is sensitive to sharp changes in data, an LSH retrieval method based on the Laplacian operator (LPLSH) is proposed, which is suitable for data with various distribution characteristics in open environments and can segment data from a global view. By applying the Laplacian operator to the probability density distribution of the data projections, the positions of sharp changes in the distribution are found and used as the offsets of the hyperplanes. This study proves theoretically that the dimension reduction preserves the locality-sensitivity property of the hash function, and that segmenting at globally low-density intervals of the projections helps to improve precision. The rationale for using the Laplacian operator to obtain the second derivative for setting the hyperplane offset is also analyzed. Compared with eight other LSH-based methods, the F1 value of LPLSH is 0.8–5 times the best value of the other methods, and it takes less time. Combined with an analysis of the distribution characteristics of the experimental datasets, the experimental results show that LPLSH balances efficiency, precision, and recall at the same time, and meets the robustness requirements of large-scale high-dimensional retrieval over data with multiple distribution characteristics in open environments.
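On a one-dimensional projection, the Laplacian reduces to the second derivative of the density, so sharp density changes can be located numerically. The sketch below illustrates that core idea under assumed choices (Gaussian KDE, grid size, and a simple argmax peak-picking rule); LPLSH's actual estimator and offset rule are not reproduced.

```python
# 1-D sketch of the core LPLSH idea: second derivative of the projected
# density marks low-density cut positions. Bandwidth, grid size, and the
# peak-picking rule are illustrative choices.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-3, 1, (500, 16)), rng.normal(3, 1, (500, 16))])

a = rng.normal(size=16)
a /= np.linalg.norm(a)
proj = data @ a                                    # 1-D projections

grid = np.linspace(proj.min(), proj.max(), 512)
density = gaussian_kde(proj)(grid)
step = grid[1] - grid[0]
second_deriv = np.gradient(np.gradient(density, step), step)  # discrete Laplacian

offset = grid[np.argmax(second_deriv)]             # candidate split in a density valley
print(f"candidate hyperplane offset b = {offset:.3f}")
```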
CAO Liu-Juan, KUANG Hua-Feng, LIU Hong, WANG Yan, ZHANG Bao-Chang, HUANG Fei-Yue, WU Yong-Jian, JI Rong-Rong
2022, 33(4):1218-1230. DOI: 10.13328/j.cnki.jos.006477
Abstract:Recent studies have shown that adversarial training is an effective method to defend against adversarial example attacks. However, such robustness comes at the price of a larger generalization gap. Existing endeavors mainly treat each training example independently, which ignores the geometric relationships among samples and does not exploit the defending capability to its full potential. Different from existing works, this study focuses on improving the robustness of the neural network model by aligning the geometric information among samples so that the feature-space distributions of natural and adversarial samples are consistent. Furthermore, a dual-label supervised method is proposed to leverage both the true label and the wrongly predicted label of an adversarial example to jointly supervise the adversarial learning process. The characteristics of the dual-label supervised learning method are analyzed, and a theoretical explanation of the working mechanism of adversarial examples is attempted. Extensive experiments conducted on benchmark datasets demonstrate that the proposed approach effectively improves the robustness of the model while maintaining generalization accuracy. Code is available at: https://github.com/SkyKuang/DGCAT.
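One plausible reading of the dual-label idea is sketched below: pull the adversarial example toward its true label while explicitly penalizing confidence on the wrong label the attack induced. The weight alpha and this exact loss form are assumptions; the paper's formulation may differ.

```python
# Hedged sketch of dual-label supervision: cross-entropy on the true label
# plus a penalty on the wrongly predicted label. `alpha` and this exact
# form are assumptions, not the paper's loss.
import torch
import torch.nn.functional as F

def dual_label_loss(logits_adv: torch.Tensor, y_true: torch.Tensor,
                    alpha: float = 0.5):
    y_wrong = logits_adv.argmax(dim=1)              # label the attack induced
    wrong_mask = y_wrong != y_true
    loss = F.cross_entropy(logits_adv, y_true)
    if wrong_mask.any():
        log_probs = F.log_softmax(logits_adv[wrong_mask], dim=1)
        # adding log p(wrong) makes the loss grow with confidence on the wrong class
        loss = loss + alpha * log_probs.gather(
            1, y_wrong[wrong_mask].unsqueeze(1)).mean()
    return loss
```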
LONG Sheng, TAO Wei, ZHANG Ze-Dong, TAO Qing
2022, 33(4):1231-1243. DOI: 10.13328/j.cnki.jos.006464
Abstract:Compared with gradient descent, the adaptive gradient method (AdaGrad) keeps the geometric information of historical data by using the arithmetic average of squared past gradients, and obtains tighter convergence bounds when coping with sparse data. On the other hand, by adding a momentum term to gradient descent, Nesterov's accelerated gradient (NAG) not only achieves an order-of-magnitude acceleration in convergence for solving smooth convex optimization problems, but also attains the optimal individual convergence rate for non-smooth convex problems. Recently, the combination of adaptive strategies and NAG has been studied. However, AcceleGrad, a typical adaptive NAG algorithm, fails to reflect the distinctions between dimensions because it uses an adaptive variant different from AdaGrad's, and it only obtains a weighted-averaging convergence rate; a theoretical analysis of its individual convergence rate is still lacking. In this study, an adaptive NAG method that inherits AdaGrad's step size setting is proposed, and it is proved that the proposed algorithm attains the optimal individual convergence rate when solving constrained non-smooth convex optimization problems. Experiments are conducted on typical optimization problems: hinge-loss classification and L1-loss regression, both under an L1-norm constraint. The experimental results verify the correctness of the theoretical analysis and the superior performance of the proposed algorithm over AcceleGrad.
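The combination being studied, Nesterov-style momentum with AdaGrad's per-coordinate step sizes, can be illustrated generically as below. This is a sketch under assumed hyperparameters; the paper's precise update and the projection onto the L1-norm ball are not reproduced.

```python
# Generic sketch: look-ahead (Nesterov) gradient with an AdaGrad-style
# per-coordinate step size. Hyperparameters and the omitted L1-ball
# projection are assumptions.
import numpy as np

def adagrad_nag(grad_fn, x0, lr=0.5, mu=0.9, eps=1e-8, steps=500):
    x, v = x0.copy(), np.zeros_like(x0)
    G = np.zeros_like(x0)                          # running sum of squared gradients
    for _ in range(steps):
        g = grad_fn(x + mu * v)                    # gradient at the look-ahead point
        G += g * g
        v = mu * v - lr * g / (np.sqrt(G) + eps)   # per-coordinate adaptive step
        x = x + v
    return x

# toy check: the minimum of ||x - 1||^2 is the all-ones vector
print(np.round(adagrad_nag(lambda x: 2 * (x - 1.0), np.zeros(5)), 3))
```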
XU Peng-Yu, LIU Hua-Feng, LIU Bing, JING Li-Ping, YU Jian
2022, 33(4):1244-1266. DOI: 10.13328/j.cnki.jos.006481
Abstract:With the explosive growth of Internet information, tags (keywords specified by users to describe an item) are becoming more and more important in Internet information retrieval. Giving appropriate tags to online content is conducive to more efficient content organization and content consumption. Tag recommendation greatly improves the quality of tags by assisting users in tagging, and has therefore attracted wide attention from researchers. This study summarizes three characteristics of the tag recommendation task, namely the diversity of item content, the correlation between tags, and the differences in user preferences. According to these three characteristics, tag recommendation methods are divided into three categories: content-based methods, tag-correlation-based methods, and user-preference-based methods. The corresponding methods are then reviewed and analyzed under these three categories. Finally, the main challenges in the field of tag recommendation are presented, such as the long-tail problem of tags, the dynamics of user preferences, and the fusion of multimodal information, and directions for future research are discussed.
2022, 33(4):1267-1273. DOI: 10.13328/j.cnki.jos.006475
Abstract:In multi-label learning (MLL) problems, each example is associated with a set of labels. In order to train a well-performing predictor for unseen examples, exploiting the relations between labels is crucially important. Most existing studies simplify these relations as correlations among labels, typically based on their co-occurrence. This study discloses that causal relations are more essential for describing how one label can help another during the learning process. Based on this observation, two strategies are proposed to generate causal orders of labels from the label causal directed acyclic graph (DAG), following the constraint that a cause label should precede its effect labels. The main idea of the first strategy is to adjust a random order until it satisfies the cause-effect relations in the DAG. The main idea of the second strategy is to put the labels into disjoint topological levels based on the structure of the DAG and then sort them by their topological structure. Further, by incorporating the causal orders into the classifier chain (CC) model, an effective MLL approach is proposed that exploits label relations from a more essential view. Experimental results on multiple datasets validate that the extracted causal order of labels indeed provides helpful information that boosts performance.
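The second strategy is a standard level-wise topological decomposition: every label lands in a level strictly later than all of its causes. A minimal sketch with a toy DAG follows; the toy edges are illustrative.

```python
# Level-wise topological decomposition of a label causal DAG (Kahn-style);
# the toy DAG at the bottom is illustrative.
from collections import defaultdict

def topological_levels(num_labels, edges):
    """edges: list of (cause, effect) pairs over labels 0..num_labels-1."""
    children, indeg = defaultdict(list), [0] * num_labels
    for cause, effect in edges:
        children[cause].append(effect)
        indeg[effect] += 1
    level = [l for l in range(num_labels) if indeg[l] == 0]
    levels = []
    while level:
        levels.append(level)
        nxt = []
        for u in level:
            for v in children[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        level = nxt
    return levels

# labels 0 and 1 are root causes; 2 depends on both; 3 depends on 2
print(topological_levels(4, [(0, 2), (1, 2), (2, 3)]))  # [[0, 1], [2], [3]]
```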
LI Shao-Yuan, WEI Meng-Long, HUANG Sheng-Jun
2022, 33(4):1274-1286. DOI: 10.13328/j.cnki.jos.006479
Abstract:Traditional supervised learning requires ground-truth labels for the training data, which can be difficult to collect in many cases. In contrast, crowdsourcing learning collects noisy annotations from multiple non-expert workers and infers the latent true labels through some aggregation approach. This study notices that existing deep crowdsourcing work does not sufficiently model worker correlations, which previous non-deep-learning approaches have shown to be helpful. A deep generative crowdsourcing learning model is proposed to combine the strength of deep neural networks (DNNs) with the exploitation of worker correlations. The model comprises a DNN classifier as a prior over the true labels, and an annotation generation process in which a mixture model over workers' reliabilities within each class captures inter-worker correlation. To automatically trade off between model complexity and data fitting, fully Bayesian inference is developed. Based on the natural-gradient stochastic variational inference techniques developed for the structured variational autoencoder (SVAE), variational message passing for the conjugate parameters and stochastic gradient descent for the DNN are combined under a unified framework to conduct efficient end-to-end optimization. Experimental results on 22 real-world crowdsourcing datasets demonstrate the effectiveness of the proposed approach.
WEN Yi-Min, LIU Shuai, MIAO Yu-Qing, YI Xin-He, LIU Chang-Jie
2022, 33(4):1287-1314. DOI: 10.13328/j.cnki.jos.006476
Abstract:In open environments, data streams are characterized by high-speed data generation, unlimited data volume, and concept drift. In data stream classification tasks, it is expensive and impractical to generate a large amount of training data by manual annotation. A data stream with only a small number of labeled samples, a large number of unlabeled samples, and concept drifts presents a great challenge to machine learning. However, existing research mainly focuses on supervised classification of data streams, while semi-supervised classification of data streams with concept drift has not yet attracted enough attention. Therefore, based on a comprehensive collection of work on semi-supervised classification of data streams, this study sorts the existing algorithms into several types, describing and summarizing them according to the types of classifiers they use and the concept drift detection methods they employ. On several widely used real and synthetic datasets, representative semi-supervised data stream classification algorithms are selected and compared from multiple aspects. Finally, this study raises some issues worth further study for semi-supervised classification of data streams with concept drift. The experimental results show that the classification accuracy of semi-supervised data stream classification algorithms is related to many factors, but is most strongly related to changes in the data distribution. This survey will help interested researchers quickly enter the field of semi-supervised classification of data streams.
LIU Yan-Fang, LI Wen-Bin, GAO Yang
2022, 33(4):1315-1325. DOI: 10.13328/j.cnki.jos.006480
Abstract:In contrast to traditional online learning with fixed features, feature evolvable learning assumes that features do not vanish or appear in an arbitrary way; instead, as hardware devices are exchanged, the old features gathered by the previous devices disappear while new features emerge simultaneously. However, existing feature evolvable algorithms merely utilize the first-order information of data streams and disregard second-order information, which captures the correlations between features and can significantly improve classification performance. A confidence-weighted learning for feature evolution (CWFE) algorithm is proposed to solve this problem. First, second-order confidence-weighted learning on data streams is introduced to update the prediction model. Next, to benefit the learned model, a linear mapping is learned during the overlap period, in which old and new features coexist, so that the old features can be recovered afterwards. The existing model is then updated with the recovered old features while, at the same time, a new prediction model is learned with the new features. Furthermore, two ensemble methods are introduced to utilize the two models. Finally, empirical studies show superior performance over state-of-the-art feature evolvable algorithms.
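The feature-recovery step can be illustrated with plain least squares: during the overlap period both feature sets are observed, so a linear map from new features to old features can be fit and later used to reconstruct the vanished old features. The shapes and the unregularized fit below are assumptions, not CWFE's exact estimator.

```python
# Sketch of recovering vanished old features via a least-squares linear map
# fit on the overlap period; shapes and the plain fit are assumptions.
import numpy as np

rng = np.random.default_rng(1)
X_new_overlap = rng.normal(size=(200, 30))            # new features, overlap period
M_true = rng.normal(size=(30, 20))
X_old_overlap = X_new_overlap @ M_true + 0.01 * rng.normal(size=(200, 20))

# fit old ≈ new @ M on the overlap samples
M, *_ = np.linalg.lstsq(X_new_overlap, X_old_overlap, rcond=None)

x_new = rng.normal(size=(1, 30))          # a later instance: only new features exist
x_old_recovered = x_new @ M               # feed this to the model trained on old features
```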
ZHAO Hui, WANG Hong-Jun, PENG Bo, LONG Zhi-Guo, LI Tian-Rui
2022, 33(4):1326-1337. DOI: 10.13328/j.cnki.jos.006465
Abstract:Feature learning is an important technique in machine learning, which learns from raw data the representations required by downstream tasks. At present, most feature learning algorithms focus on learning the topological structure of the original data but ignore the discriminant information in the data. This study proposes a novel model called discriminant feature learning based on t-distributed stochastic neighbor embedding (DTSNE). In this model, the learning of discriminant information and the learning of topological structure are fused together, and both are learned through an iterative solution to obtain a discriminant feature representation of the original data, which can significantly improve the performance of machine learning algorithms. Experimental results on multiple open datasets demonstrate the effectiveness of the proposed model.
2022, 33(4):1338-1353. DOI: 10.13328/j.cnki.jos.006466
Abstract:To overcome the limitation of the FSIP (feature selection based on information gain and Pearson correlation coefficient) algorithm, which requires a human to determine the borderline for detecting feature subsets, a fully adaptive 2D feature selection algorithm based on the discernibility matrix, referred to as DFSIP (discernibility-based FSIP), is proposed in this study. DFSIP introduces the discernibility matrix into the feature selection process of FSIP. It first initializes the candidate feature set with all features and constructs the initial discernibility matrix; it then repeatedly detects the most significant feature in the current candidate feature set, adds it to the feature subset, uses it to reduce the discernibility matrix, and updates the candidate feature set as the union of the cells of the reduced matrix. This process repeats until no feature is left in the candidate feature set. The power of DFSIP is tested on well-known gene expression datasets, and its performance is compared with that of popular feature selection algorithms, including FSIP, mRMR, LLE Score, DRJMIM, AVC, and AMID, by comparing the performance of K-ELM classifiers built on the feature subsets detected by these algorithms. In addition, significance tests are performed to verify whether there are significant differences between DFSIP and FSIP as well as the other compared algorithms. The experimental results demonstrate that DFSIP is superior to the compared algorithms; in particular, it differs significantly from the LLE Score, DRJMIM, and AMID feature selection algorithms. Although there is no significant difference between DFSIP and FSIP, DFSIP outperforms FSIP. It can be concluded that DFSIP can fully adaptively detect feature subsets with sound classification capability.
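The loop structure the abstract walks through can be captured in a short skeleton. The significance measure (FSIP's information gain / Pearson criterion) is left as a placeholder, so this shows the control flow only, not the full algorithm.

```python
# Skeleton of the DFSIP loop as described in the abstract; `significance`
# is a placeholder for FSIP's feature scoring, so this is structure only.
def dfsip(features, discernibility_matrix, significance):
    """discernibility_matrix: iterable of cells, each a set of features;
    significance(f): scores a candidate feature (placeholder)."""
    selected = []
    candidates = set(features)
    cells = [set(c) for c in discernibility_matrix]
    while candidates:
        best = max(candidates, key=significance)   # most significant feature
        selected.append(best)
        # reduce the matrix: drop every cell the chosen feature discerns
        cells = [c for c in cells if best not in c]
        # new candidate set = union of the cells of the reduced matrix
        candidates = set().union(*cells) if cells else set()
    return selected
```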
LIU Xiao-Lin, BAI Liang, ZHAO Xing-Wang, LIANG Ji-Ye
2022, 33(4):1354-1372. DOI: 10.13328/j.cnki.jos.006471
Abstract:In real applications, clustering multi-view data is an important task in data mining. The incompleteness of multi-view data caused by missing samples brings a great challenge to multi-view clustering. Shallow graph structure information is easily affected by noise and missing data, so most existing multi-view clustering methods find it difficult to describe the underlying structure of all views accurately and comprehensively, which reduces the performance of incomplete multi-view clustering. To this end, this study proposes a robust incomplete multi-view clustering algorithm based on strategies of diffusion and fusion among multi-order neighborhoods. Firstly, the proposed algorithm obtains potential structural information from incomplete views by utilizing multi-order similarities. Then, the deep structural information of the multiple views is nonlinearly fused by cross-view diffusion. In this way, the algorithm extracts more comprehensive structural information among views, thereby reducing the structural uncertainty caused by missing samples. In addition, detailed steps are presented to prove the convergence of the proposed algorithm. Experimental results show that the proposed method is more effective in solving the incomplete multi-view clustering problem than existing methods.
ZHANG Yi-Ling, YANG Yan, ZHOU Wei, OUYANG Xiao-Cao, HU Jie
2022, 33(4):1373-1389. DOI: 10.13328/j.cnki.jos.006474
Abstract:Spectral clustering, one of the most representative methods in clustering analysis, receives much attention from scholars because it places no constraints on the structure of the original samples. However, the traditional spectral clustering algorithm has two major limitations: it cannot cope with large-scale settings or with complex data distributions. To overcome these shortcomings, this study introduces a deep learning framework to improve the generalization and scalability of spectral clustering, combines multi-view learning to mine diverse features among data samples, and proposes a knowledge-transferring-based deep consensus network for multi-view spectral clustering (CMvSC). First, considering the local invariance of each single view, CMvSC adopts a local learning layer to learn the specific embedding of each view individually. Then, because of the global consistency among multiple views, CMvSC introduces a global learning layer to achieve parameter sharing and feature transferring and to learn the embedding shared by different views. Meanwhile, taking into account the effect of the affinity matrix on spectral clustering, CMvSC learns the affinity between paired samples by training a Siamese network with a contrastive loss, which replaces the distance metric used in traditional spectral clustering. Finally, experimental results on four datasets demonstrate the effectiveness of the proposed CMvSC for multi-view clustering.
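The Siamese affinity learning step follows a standard pattern: a shared encoder embeds both samples of a pair, and a contrastive loss pulls "similar" pairs together while pushing "dissimilar" pairs beyond a margin. The sketch below shows that standard pattern; the encoder architecture, margin, and pair-labeling rule are illustrative assumptions.

```python
# Standard Siamese + contrastive loss pattern for learning affinities;
# encoder architecture, margin, and pair labels are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 32))

def contrastive_loss(x1, x2, same: torch.Tensor, margin: float = 1.0):
    """same: 1.0 for pairs that should be close, 0.0 otherwise."""
    d = F.pairwise_distance(encoder(x1), encoder(x2))
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

x1, x2 = torch.randn(16, 64), torch.randn(16, 64)
same = torch.randint(0, 2, (16,)).float()
loss = contrastive_loss(x1, x2, same)
loss.backward()
```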
SUN Lin, QIN Xiao-Ying, XU Jiu-Cheng, XUE Zhan-Ao
2022, 33(4):1390-1411. DOI: 10.13328/j.cnki.jos.006462
Abstract:The density peak clustering (DPC) algorithm is a simple and effective clustering analysis algorithm. However, in real-world applications, it is difficult for DPC to select the correct cluster centers for datasets with large density differences among clusters or multiple density peaks within clusters. Furthermore, the point allocation method of DPC has a domino effect. To address these issues, a density peak clustering algorithm based on the K-nearest neighbors (KNN) and an optimized allocation strategy is proposed. First, candidate cluster centers are determined using the KNN, point densities, and boundary points. A path distance is defined to reflect the similarity between candidate cluster centers, based on which a density factor and a distance factor are proposed to quantify the possibility of each candidate being a true cluster center, and the cluster centers are then determined. Second, to improve the allocation precision of points, similarity measures are constructed from the shared nearest neighbors, the high-density nearest neighbor, density differences, and distances between K-nearest neighbors, and the concepts of neighborhood, similarity set, and similarity domain are introduced to assist the allocation of points. Initial clustering results are determined according to the similarity domains and boundary points, and intermediate clustering results are then obtained based on the cluster centers. Finally, according to the intermediate clustering results and the similarity sets, the clusters are divided into multiple layers from the cluster centers to the cluster boundaries, and allocation strategies are designed for each layer. To determine the allocation order of points within a specific layer, a positive value is defined based on the similarity domain and the positive domain, and each point is allocated to the dominant cluster in its positive domain, yielding the final clustering results. Experimental results on 11 synthetic datasets and 27 real datasets demonstrate that, compared with state-of-the-art DPC algorithms, the proposed algorithm achieves sound clustering performance in terms of purity, F-measure, accuracy, Rand index, adjusted Rand index, and normalized mutual information.
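The first stage builds on the common DPC-KNN pattern: local densities from the K nearest neighbors and a center score combining density with the distance to the nearest denser point. The sketch below shows that common pattern only; the paper's path distance, density/distance factors, and layered allocation strategy are richer and are not reproduced.

```python
# DPC-KNN-style sketch: KNN-based local density and the classic
# density-times-distance center score. The paper's own criteria differ.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_density_centers(X, k=8, n_centers=3):
    dists, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    rho = np.exp(-dists[:, 1:].mean(axis=1))         # KNN-based local density
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    delta = np.empty(len(X))
    for i in range(len(X)):
        denser = rho > rho[i]                        # distance to nearest denser point
        delta[i] = D[i, denser].min() if denser.any() else D[i].max()
    gamma = rho * delta                              # classic DPC center score
    return np.argsort(gamma)[-n_centers:]

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, (100, 2)) for c in [(0, 0), (3, 0), (0, 3)]])
print(knn_density_centers(X))                        # indices of candidate centers
```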
LÜ Jian-Cheng, YE Qing, TIAN Yu-Xin, HAN Jun-Wei, WU Feng
2022, 33(4):1412-1429. DOI: 10.13328/j.cnki.jos.006470
Abstract:Large-scale deep neural networks (DNNs) exhibit powerful end-to-end representation capability and can approximate highly nonlinear functions, showing excellent performance in several fields and becoming an important development direction. For example, the natural language processing model GPT, after years of development, now has 175 billion network parameters and achieves state-of-the-art performance on several NLP benchmarks. However, with the existing ways of organizing deep neural networks, it is difficult for current large-scale networks to reach the connection scale of the biological neural network of the human brain. At the same time, existing large-scale DNNs do not perform well in multi-channel collaborative processing, knowledge storage, and reasoning. This study proposes a brain-inspired large-scale DNN model, which is inspired by the division and functional mechanisms of brain regions, is built modularly according to the functionality of the brain, integrates a large amount of existing data and pre-trained models, and is trained with learning algorithms proposed according to the functional mechanisms of the brain. The model implements a pathway that takes a scene as input and automatically builds a DNN as output. It should not only learn the correlation between input and output, but also possess multi-channel collaborative processing capability to improve the quality of that correlation, thereby realizing knowledge storage and reasoning ability; this can be treated as a path toward general artificial intelligence. The whole model, together with all datasets and brain-inspired functional areas, is managed by a database system equipped with distributed training algorithms that support efficient training of the large-scale DNN on computing clusters. This study thus offers a novel idea for implementing general artificial intelligence, and the large-scale model is validated on several tasks of different modalities.
WANG Chen-Yang, REN Yi, MA Wei-Zhi, ZHANG Min, LIU Yi-Qun, MA Shao-Ping
2022, 33(4):1430-1438. DOI: 10.13328/j.cnki.jos.006473
Abstract:In recent years, many recommendation algorithms have been proposed, and research on recommender systems has been greatly boosted by the development of deep learning. However, concerns about reproducibility have increasingly arisen in the research community, owing to slight but influential differences between recommendation algorithms, such as implementation details, evaluation protocols, and dataset splitting. To address this issue, ReChorus is presented: a comprehensive, efficient, flexible, and lightweight framework for recommendation algorithms based on PyTorch, which aims to form a “Chorus” of recommendation algorithms. The framework implements a wide range of recommendation algorithms of different categories, covering general recommendation, sequential recommendation, knowledge-aware recommendation, and time-aware recommendation. ReChorus also provides dataset preprocessing paradigms for some common datasets. Compared with other recommendation algorithm libraries, ReChorus strives to remain lightweight while unifying as many different algorithms as possible. ReChorus is also flexible, efficient, and easy to use, especially for research purposes, and researchers will find it effortless to implement new algorithms with it. Such a framework helps train and evaluate different recommendation models under the same experimental setting, so as to avoid the impact of implementation details and ensure an effective comparison among recommendation algorithms. The project has been released on GitHub: https://github.com/THUwangcy/ReChorus.
YANG Jia-Xin, DU Jun-Ping, SHAO Ying-Xia, LI Ang, XI Jun-Qing
2022, 33(4):1439-1450. DOI: 10.13328/j.cnki.jos.006483
Abstract:In the era of big data, intellectual-property-oriented scientific and technological resources are characterized by large data scale, high timeliness, and low value density, which poses severe challenges to their effective use. At the same time, the demand for mining hidden information in intellectual property is increasing in various countries, making the construction of intellectual-property-oriented scientific and technological resource portraits a current research hotspot. This study aims to build portraits of intellectual property through intelligent data acquisition, entity recognition, and visualization. However, existing methods for constructing scientific and technological resource portraits are only suitable for structured data and ignore the impact of words' parts of speech on the semantic understanding of sentences. Therefore, a novel algorithm is proposed for constructing intellectual-property-oriented portraits of scientific and technological resources. For the automatically acquired intellectual property resources, a part-of-speech-level attention mechanism is introduced to improve the accuracy of entity recognition, and intellectual-property-oriented scientific and technological resource portraits are constructed and visualized. Compared with existing methods, the proposed method has the following advantages: 1) it utilizes the part-of-speech information of words to learn the semantic meaning of sentences and integrates the attention mechanism to avoid ambiguities in semantic understanding in a supervised way; 2) the model can intelligently and automatically complete sci-tech data acquisition, named entity recognition, and the construction of scientific and technological resource portraits; 3) extensive experiments demonstrate that the proposed method outperforms the baselines in named entity recognition by exploiting the parts of speech of words for supervised learning.
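One hypothetical realization of a part-of-speech-level attention is sketched below: attention scores derived from POS-tag embeddings reweight the token representations that feed the entity tagger. All dimensions, names, and the scoring form are illustrative assumptions about the paper's mechanism.

```python
# Hypothetical POS-level attention: scores from POS-tag embeddings reweight
# token features. Dimensions and the additive scoring form are assumptions.
import torch
import torch.nn as nn

class PosAttention(nn.Module):
    def __init__(self, vocab=5000, n_pos=40, d_word=100, d_pos=25):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, d_word)
        self.pos_emb = nn.Embedding(n_pos, d_pos)
        self.score = nn.Linear(d_pos, 1)         # attention score from POS alone

    def forward(self, word_ids, pos_ids):        # both: (batch, seq_len)
        w = self.word_emb(word_ids)              # (B, L, d_word)
        a = torch.softmax(self.score(self.pos_emb(pos_ids)).squeeze(-1), dim=1)
        return w * a.unsqueeze(-1)               # POS-weighted token features

feats = PosAttention()(torch.randint(0, 5000, (2, 12)), torch.randint(0, 40, (2, 12)))
```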
QIAO Shao-Jie, HAN Nan, YUE Kun, YI Yu-Gen, HUANG Fa-Liang, YUAN Chang-An, DING Peng, Louis Alberto GUTIERREZ
2022, 33(4):1451-1476. DOI: 10.13328/j.cnki.jos.006461
Abstract:Bike-sharing systems are becoming more and more popular and have accumulated a large volume of trajectory data. In a bike-sharing system, the borrowing and returning behavior of users is arbitrary. In addition, the system is affected by weather, time period, and other dynamic factors, which makes shared-bike scheduling unbalanced, degrades the user experience, and causes huge economic losses to operators. A novel shared-bike demand prediction model based on station clustering is proposed. The activeness of stations is calculated by constructing a bike transition network; the geographical locations of stations and the bike transition patterns are fully considered, and stations that are close in distance and similar in transition pattern are aggregated into clusters based on the idea of data field clustering. In addition, a method for computing the optimal number of cluster centers is presented. The influence of time and weather factors on bike demand is fully analyzed: the Pearson correlation coefficient is used to choose the most relevant weather features from real weather data, and these are combined with the historical bike demand of each cluster into a three-dimensional feature vector. A long short-term memory (LSTM) neural network with multiple features is then employed to learn from these feature vectors, and the bike demand in each cluster is predicted and analyzed every thirty minutes. Compared with traditional machine learning algorithms and state-of-the-art methods, the results show that the prediction performance of the proposed model is significantly improved.
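The prediction step maps naturally to a small recurrent regressor: an LSTM consumes a sequence of the three-dimensional feature vectors and regresses the next 30-minute demand for one cluster. In the PyTorch sketch below, the hidden size, sequence length, and single-cluster head are assumptions.

```python
# Minimal PyTorch sketch of the per-cluster demand predictor; hidden size,
# sequence length, and the single-output head are assumptions.
import torch
import torch.nn as nn

class DemandLSTM(nn.Module):
    def __init__(self, n_features=3, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time_steps, 3)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # demand for the next 30-minute slot

model = DemandLSTM()
x = torch.randn(32, 48, 3)                # e.g., the last 24 hours in 30-min slots
pred = model(x)                           # (32, 1)
```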
LIANG Xing-Xing, MA Yang, FENG Yang-He, ZHANG Yu-Long, ZHANG Long-Fei, LIAO Shi-Jiang, LIU Zhong
2022, 33(4):1477-1500. DOI: 10.13328/j.cnki.jos.006472
Abstract:With the development of intelligent warfare, the fragmentation and uncertainty of real-time information in highly competitive scenarios such as military operations and anti-terrorism assaults place higher demands on flexible policy making with game advantages. Research on intelligent policy learning methods with self-learning ability has become the core issue of formation-level tasks. Facing the difficulties of state representation and low data efficiency, a sample-adaptive policy learning method based on predictive coding is proposed. An autoencoder is applied to compress the original task state space, and predictive codes of the dynamic environment are obtained from state-transition samples by combining an autoregressive model with a mixture density network, which improves the capacity of the task state representation. The predictive-coding-based sample-adaptive method utilizes the temporal difference error in predicting the value function, which improves data efficiency and accelerates the convergence of the algorithm. To verify its effectiveness, a typical air combat scenario is constructed based on previous national wargame competition platforms, which includes five specially designed rule-based agents as contestants. Ablation experiments are implemented to verify the influence of different coding strategies and sampling policies, and the Elo scoring mechanism is adopted to rank the agents. Experimental results confirm that MDN-AF, the sample-adaptive algorithm based on predictive coding, reaches the highest score with an average winning rate of 71%, of which 67.6% are easy wins. Moreover, it has learned four kinds of interpretable long-term strategies: autonomous wave division, supplementary reconnaissance, the “snake” strike, and the bomber-in-the-rear formation. In addition, the agent built on this algorithm framework won the national first prize of the 2020 National Wargame Competition.
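The mixture-density-network piece of that pipeline has a standard generic form: given the current latent code, an MDN head parameterizes a Gaussian mixture over the next latent state and is trained by negative log-likelihood. The sketch below shows that generic form; the dimensions and the diagonal-Gaussian choice are assumptions, not the paper's exact design.

```python
# Generic mixture density network head over the next latent state, trained
# by negative log-likelihood; dimensions and diagonal Gaussians are assumptions.
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, d_in=32, d_out=32, n_mix=5):
        super().__init__()
        self.n_mix, self.d_out = n_mix, d_out
        self.net = nn.Linear(d_in, n_mix * (1 + 2 * d_out))

    def forward(self, z):                                   # z: (B, d_in)
        p = self.net(z).view(-1, self.n_mix, 1 + 2 * self.d_out)
        log_pi = torch.log_softmax(p[..., 0], dim=1)        # mixture weights
        mu, log_sigma = p[..., 1:1 + self.d_out], p[..., 1 + self.d_out:]
        return log_pi, mu, log_sigma

def mdn_nll(log_pi, mu, log_sigma, target):                 # target: (B, d_out)
    comp = torch.distributions.Normal(mu, log_sigma.exp())
    log_prob = comp.log_prob(target.unsqueeze(1)).sum(-1)   # (B, n_mix)
    return -torch.logsumexp(log_pi + log_prob, dim=1).mean()
```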
LI Yao-Qian, LI Cai-Zi, LIU Rui-Qiang, SI Wei-Xin, JIN Yue-Ming, HENG Pheng-Ann
2022, 33(4):1501-1515. DOI: 10.13328/j.cnki.jos.006469
Abstract:With the increasingly wide application of surgical robots in clinical practice, providing doctors with precise semantic segmentation of surgical instruments in endoscopic videos is of great significance for improving clinicians' operation accuracy and patients' prognosis. Training surgical instrument segmentation models requires a large number of accurately labeled video frames, and the high cost of video data labeling limits the application of deep learning to this task. Current semi-supervised methods enhance the temporal information and data diversity of sparsely labeled videos by predicting and interpolating frames, which can improve segmentation accuracy with limited labeled data. However, these methods suffer from limited frame-interpolation quality and insufficient extraction of temporal features from sequential frames. To tackle these issues, this study proposes a semi-supervised segmentation framework with a spatiotemporal Transformer, which improves the temporal consistency and data diversity of sparsely labeled video datasets by interpolating frames with high accuracy and generating pseudo-labels. The Transformer module is integrated at the bottleneck of the segmentation network to analyze global contextual information from both temporal and spatial perspectives, enhancing high-level semantic features while improving the network's perception of complex environments, which overcomes various distractions in surgical videos and thus improves the segmentation results. Using only 30% labeled data, the proposed framework achieves an average Dice of 82.42% and an average IoU of 72.01% on the MICCAI 2017 Surgical Instrument Segmentation Challenge dataset, exceeding the state-of-the-art method by 7.68% and 8.19%, respectively, and outperforming fully supervised methods.
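The "Transformer at the bottleneck" pattern can be sketched directly: the bottleneck feature map is flattened into tokens, passed through self-attention for global context, and folded back for the decoder. Channel sizes below are assumptions, and the paper's joint spatiotemporal attention over neighboring frames is not reproduced in this spatial-only sketch.

```python
# Spatial-only sketch of a Transformer at the segmentation bottleneck;
# channel sizes are assumptions, temporal attention is not reproduced.
import torch
import torch.nn as nn

class BottleneckTransformer(nn.Module):
    def __init__(self, channels=256, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):                      # x: (B, C, H, W) bottleneck features
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C)
        tokens = self.encoder(tokens)          # global spatial context
        return tokens.transpose(1, 2).reshape(b, c, h, w)

feat = torch.randn(2, 256, 16, 20)             # e.g., a U-Net style bottleneck
out = BottleneckTransformer()(feat)            # same shape, context-enhanced
```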
XU Xin-Zheng, CHANG Jian-Ying, DING Shi-Fei
2022, 33(4):1516-1526. DOI: 10.13328/j.cnki.jos.006482
Abstract:Image style transfer technology has been widely integrated into people's lives and is widely used in practical scenarios such as image artistry, cartooning, picture coloring, filter processing, and occlusion removal, so image style transfer has important research significance and application value. StarGAN is a recent generative adversarial network framework for multi-domain image style transfer. StarGAN extracts features through simple down-sampling and then generates images through up-sampling; however, the background color information and the detailed facial features in the generated images differ considerably from those in the input images. In this study, after analyzing the problems of StarGAN, a UE-StarGAN model for image style transfer is proposed by improving the network structure of StarGAN with a U-Net and an edge-promoting adversarial loss function. At the same time, a class encoder is introduced into the generator of UE-StarGAN, and a small-sample image style transfer model is designed to realize image style transfer with small sample sizes. The experimental results show that the model can extract more detailed features and has certain advantages in the case of small sample sizes, and that the qualitative and quantitative analyses of the images after style transfer are improved to a certain extent, which verifies the effectiveness of the proposed model.