LIAO Xiang-Wen , LIU De-Yuan , GUI Lin , CHENG Xue-Qi , CHEN Guo-Long
2018, 29(10):2899-2914. DOI: 10.13328/j.cnki.jos.005548
Abstract:Opinion retrieval is a hot topic in natural language processing research. Most existing approaches to text opinion retrieval cannot extract knowledge and concepts from context; they also lack the ability to generalize opinions and overlook the semantic relations between words. This paper proposes an opinion retrieval method based on knowledge graph conceptualization and network embedding. First, a conceptual knowledge graph is used to conceptualize queries and texts into the proper concept space, while the nodes of the knowledge graph are embedded into a low-dimensional vector space by network embedding technology. Then, the similarity between queries and texts is computed from the embedding vectors, and based on the similarity scores the opinion scores of texts are obtained with statistical machine learning methods. Finally, the concept space, the knowledge representation space, and the opinion mining results are integrated into the opinion retrieval model. Experiments show that the proposed approach effectively improves the retrieval performance of multiple retrieval models. Compared with the reference method based on unified opinion, the proposed approach improves the MAP scores by 6.1% and 9.3%, respectively; compared with the reference method based on learning to rank, it improves the MAP scores by 2.3% and 14.6%, respectively.
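As a toy illustration of the similarity step described above (not the authors' implementation), the sketch below represents a conceptualized query and text as averaged concept embeddings and scores them with cosine similarity; the concept names and vector values are invented for the example.

import numpy as np

# Hypothetical low-dimensional embeddings of knowledge-graph concepts,
# e.g. produced by a network-embedding method; the values are made up.
concept_vectors = {
    "smartphone": np.array([0.8, 0.1, 0.3]),
    "battery":    np.array([0.7, 0.2, 0.4]),
    "camera":     np.array([0.6, 0.3, 0.2]),
}

def embed(concepts):
    """Represent a conceptualized query/text as the mean of its concept vectors."""
    return np.mean([concept_vectors[c] for c in concepts], axis=0)

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

query_vec = embed(["smartphone"])           # conceptualized query
text_vec = embed(["battery", "camera"])     # conceptualized text
print(cosine(query_vec, text_vec))          # similarity score used for ranking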
ZHENG Yu-Yan , TIAN Ying , SHI Chuan
2018, 29(10):2915-2930. DOI: 10.13328/j.cnki.jos.005549
Abstract:Entity set expansion (ESE) refers to obtaining a more complete entity set according to certain rules, given several seed entities with a specific semantic meaning. As a popular data mining task, ESE has many applications, such as dictionary construction and query suggestion. Contemporary ESE mainly utilizes text or Web information; that is, the intrinsic relations among entities are inferred from their co-occurrences in text or on the Web. With the surge of knowledge graphs in recent years, it has become possible to expand entity sets according to entity co-occurrences in a knowledge graph. This paper studies the problem of entity set expansion in knowledge graphs, that is, given several seed entities, how to obtain more entities by leveraging the knowledge graph. Firstly, the knowledge graph is modeled as a heterogeneous information network (HIN), which contains multiple types of entities and relationships. Next, a novel entity set expansion method based on frequent patterns of meta paths, called FPMP_ESE, is proposed. FPMP_ESE employs meta paths to capture the implicit common traits of the seed entities. In order to find the important meta paths between entities, an automatic meta path generation method based on frequent patterns, called FPMPG, is designed. Then, two kinds of methods, heuristic and PU-learning based, are developed to assign weights to the meta paths. Finally, experiments on the real-world dataset Yago demonstrate that the proposed method achieves better effectiveness and higher efficiency than competing methods.
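To make the meta-path idea concrete, the following toy sketch (not the authors' FPMP_ESE code) scores candidate entities by how many meta-path endpoints they share with the seed set; the tiny triple set, the single meta path, and its weight are all invented for illustration, and in the paper the weights would come from the heuristic or PU-learning methods.

from collections import defaultdict

# A tiny heterogeneous graph given as (subject, relation, object) triples; all invented.
triples = [
    ("Einstein", "bornIn", "Ulm"), ("Ulm", "locatedIn", "Germany"),
    ("Planck", "bornIn", "Kiel"), ("Kiel", "locatedIn", "Germany"),
    ("Heisenberg", "bornIn", "Wuerzburg"), ("Wuerzburg", "locatedIn", "Germany"),
    ("Bohr", "bornIn", "Copenhagen"), ("Copenhagen", "locatedIn", "Denmark"),
]
out_edges = defaultdict(list)
for s, r, o in triples:
    out_edges[s].append((r, o))

def follow(entity, meta_path):
    """Entities reachable from `entity` along the given sequence of relation types."""
    frontier = {entity}
    for rel in meta_path:
        frontier = {o for e in frontier for (r, o) in out_edges[e] if r == rel}
    return frontier

seeds = ["Einstein", "Planck"]
candidates = ["Heisenberg", "Bohr"]
meta_paths = {("bornIn", "locatedIn"): 1.0}     # weight would be learned in practice

# Endpoints the seeds reach along each meta path capture their common traits.
seed_targets = {p: set().union(*(follow(s, p) for s in seeds)) for p in meta_paths}

# A candidate scores higher when it reaches the same endpoints via the same meta paths.
scores = {c: sum(w * len(follow(c, p) & seed_targets[p]) for p, w in meta_paths.items())
          for c in candidates}
print(scores)   # {'Heisenberg': 1.0, 'Bohr': 0.0}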
YANG Yu-Ji , XU Bin , HU Jia-Wei , TONG Mei-Han , ZHANG Peng , ZHENG Li
2018, 29(10):2931-2947. DOI: 10.13328/j.cnki.jos.005552
Abstract:In supporting the semantic Web, knowledge graphs have played a vital role in many areas such as knowledge QA and semantic search, and have therefore become a hot topic in both research and engineering. However, building a large-scale knowledge graph with high accuracy is often costly. How to balance accuracy and efficiency and quickly build a high-quality domain knowledge graph is a big challenge in the field of knowledge engineering. This paper presents a systematic study on the construction of domain knowledge graphs and puts forward an accurate and efficient "four-step" method for constructing them. The method has been applied to the construction of knowledge graphs for nine subjects in K-12 education in China, and the nine subject knowledge graphs have been built with high accuracy, which demonstrates that the new method is effective. For example, the geographical knowledge graph constructed with the "four-step" method contains 670 thousand instances and 14.21 million triples, and the knowledge coverage and knowledge accuracy of its annotated portion are both above 99%.
ZHANG Yu , OUYANG Dan-Tong , YE Yu-Xin
2018, 29(10):2948-2965. DOI: 10.13328/j.cnki.jos.005550
Abstract:This study focuses on debugging and repairing techniques for incoherent ontologies based on the black-box method and discusses the limitations of existing black-box methods and their optimizations. To address this problem, the study proposes a new strategy, called the clash path strategy, for debugging and repairing incoherent ontologies. The strategy constructs clash paths related to the basic clash models and then identifies the clash set based on the clash paths. Debugging can then be performed rapidly on the clash set, because the clash set is smaller than the original ontology. In addition, the unsatisfiable dependent paths can be identified from the clash paths, and the repair set can be easily obtained on the basis of these unsatisfiable dependent paths. Theoretical proofs and experimental evaluation demonstrate that the presented debugging and repairing strategies are correct and efficient.
GUAN Sai-Ping , JIN Xiao-Long , JIA Yan-Tao , WANG Yuan-Zhuo , CHENG Xue-Qi
2018, 29(10):2966-2994. DOI: 10.13328/j.cnki.jos.005551
Abstract:In recent years, the rapid development of Internet technology and Web applications has triggered an explosion of data on the Internet, which contains a large amount of valuable knowledge. How to organize, represent, and analyze this knowledge has attracted much attention. Knowledge graphs were thus developed to organize such knowledge in a semantic and visualized manner. Knowledge reasoning over knowledge graphs has become one of the hot research topics and plays an important role in many applications such as vertical search and intelligent question answering. The goal of knowledge reasoning over knowledge graphs is to infer new facts or identify erroneous facts according to existing ones. Unlike traditional knowledge reasoning, knowledge reasoning over knowledge graphs is more diversified, due to the simplicity, intuitiveness, flexibility, and richness of knowledge representation in knowledge graphs. Starting with the basic concept of knowledge reasoning, this paper presents a survey of the recently developed methods for knowledge reasoning over knowledge graphs. Specifically, the research progress is reviewed in detail from two aspects: one-step reasoning and multi-step reasoning, each including rule-based reasoning, distributed-embedding-based reasoning, neural-network-based reasoning, and hybrid reasoning. Finally, future research directions and an outlook on knowledge reasoning over knowledge graphs are discussed.
CUI Xian-Ji , HE Jia-Liang , ZHANG Jun-Xing , GAO Jian
2018, 29(10):2995-3008. DOI: 10.13328/j.cnki.jos.005553
Abstract:Ontology debugging is one of the non-standard reasoning tasks in artificial intelligence and is important for ontology engineering. In this work, complementary concepts and a search graph are combined to optimize the calculation of minimal unsatisfiability-preserving sub-TBoxes (MUPS) for unsatisfiable concepts. Firstly, whether the satisfiability of a concept needs to be checked in the expanded terminology is determined by whether the terminology contains complementary concepts, which reduces the number of reasoner calls to some extent. Then, a search graph is constructed according to the terminology expansion process to quickly search for the nodes corresponding to unsatisfiable sub-terminologies using breadth-first and depth-first search strategies. This optimization reduces the number of axioms in the terminologies to be checked and also improves the efficiency of searching for the nodes corresponding to unsatisfiable sub-terminologies. Finally, the optimized algorithms are implemented and compared with an existing black-box algorithm. The experimental results show that the proposed method is superior to existing MUPS calculation in terms of the number of reasoner calls and the number of axioms in the checked terminologies, which can effectively improve the efficiency of MUPS calculation.
LI Chun-Miao , CAI Xiao-Juan , LI Guo-Qiang
2018, 29(10):3009-3020. DOI: 10.13328/j.cnki.jos.005321
Abstract:Well-structured pushdown systems (WSPDSs) combine pushdown systems and well-structured transition systems, allowing locations and stack alphabets to be vectors and thus to be unbounded. States change with the push and pop operations on the stack. The model has been said to be "very close to the border of undecidability". This paper proposes a general framework for deriving lower bounds on the coverability complexity of a model by adopting the reset-zero method. The paper proves that coverability is Tower-hard when a WSPDS is restricted to three-dimensional state vectors, and Hyper-Ackermann-hard for general WSPDSs.
2018, 29(10):3021-3050. DOI: 10.13328/j.cnki.jos.005613
Abstract:Currently, many software applications are developed and deployed rapidly, with rapid iterations used to implement new requirements while maintaining high quality. Continuous integration (CI) is an important activity to make this work. CI tests the code automatically before it is integrated into the master branch to ensure its quality. The big challenge is how to select an appropriate test case set for continuous integration: running all test cases consumes a large amount of resources and yields feedback only after a long cycle, while selecting an improper test case subset misses some required tests and therefore increases quality risk. The goal of test set optimization is to select an appropriate test case subset that satisfies the test requirements of each unit or iteration test while reducing test resources as much as possible. In this paper, a systematic literature review is performed to reveal the current status of CI test set optimization based on the existing literature. More specifically, related studies published between 2007 and 2017 are searched in four electronic literature databases, and 39 important studies are finally selected for careful review. Five research questions are posed in order to achieve the above goal, and data extracted from the selected studies are analyzed to answer these questions. The analysis results contribute useful findings and new challenges for researchers and practitioners.
ZHOU Xiao-Cong , LAI Wei , WEN Jian-Feng
2018, 29(10):3051-3067. DOI: 10.13328/j.cnki.jos.005290
Abstract:The distribution of measurement data is important for understanding and using object-oriented software metrics. However, there has been no research on the distribution of object-oriented software cohesion metrics except for the CK metric LCOM, and existing studies show that LCOM is not a good metric for class cohesion in object-oriented software. Therefore, it is necessary to investigate the distribution of other cohesion metrics. Based on 112 Java open-source software systems, this paper empirically investigates the distribution of measurement data for 17 cohesion metrics, including lack-of-cohesion metrics (e.g., LCOM1, LCOM2, LCOM3, LCOM4, LCOM5), connectivity-based cohesion metrics (e.g., TCC, LCC, DCI, DCD), and similarity-based cohesion metrics (e.g., CC, LSCC, SCOM, SCC). The results show that the data of non-normalized cohesion metrics can be fitted by log-normal and power-law distributions; the data of similarity-based cohesion metrics can be fitted by a log-normal distribution only when the special classes with at most one method and no fields are excluded; and, for connectivity-based cohesion metrics, the data of their corresponding non-normalized metrics follow log-normal and power-law distributions. This study provides important information for researchers to understand and use cohesion metrics, in particular to identify thresholds for these metrics using the approaches in the literature.
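As a hedged illustration of the kind of distribution fitting discussed above (not the paper's actual analysis pipeline), the sketch below fits a log-normal distribution to a vector of per-class cohesion values with SciPy and checks the fit with a Kolmogorov-Smirnov test; the data here are synthetic.

import numpy as np
from scipy import stats

# Synthetic per-class cohesion values standing in for real measurement data.
rng = np.random.default_rng(0)
cohesion_values = rng.lognormal(mean=1.0, sigma=0.8, size=1000)

# Fit a log-normal distribution (location fixed at 0) and test goodness of fit.
shape, loc, scale = stats.lognorm.fit(cohesion_values, floc=0)
ks_stat, p_value = stats.kstest(cohesion_values, 'lognorm', args=(shape, loc, scale))
print(f"shape={shape:.3f} scale={scale:.3f} KS={ks_stat:.3f} p={p_value:.3f}")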
2018, 29(10):3068-3090. DOI: 10.13328/j.cnki.jos.005607
Abstract:Design problems are ubiquitous in scientific research and industrial applications. In recent years, Bayesian optimization, a very effective global optimization algorithm, has been widely applied to design problems. By constructing the probabilistic surrogate model and the acquisition function appropriately, the Bayesian optimization framework can obtain the optimal solution with only a small number of function evaluations, and thus it is very suitable for extremely complex optimization problems whose objective functions cannot be expressed analytically, or are non-convex, multimodal, and computationally expensive. This paper provides a detailed analysis of Bayesian optimization in terms of methodology and application areas, and discusses its research status and the problems to be addressed in future research. This work is hopefully beneficial to researchers from the related communities.
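A minimal sketch of the generic Bayesian optimization loop described above, using a Gaussian process surrogate and an expected-improvement acquisition function, assuming scikit-learn and SciPy are available; the 1-D toy objective is invented and does not come from the surveyed literature.

import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):                        # expensive black-box function (toy example)
    return np.sin(3 * x) + 0.1 * x ** 2

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(4, 1))      # a few initial evaluations
y = objective(X).ravel()
candidates = np.linspace(-3, 3, 200).reshape(-1, 1)

for _ in range(10):                      # sequential model-based optimization loop
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                         # probabilistic surrogate model
    mu, sigma = gp.predict(candidates, return_std=True)
    best = y.min()
    imp = best - mu                      # expected improvement (for minimization)
    z = np.divide(imp, sigma, out=np.zeros_like(imp), where=sigma > 0)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)
    ei[sigma == 0] = 0.0
    x_next = candidates[np.argmax(ei)].reshape(1, -1)
    X = np.vstack([X, x_next])           # evaluate the most promising point
    y = np.append(y, objective(x_next).ravel())

print("best x:", X[np.argmin(y)].item(), "best f:", y.min())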
2018, 29(10):3091-3110. DOI: 10.13328/j.cnki.jos.005385
Abstract:First, an equivalent characterization of generalized tautologies in Łn is introduced by using the cut method of fuzzy sets; the Hamming distance, Hamming similarity degree, and Hamming truth degree between formulas are defined by means of the standard Hamming distance between fuzzy sets, and a Hamming distance representation of the basic concepts of quantitative logic is described. Then, the truth degree decomposition theorem for formulas in quantitative logic is presented. This theorem states that, in quantitative logic, the truth degree of any formula φ is equal to the sum of the truth degrees of some pairwise incompatible formulas, and the formula φ itself is logically equivalent to the join of these formulas. Finally, the problem of the triple-I truth degree solution of the generalized MP problem is proposed, and the existence of the triple-I truth degree solution is discussed.
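The decomposition theorem described above can be restated in symbols as follows; the notation is chosen here for illustration (τ(·) for the truth degree, ≈ for logical equivalence) and is not taken from the paper itself.

\[
  \forall \varphi\ \exists\, \psi_1,\dots,\psi_k \ \text{pairwise incompatible}\
  (\psi_i \wedge \psi_j \ \text{is a contradiction for}\ i \neq j):\quad
  \varphi \approx \psi_1 \vee \cdots \vee \psi_k
  \ \ \text{and}\ \
  \tau(\varphi) = \sum_{i=1}^{k} \tau(\psi_i).
\]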
MENG Xiang-Wu , LI Rui-Chang , ZHANG Yu-Jie , JI Wei-Yu
2018, 29(10):3111-3133. DOI: 10.13328/j.cnki.jos.005608
Abstract:In recent years, with the popularity of mobile smart devices, location-based social networks have been on the rise. Users tend to share their experiences with their friends, producing large-scale user trajectories with temporal and spatial attributes. From a narrow perspective, trajectory data refers only to continuously sampled GPS data; from a broad perspective, any data with a sequential characteristic can be regarded as trajectory data. Thus, check-ins acquired from a social network can also be considered coarse-grained trajectory data. Generalized trajectory data is characterized by spatiotemporal heterogeneity, the coexistence of continuous and discrete data, and spatiotemporal items with unclear hierarchy and classification. Compared with GPS trajectory data, generalized trajectory data comes from extensive sources and contains rich information, which brings great opportunities to traditional mobile recommender systems; at the same time, its large scale and diverse structure also present great challenges to such systems. How to use generalized trajectory data to improve the performance of mobile recommender systems has become an important issue in both academia and industry. Taking the characteristics of trajectory data as the focal point, this paper analyzes and surveys the main recommendation methods and evaluation metrics based on generalized user trajectory data. Further, it expounds the relationships and differences between traditional mobile recommender systems and mobile recommender systems based on user trajectory data. Finally, the paper discusses the difficulties and development trends of mobile recommender systems based on generalized user trajectory data.
WANG Shao-Qing , WANG Zheng , LI Cui-Ping , ZHAO Kan-Kan , CHEN Hong
2018, 29(10):3134-3149. DOI: 10.13328/j.cnki.jos.005284
Abstract:Event-based social networks (EBSNs) have experienced rapid growth in people's daily life. Event recommendation therefore plays an important role in helping people discover interesting online events and attend offline activities face to face in the real world. However, event recommendation is quite different from traditional recommender systems and faces several challenges: (1) one user can attend only a small number of events, leading to a very sparse user-event matrix; (2) the response data of users is implicit feedback; (3) events have life cycles, so outdated events should not be recommended to users; (4) a large number of new events created every day need to be recommended to users in time. To cope with these challenges, this article proposes to jointly model heterogeneous social and content information for event recommendation. The approach explores both online and offline social interactions and fuses the content of events to model their joint effect on users' decision-making about events. Extensive experiments are conducted on the Meetup dataset to evaluate the performance of the proposed model, and the experimental results demonstrate that it outperforms state-of-the-art methods.
2018, 29(10):3150-3163. DOI: 10.13328/j.cnki.jos.005286
Abstract:Many studies have been conducted on similarity join over certain (deterministic) graphs. In reality, however, graphs are often uncertain due to various factors. This paper studies similarity join over uncertain graph databases. The study employs a joint probability distribution to describe the uncertainty of edges in a graph, combines it with a new measure of graph similarity, and gives a formal definition of similarity join over uncertain graph databases. The paper also designs a group of filtering strategies to reduce the number of candidate pairs in the similarity join. Extensive experimental results show that the method proposed in this paper is feasible and accurate.
HU Chuan , MENG Xiang-Wu , ZHANG Yu-Jie , DU Yu-Lu
2018, 29(10):3164-3183. DOI: 10.13328/j.cnki.jos.005288
Abstract:Group recommender systems have recently become one of the most prevalent topics in recommender systems. As an effective solution to the group recommendation problem, group recommender systems have been applied in news, music, movies, food, and so forth by extending individual recommendation to group recommendation. Existing group recommender systems usually employ either a preference aggregation strategy or a recommendation aggregation strategy, but neither achieves fully satisfactory effectiveness, and each has its own advantages and disadvantages: the preference aggregation strategy suffers from a fairness problem among group members, whereas the recommendation aggregation strategy pays less attention to the interaction among group members. This paper proposes an enhanced group recommendation method based on preference aggregation that simultaneously incorporates the advantages of the two aforementioned aggregation strategies. Further, the paper demonstrates that group preference and personal preference are similar, which is also taken into account in the proposed method. Experimental results on the MovieLens dataset show that the proposed method outperforms the baselines in terms of effectiveness.
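For context on the aggregation families mentioned above, the toy sketch below shows a plain preference aggregation step with the classic average and least-misery strategies; it is illustrative only and is not the enhanced method proposed in the paper, and the member ratings are invented.

# Toy individual rating predictions for three group members (values invented).
member_preferences = {
    "alice": {"item1": 4.0, "item2": 2.0, "item3": 5.0},
    "bob":   {"item1": 3.0, "item2": 5.0, "item3": 1.0},
    "carol": {"item1": 4.5, "item2": 3.0, "item3": 2.0},
}
items = ["item1", "item2", "item3"]

def aggregate(strategy):
    """Aggregate member preferences into one group preference score per item."""
    group = {}
    for item in items:
        scores = [prefs[item] for prefs in member_preferences.values()]
        group[item] = strategy(scores)
    return group

average_pref = aggregate(lambda s: sum(s) / len(s))   # average strategy
least_misery = aggregate(min)                         # least-misery strategy
print(sorted(average_pref, key=average_pref.get, reverse=True))   # group ranking
print(sorted(least_misery, key=least_misery.get, reverse=True))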
WU Xia , TANG Zu-Kai , ZHU Yuan-Yuan , PENG Yu-Wei , PENG Zhi-Yong
2018, 29(10):3184-3204. DOI: 10.13328/j.cnki.jos.005418
Abstract:With the development of GPS positioning technology and smart mobile devices, more and more trajectory data is collected continuously every day, so managing and mining useful information from these trajectories is critical in many application areas. Compared with raw trajectory data, semantic trajectory data, which is equipped with semantic information, has better quality, smaller volume, and stronger descriptive ability, and thus can be used in many applications such as trip recommendation, next location prediction, life pattern understanding, and friend recommendation. Mining frequent patterns in semantic trajectories is a fundamental problem underlying these tasks. In many circumstances, users have requirements on the arrival time; for example, a user may want to visit a popular scenic spot at a certain time and then arrive at the railway station on time. Most existing approaches to semantic trajectory pattern mining do not consider the arrival time, and the few that take the exact arrival time as a constraint can barely find frequent patterns under such a strict time constraint. This paper, for the first time, studies the approximate arrival-time constrained frequent pattern (AAFP) mining problem. First, a baseline algorithm for mining AAFPs is given by dividing the time axis into intervals. Then, an improved flexible algorithm based on the AAP-tree index is proposed to significantly improve efficiency. Finally, a strategy based on incremental information entropy is introduced to maintain the AAP-tree and the set of time-axis partitions. Experimental results on real trajectory datasets validate the effectiveness and efficiency of the proposed algorithms.
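As a rough illustration of the baseline idea of dividing the time axis into intervals (not the AAP-tree algorithm itself), the sketch below discretizes arrival timestamps into interval ids and counts the support of the resulting semantic sequences; the trajectories and interval boundaries are invented, and for brevity only whole sequences are counted rather than all sub-patterns.

import bisect
from collections import Counter

# Invented semantic trajectories: lists of (place category, arrival hour) pairs.
trajectories = [
    [("museum", 9.5), ("restaurant", 12.2), ("station", 17.8)],
    [("museum", 10.1), ("restaurant", 13.0), ("station", 18.4)],
    [("park", 9.0), ("restaurant", 12.7), ("station", 22.5)],
]
boundaries = [8, 11, 14, 17, 20, 23]   # partition of the time axis into intervals

def discretize(trajectory):
    """Replace each exact arrival time by the id of the interval it falls into."""
    return tuple((place, bisect.bisect_right(boundaries, t)) for place, t in trajectory)

# Count how often each discretized sequence occurs; frequent ones are AAFP candidates.
support = Counter(discretize(t) for t in trajectories)
for pattern, count in support.items():
    print(count, pattern)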
YAO Zhong-Jiang , GE Jing-Guo , ZHANG Xiao-Dan , ZHENG Hong-Bo , ZOU Zhuang , SUN Kun-Kun , XU Zi-Hao
2018, 29(10):3205-3222. DOI: 10.13328/j.cnki.jos.005620
Abstract:Traffic obfuscation technology is one of the most commonly used techniques in censorship-circumvention systems. In order to improve the recognition accuracy of network traffic and the ability to supervise it, much attention has been paid to the recognition and tracing of obfuscated traffic. Through an in-depth analysis of the three main traffic obfuscation technologies, namely randomization, mimicry, and tunneling, this paper compares their technical frameworks, concealment, ease of use, and application scenarios. In addition, the paper reviews two types of recognition technology, deep packet inspection and machine learning, and compares their recognition accuracy. Furthermore, it analyzes and compares two types of traffic tracing technology: passive and proactive correlation. Finally, it discusses the development trends of identification and tracing technologies for obfuscated traffic.
LUO En-Tao , WANG Guo-Jun , LIU Qin , MENG Da-Cheng
2018, 29(10):3223-3238. DOI: 10.13328/j.cnki.jos.005295
Abstract:In mobile social networks, users can look for friends by matching their attributes. To address the problems that user attributes are easily stolen by attackers when a single authority center is used and that performance bottlenecks occur at service peaks, this work proposes a scheme in which multiple attribute management centers hierarchically manage the sub-keys of user attributes. The scheme involves several attribute centers that perform fine-grained management of different user attributes. When the friend requester's attributes satisfy the access control policy of the friend-making initiator, the requester can correctly combine the sub-keys into a complete decryption key and decrypt the initiator's data file stored on the friend-making server. By introducing hierarchical management of attribute sub-keys, the proposed scheme not only effectively prevents key disclosure when a single-authority management center is attacked, but also improves the computational efficiency of friend profile matching through the cooperative work of multiple attribute centers. Experiments are conducted to check whether the proposed scheme can resist chosen-plaintext attacks, and they certify that the scheme achieves CPA security while effectively protecting users' privacy. Extensive comparisons with existing schemes demonstrate that the proposed scheme incurs the lowest computational overhead and provides an excellent user experience.