WANG Li-Jie , LI Meng , CAI Si-Bo , LI Ge , XIE Bing , YANG Fu-Qing
2012, 23(6):1335-1349. DOI: 10.3724/SP.J.1001.2012.04088
Abstract:With the development of Web services technologies, more and more public Web services have been published on the Internet. During the searching and utilizing of these public services, services' textual descriptions (such as introduction and user manual), which are generally expressed in natural language, provide great help for service consumers to locate, understand, and utilize proper Web services. Existing methods for services discovery usually try to obtain such descriptions only from services' WSDL files. However, according to this investigation, lots of Web services do not contain enough textual descriptions in their WSDL files. This paper proposes an approach to enriching textual descriptions for public Web services on the Internet using the information sources outside of WSDL files. Given a Web service, the study collects related Web pages containing its features from the Internet. Then, the enriched descriptions for the service are identified from the Web pages using information retrieval technologies. Experiments conducted on real data indicate that our approach can enrich descriptions for about half of the public services on the Internet effectively. The collected data is publicly available on the Internet.
WANG Shang-Guang , SUN Qi-Bo , YANG Fang-Chun
2012, 23(6):1350-1367. DOI: 10.3724/SP.J.1001.2012.04051
Abstract:In an open Web service environment, it is difficult to guarantee the authenticity of the feedback ratings reported by users after they have already consumed some services, which leads to large deviation between evaluation results and actual values of service reputation. Therefore, there are much more failures in service selection. To address the above problem, this paper proposes a reputation evaluation approach for Web service selection. The main idea of the proposed approach is evaluating the reputation by three reputation evaluation model including feedback checking, feedback adjustment, and feedback detection. Simulation results demonstrate the proposed approach not only improves the objectivity of reputation evaluation effectively, but also reduces the deviation of service selection significantly.
SUN Xiao-Bing , Li Bi-Xin , TAO Chuan-Qi
2012, 23(6):1368-1381. DOI: 10.3724/SP.J.1001.2012.04072
Abstract:Software progression is a fundamental ingredient of software. When changes are made to software, they will inevitably have some unpredicted effects and may cause inconsistencies with other parts of the original software. If the effects induced by the changes affect the whole system, an alternative change proposal may be required instead. Hence, change analysis is necessary before change implementation. This paper presented a compact intermediate representation for object oriented programs based on formal concept analysis—lattice of class and method dependence (LoCMD). Then, based on LoCMD, the study proposes a change analysis model, which includes some activities before change implementation, i.e., program comprehension, impact analysis and change assessment. The empirical study demonstrates the effectiveness of the representation and the change analysis model, and will help maintainers gain a better understanding about the change proposal.
WANG Gui-Bin , YANG Xue-Jun , TANG Tao , XU Xin-Hai
2012, 23(6):1382-1396. DOI: 10.3724/SP.J.1001.2012.04078
Abstract:As the processor's power consumption continually increases, power has become the most critical problems during the design and implementation of high performance computer (HPC) system. Nowadays, the heterogeneous system has been one important trend for HPC system. Compared with the traditional homogeneous system, the heterogeneous system has a higher theoretical performance and energy efficiency. However, explointing the potential advantage under the performance constraint is still a challenging problem. This work first establishs the energy optimization model for heterogeneous parallel system via abstracting common applications into the general program model which consists of sequential section and parallel section. Through theoretical analysis, the study conclude the relationship among heterogeneous processors for the minimum energy consumption during the single parallel section and full application (including multiple sections) and provide the corresponding algorithms to decide the operating frequencies under the given performance constraint. Finally, the study evaluates of the proposed model with eight typical applications on a CPU-GPU heterogeneous system.
WANG Shang-Guang , SUN Qi-Bo , ZHANG Guang-Wei , YANG Fang-Chun
2012, 23(6):1397-1412. DOI: 10.3724/SP.J.1001.2012.04084
Abstract:Because traditional QoS-aware Web service selection approach cannot ensure the reliability and the real-time of service selection, this paper proposes an uncertain QoS-aware Skyline service selection approach based on cloud model. The approach first uses cloud model to compute the uncertainty of QoS and then adopts Skyline computing to extract Skyline services from Web services to prune redundant services. Finally, mixed integer programming is employed to perform service selection from Skyline services. The study evaluates the approach experimentally using both real and synthetically generated datasets. The experimental results show that the proposed approach can accomplish service selection for users reliably and quickly.
WANG Lu-Lu , LI Bi-Xin , ZHOU Xiao-Yu
2012, 23(6):1413-1428. DOI: 10.3724/SP.J.1001.2012.04102
Abstract:Path profiling, an important technique in dynamic program analysis, which collects the execution times of different paths, has been widely used in a variety of areas. However, existing intra-procedural profiling techniques have inadequate abilities in loops, i.e., they can only either solve profiling acyclic paths, or limit loop iterations to a certain number first, and then profile paths under such a limitation. This paper presents a new profiling technique called PAP (profiling all paths), which can profile finite-length paths inside a procedural. PAP consists of two basic phases: one is the probe instrumentation which assigns a unique pathid for each path, and the other is backwalk which uses the pathids to determine the corresponding executed paths. Furthermore, breakpoints are introduced to store the probe value which may overflow during long executions, and the probe amount is reduced base on the integration of PAP with an existing profiling technique. Besides, this paper also discusses how to use PAP to profile executed sequences on the method level. As shown in the results of the case study and experiments, PAP is effective and efficient in profiling cyclic paths.
XIA Wei , YAO Yi-Ping , MU Xiao-Dong , LIU Lin
2012, 23(6):1429-1443. DOI: 10.3724/SP.J.1001.2012.04047
Abstract:Informal verification methods for simulation models are vulnerable to subjective ingredients. The traditional model checking method has difficulty dealing with large-scale simulation models because of the state space explosion. The parallel model checking (PMC) method has been accepted and successfully implemented in industrial tools because of the completeness and high efficiency. Unfortunately, it is hard to use as it involves several difficulties, such as formal specifications, logics, and parallel computing. To solve the above problems, a parallel model checking method for simulation models based on event graphs is proposed in this paper. This method extends event graphs in synchronization and defines the syntax and semantics of the extended event graphs. It transforms the extended event graphs to distributed and parallel verification environment (DVE) model, and PMC method is successfully applied to simulation model verification filed. With this method, simulation participators can verify simulation models with PMC method without learning new formal modeling languages. The efficiency and completeness of simulation model verification are improved. The experimental results show the validity of this method, and the method can improve the application of PMC method in simulation field.
CHEN Jun-Liang , WANG Chang-Chun , CHEN Chao
2012, 23(6):1444-1457. DOI: 10.3724/SP.J.1001.2012.04067
Abstract:This paper proposes an extended bipolar argumentation model, named extended bipolar argumentation framework (EBAF). In this model, attack and support relations are considered as two independent semantic relations and there exist recursive interactions among attacks and supports. In other words, there are attacks or supports for the attack and support relations without limits. This paper focuses on the determination of acceptable set of EBAF. First the attack and support relations are separated, and the argumentation framework with only attack and support relations are obtained as results. Second the attacks and supports are considered as entities that convert the recursive attacks and supports into attacks and supports under relation perspective. On this basis, basic semantic concepts and acceptability of EBAF are defined and the determination algorithm of acceptable set of EBAF is provided. Finally, this paper provides a comparison with other relative argumentation models.
2012, 23(6):1458-1471. DOI: 10.3724/SP.J.1001.2012.04071
Abstract:In this paper, inspired by the support vector machines for classification and the small sphere and large margin method, the study presents a novel large margin minimal reduced enclosing ball learning machine (LMMREB) for pattern classification to improve the classification performance of gap-tolerant classifiers by constructing a minimal enclosing hypersphere separating data with the maximum margin and minimum enclosing volume in the Mercer induced feature space. The basic idea is to find two optimal minimal reduced enclosing balls by adjusting a reduced factor parameter q such that each of binary classes is enclosed by them respectively and the margin between one class pattern and the reduced enclosing ball is maximized. Thus the idea implements implementing both maximum between-class margin and minimum within-class volume. Experimental results obtained with synthetic and real data show that the proposed algorithms are effective and competitive to other related diagrams.
LIU Zhan-Yi , LI Sheng , LIU Ting , WANG Hai-Feng
2012, 23(6):1472-1485. DOI: 10.3724/SP.J.1001.2012.04069
Abstract:Example-Based machine translation (EBMT) uses a preprocessed bilingual corpus as a main translation knowledge. The final translation is generated by editing examples that match the input sentence. In the EBMT system, the performances of example selection and translation selection heavily influence the quality of the final translation. This paper proposes a method to improve the performance of the EBMT method by using statistical collocation model, which is estimated from monolingual corpora, in three aspects. First, the statistical collocation model is used to estimate the matching degree between the input sentence and examples to improve the performance of the example selection. Second, the performance of translation selection is improved by evaluating the collocation strength of the translation candidates and the context. Third, the collocated words of the translation candidates in the example are detected by the statistical collocation model and then the collocated words are corrected according to the context. In order to evaluate the proposed method, this study conducts a series of experiments. First, the study evaluates the proposed methods in a word-based EBMT system. As compared with the baseline, the methods achieves absolute improvements of 4.73~6.48 BLEU score on English-to-Chinese translation. Then, the study also applies the proposed translation selection method to a semi-structured EBMT system, and the translation qualities are further improved, with an improvement of 1.82 BLEU score. The results of human evaluation show that the translations generated by the improved semi-structured EBMT system can express the majority of the meaning of source sentences, and the fluency of theses translations can also be accepted.
ZHAO Wei-Zhong , MA Hui-Fang , LI Zhi-Qing , SHI Zhong-Zhi
2012, 23(6):1486-1499. DOI: 10.3724/SP.J.1001.2012.04073
Abstract:Semi-Supervised document clustering and employing limited prior knowledge to aid in unsupervised clustering, have recently become a topic of significant interest to data mining and machine learning communities. Because receiving supervised data may be expensive, it is important to attain the most informative knowledge to improve the clustering performance. This paper presents a semi-supervised document clustering algorithm with active learning for pairwise constraints, aiming at getting improved clustering performance. The semi-supervised document clustering algorithm is a constrained DBSCAN (cons-DBSCAN) algorithm, which incorporates pairwise constraints to guide the clustering process in DBSCAN. Basing on measure of constraint set utility and analysis of DBSCAN algorithm, an active learning approach is proposed to select informative document pairs for obtaining user feedbacks. Experimental results show that this proposed approach is effective in document clustering. The clustering performance of active Cons-DBSCAN has dramatically improved with selected pairwise constraints. Moreover, the proposed approach performs better than the two representative methods.
2012, 23(6):1500-1516. DOI: 10.3724/SP.J.1001.2012.04074
Abstract:In the case of the imbalanced protocol flows, the changes of flow distribution have a huge impact on the accuracy and stability of traffic classifiers that use machine learning algorithms. It is very important to select a suitable machine learning algorithm to classify the imbalanced protocol flows on line. By means of single-factor experiment design, this paper verifies that it is possible for C4.5 decision tree, Na?ve Bayes with kernel density estimation (NBK) and support vector machine (SVM) to classify traffic with the first four packets of the TCP connection. After comparing the performances of the three classifiers abovementioned, the study finds that the testing time of C4.5 decision tree is the shortest and SVM is the most stable. Finally, Bagging algorithm is applied to classify traffic. The experimental results show that, the stability of Bagging is similar to SVM and the testing time and modeling time of Bagging is close to C4.5 decision tree. Therefore, Bagging classifier is the most suitable to classify traffic on line.
LIN Wang-Qun , LU Feng-Shun , DING Zhao-Yun , WU Quan-Yuan , ZHOU Bin , JIA Yan
2012, 23(6):1517-1530. DOI: 10.3724/SP.J.1001.2012.04076
Abstract:This paper proposes a weighted-graph based hierarchical community detection approach, which defines the community structure with the use of graph partition. With the pre-defined structure, a novel parallel social network community discovery algorithm (P-SNCD for short) is designed. P-SNCD algorithm avoids the disadvantage of traditional modularity based methods, which tend to discover communities of similar scales. Moreover, it can efficiently mine communities in parallel with the CPU scale of O(hmn) or O(hn2) and time complexity of O(logn), where h represents the density of the communities, m represents the total number of links and n represents the total number of nodes. Compared to the most of the existing methods, P-SNCD algorithm requires a few input parameters makes it even more practical. The accuracy and effectiveness of our algorithm is guaranteed by sufficient empirical studies in the later sections.
LIU Jing-Lei , LIAO Shi-Zhong , ZHANG Wei
2012, 23(6):1531-1541. DOI: 10.3724/SP.J.1001.2012.04090
Abstract:CP-nets (conditional preference networks) is a simple and intuitive graphical tool for representing conditional preference statements over the values of a set of variables. It has been a studying hotspot in artificial intelligence recently. The algorithm of strong dominance with respect to any binary-valued CP-nets has not been given; preferences completeness of CP-nets have not been studied by anyone, This paper makes a study of completeness and consistency of CP-nets by designing a strong dominance algorithm. First, by constructing induced graph of CP-nets and studying its properties, the study solves the problem of strong dominance with respect to any binary-valued CP-nets by Warshall algorithm to get the transitive closure of flip relation. Second, by solving the preference number of three kinds of CP-nets (separeble-structured, chain-structured, tree-structured), the study gives preferences incompleteness theorem and counting number formula of separeble condition preference networks. Finally, the study deals with consistency problem, and consistency judgment theorem and algorithm are given. The method not only solves some difficult problems proposed by Boutilier and Goldsmith, but also deepens the basic theory researching of CP-nets.
LI Wen-Feng , PENG Zhi-Yong , LI De-Yi
2012, 23(6):1542-1560. DOI: 10.3724/SP.J.1001.2012.04200
Abstract:Efficient processing of Top-K queries has always been a significant technique in the interactive environment involving massive amounts of data. With the emerging of imprecise data, the management of them has gradually raised people's attention. In contrast with traditional Top-K query, Top-K query on uncertain data presents different features both in semantics and computation. On the basis of prevailing uncertain data model and possible world semantic model, researchers have already studied multiple sound semantics and efficient approaches. This survey describes and classifies Top-K processing techniques on uncertain data including semantics, rank criteria, algorithms and implementation levels, and so on. Finally, the challenges and future research trends in processing of Top-K queries on uncertain data are predicated.
LI Ling-Li , WANG Hong-Zhi , GAO Hong , LI Jian-Zhong
2012, 23(6):1561-1577. DOI: 10.3724/SP.J.1001.2012.04114
Abstract:Keywords are suitable for query XML streams without schema information. In current forms of keywords search on XML streams and rank functions do not always represent users' intensions. This paper addresses this problem in another aspect. In this paper, the skyline Top-K keyword queries, a novel kind of keyword queries on XML streams, are presented. For such queries, skyline is used to choose results on XML streams without considering the complicated factors influencing the relevance to queries. With skyline query processing techniques, two techniques, are presented to process skyline Top-K keyword single queries and multi-queries on XML streams efficiently. Extensive experiments are performed to verify the effectiveness and efficiency of these techniques presented in this paper. According to the experimental results, the algorithms are not sensitive to the parameters such as the number of keywords, the number of results, the number of queries, and the runtime is approximately linear to the size of document.
ZHANG Xiao-Ming , LI Zhou-Jun , CHAO Wen-Han
2012, 23(6):1578-1587. DOI: 10.3724/SP.J.1001.2012.04111
Abstract:With the exponential growth of information on the Internet, it has become increasingly difficult to find and organize relevant material. Topic detection and tracking (TDT) is a research area addressing this problem. As one of the basic tasks of TDT, topic detection is the problem of grouping all stories, based on the topics they discuss. This paper proposes a new topic detection method (TPIC) based on an incremental clustering algorithm. The proposed topic detection strives to achieve a high accuracy and the capability of estimating the true number of topics in the document corpus. Term reweighing algorithm is used to accurately and efficiently cluster the given document corpus, and a self-refinement process of discriminative feature identification is proposed to improve the performance of clustering. Furthermore, topics' “aging” nature is used to precluster stories, and Bayesian information criterion (BIC) is used to estimate the true number of topics. Experimental results on linguistic data consortium (LDC) datasets TDT-4 show that the proposed model can improve both efficiency and accuracy, compared to other models.
CHEN Ke , HONG Yin-Jie , CHEN Gang
2012, 23(6):1588-1601. DOI: 10.3724/SP.J.1001.2012.04110
Abstract:Setting similarity search on possible worlds is semantically and computationally different from the traditional technique for sets of certain data. Considering the uncertainty of items of set, i.e. there is a certain probability for an item appearing in a set, the traditional technique used for processing sets is not applicable. This paper brings forward the formulas to measure the expected similarity of the sets based on possible worlds' semantics. In the expected contexts, if the expected similarity of a pair of sets (X,Y) is larger than a given threshold value τ ∈(0,1), this pair could be called as similar set pair. In the normal algorithm, the complexity of the expected similarity of uncertain sets based on possible worlds is of exponential order. This paper has provided new algorithms to calculate expected similarity by dynamic programming. The complexity of these algorithms is of polynomial order and they reduce execution time greatly. The final experiments have indicated the usability and the high performance of the new algorithms.
YANG Zhi , YIN Li-Hua , DUAN Mi-Yi , WU Jin-Yu , JIN Shu-Yuan , GUO Li
2012, 23(6):1602-1619. DOI: 10.3724/SP.J.1001.2012.04083
Abstract:Dynamically adjusting the security label of each subject is a main approach to improving the availability of MAC models. It includes the method of security label range and the method of taint propagation. The former lacks the support for a less proviledge subject, and the latter has a known covert channels. In this paper, a model called generalized taint propagation model (GTPM) is proposed to protect the confidentiality and integrity of operating systems. It inherits the least privilege characteristic of taint propagation model (TPM), expands the semantics of TPM to close the known covert channels, and introduces declassification and decontamination capacities of subjects to avoid accumulating contamination. The paper also introduces its specification using communicating sequential processes (CSP) language to clear the formal semantics of a GTPM operating system's behaviors of information flow control; Moreover, the study noninterference with declassification in CSP verification model of process equivalence, and proves that abstract GTPM system have the security property of noninterference with declassification in virtue of FDR tool. Finally, this paper uses an example to demonstrate its improvement of availability.
YU Jia-Geng , ZHOU Peng , WU Yan-Jun , ZHAO Chen
2012, 23(6):1620-1634. DOI: 10.3724/SP.J.1001.2012.04118
Abstract:To make the replay deterministic, the study presents the definition of VM replay by constructing a VM execution model, and then proves the sufficient conditions of VM replay using formal expressions of the algebra. Based on these conditions, the paper presents CASMotion, a Xen based implementation of VM execution replay. CASMotion classifies the category of non-deterministic events in Xen domU and presents their replaying methods and time matching algorithms. The experiment results show CASMotion can accurately replay the non-deterministic events with low performance penalty.