YUAN Ping-Hai , ZENG Qing-Kai , ZHANG Yun-Jian , LIU Yao
2020, 31(2):247-265. DOI: 10.13328/j.cnki.jos.005626
Abstract:Modern Web browsers introduce the just-in-time (JIT) compilation mechanism to improve their performance when executing JavaScript applications. However, this mechanism has already been abused by attackers to inject malicious code. For instance, as JIT compilers may place JavaScript integers into the code cache in the form of operands of machine instructions, attackers can inject return-oriented programming (ROP) gadgets by crafting JavaScript integers. Fortunately, integer-based injection attacks have already been mitigated by techniques such as constant blinding. This work demonstrates that attackers can also inject ROP gadgets by using JavaScript code blocks instead of integer values. The idea of this injection scheme is based on the observation that the dynamic code generated by JIT compilers for a given JavaScript code snippet always contains some immutable machine instruction sequences. The existence of these sequences is not affected by security mechanisms such as constant blinding and address randomization. Moreover, these instruction sequences may contain the ROP gadgets needed by attackers, so attackers can use JavaScript code blocks to obtain these gadgets in their attacks. The proposed injection scheme is evaluated on SpiderMonkey and Google V8 running on the x86-64 architecture. The two JIT engines are fed with JavaScript applications from well-known benchmarks, yielding a large number of dynamic code blocks. Statistical results show that Turing-complete sets of gadgets can be obtained from these code blocks. In real-world attack scenarios, the JavaScript applications available to an adversary include, and far exceed, those from the benchmarks. Therefore, an adversary can apply the proposed scheme to inject gadgets and construct ROP code that performs arbitrary computation.
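As an illustration of the gadget-harvesting step this abstract describes, the following minimal Python sketch scans a code buffer for byte sequences ending in the x86-64 `ret` opcode, the usual signature of a candidate ROP gadget. The sample buffer, base address, and maximum gadget length are illustrative assumptions, not output of the paper's toolchain.

```python
# Minimal sketch: scan a dynamically generated code buffer for byte
# sequences that end in 0xC3 (x86-64 `ret`), i.e. candidate ROP gadgets.
# The buffer below is an illustrative assumption, not SpiderMonkey/V8 output.

MAX_GADGET_LEN = 8  # look at most 8 bytes back from each `ret`

def find_candidate_gadgets(code: bytes, base_addr: int = 0):
    """Return (address, raw bytes) pairs for every suffix that ends in `ret`."""
    gadgets = []
    for i, b in enumerate(code):
        if b != 0xC3:          # 0xC3 encodes `ret`
            continue
        start = max(0, i - MAX_GADGET_LEN + 1)
        for s in range(start, i + 1):
            gadgets.append((base_addr + s, code[s:i + 1]))
    return gadgets

if __name__ == "__main__":
    jit_buffer = bytes.fromhex("4889c8" "4801d0" "c3" "90" "5dc3")
    for addr, raw in find_candidate_gadgets(jit_buffer, base_addr=0x7F0000001000):
        print(f"0x{addr:x}: {raw.hex()}")
```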
CHEN Shu , YE Jun-Min , LIU Tong
2020, 31(2):266-281. DOI: 10.13328/j.cnki.jos.005632
Abstract:Software defect prediction, which takes place at a very early stage of software quality control, helps software engineers focus their attention on defect-prone parts during the verification process. Cross-project defect prediction has been proposed, in which prediction models are trained with sufficient training data from existing software projects and then used to predict defects in other projects; however, its performance is often poor. The main reason is that the divergence of data distributions among different software projects has a dramatic impact on prediction accuracy. This study proposes a cross-project defect prediction approach that applies supervised domain adaptation based on instance weighting. The instances drawn from a source project are weighted by assigning target-dependent weights to the loss function of the prediction model when minimizing the expected loss over the distribution of source data, so that the distribution of the target project data is matched to that of the source project. Experiments, including dataset selection, data preprocessing, and results under different experimental strategies, are described on ten open-source software projects. Overfitting is also studied at different levels, including the dataset, the prediction model, and the domain adaptation process. The results show that the proposed approach is close to the performance of within-project defect prediction, better than similar approaches, and significantly better than the baseline.
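The sketch below illustrates the general idea of instance-weighted cross-project training: estimate target-dependent weights for source instances and feed them into the loss via sample weights. It uses a standard importance-weighting heuristic (a domain classifier) as a stand-in for the paper's weighting scheme; the data, classifier, and weight clipping threshold are assumptions.

```python
# Minimal sketch of instance-weighted cross-project defect prediction.
# Weights are estimated with a domain classifier (importance weighting);
# this is a generic stand-in for the paper's target-dependent weights.
# X_src/y_src and X_tgt are assumed to be pre-extracted metric matrices.
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weights(X_src, X_tgt):
    """w(x) ~ p_target(x) / p_source(x), estimated via a domain classifier."""
    X_dom = np.vstack([X_src, X_tgt])
    y_dom = np.hstack([np.zeros(len(X_src)), np.ones(len(X_tgt))])
    dom_clf = LogisticRegression(max_iter=1000).fit(X_dom, y_dom)
    p_tgt = dom_clf.predict_proba(X_src)[:, 1]
    return np.clip(p_tgt / (1.0 - p_tgt + 1e-12), 0.0, 10.0)  # truncate extreme ratios

def train_weighted_predictor(X_src, y_src, X_tgt):
    w = importance_weights(X_src, X_tgt)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_src, y_src, sample_weight=w)   # weighted loss over source instances
    return clf

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_src = rng.normal(0.0, 1.0, (200, 5)); y_src = (X_src[:, 0] > 0).astype(int)
    X_tgt = rng.normal(0.5, 1.2, (150, 5))  # shifted target distribution
    model = train_weighted_predictor(X_src, y_src, X_tgt)
    print(model.predict(X_tgt[:5]))
```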
XIANG Yi , ZHOU Yu-Ren , CAI Shao-Wei
2020, 31(2):282-301. DOI: 10.13328/j.cnki.jos.005637
Abstract:In search-based software engineering, one of the active research topics is many-objective optimal software product selection from software product lines. Previous works in this area mainly dealt with the preferences of software engineers and end users in an a posteriori way (namely, selection after search). In contrast, this study integrates users' preferences into the optimization process and proposes a new algorithm that conducts a directed search for the software products in which users are most interested. In the new algorithm, the users' preference is expressed as a weight vector, and an achievement scalarizing function (ASF) is used to aggregate all the optimization objectives. In addition, a new relation is defined to compare different individuals. To enhance the algorithm's ability to find valid solutions, a substitution operator and a repair operator are implemented with DPLL/CDCL-type and SLS-type SAT solvers, respectively. To verify the effectiveness of the new algorithm, 21 widely used feature models are adopted for simulation experiments. Among these models, the largest number of features and the largest number of constraints are 62 482 and 343 944, respectively. As the experimental results show, the substitution operator, based on a DPLL/CDCL-type SAT solver, is beneficial for returning valid software products, while the repair operator, implemented with an SLS-type SAT solver, contributes to finding products that satisfy users' preferences as much as possible. In summary, the simultaneous use of the two types of SAT solvers is a feasible and effective way to handle the many-objective optimal software product selection problem when users' preferences must be considered.
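Since the abstract names an achievement scalarizing function for aggregating objectives under a preference weight vector, here is a minimal sketch of one common (augmented) ASF form; the paper's exact formulation, weight vector, and reference point are assumptions.

```python
# Minimal sketch of aggregating objectives with an achievement scalarizing
# function (ASF). The augmented form below is one common ASF; the weights
# and reference point are illustrative assumptions.
import numpy as np

def asf(objectives, weights, reference, rho=1e-4):
    """Smaller is better: the max-term steers toward the preferred direction,
    the small augmentation term breaks ties among weakly optimal points."""
    objectives = np.asarray(objectives, dtype=float)
    weights = np.asarray(weights, dtype=float)
    diff = (objectives - reference) / weights
    return diff.max() + rho * diff.sum()

if __name__ == "__main__":
    reference = np.zeros(3)                  # ideal point (assumed known)
    weights = np.array([0.5, 0.3, 0.2])      # user's preference weights
    candidates = [np.array([0.4, 0.7, 0.9]), np.array([0.6, 0.5, 0.3])]
    scores = [asf(f, weights, reference) for f in candidates]
    print("preferred candidate:", int(np.argmin(scores)), scores)
```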
CAI Li , WANG Shu-Ting , LIU Jun-Hui , ZHU Yang-Yong
2020, 31(2):302-320. DOI: 10.13328/j.cnki.jos.005977
Abstract:Data annotation is a key part of the effective operation of most artificial intelligence algorithms: the better the annotation accuracy and the larger its quantity, the better the performance of the algorithm. The development of the data annotation industry boosts employment in many cities and towns in China, prompting China to gradually become a global center of data annotation. This study summarizes the development of data annotation, including its origin, application scenarios, classifications, and tasks; lists commonly used annotated datasets, open-source data annotation tools, and commercial annotation platforms; proposes a data annotation specification covering roles, standards, and processes; and gives an example of data annotation in sentiment analysis. The paper then describes the models and characteristics of state-of-the-art algorithms for evaluating annotation results and compares their advantages and disadvantages. Finally, it discusses the research focuses and development trends of data annotation from four aspects: tasks, tools, annotation quality, and security.
CHU Xiao-Min , XI Xue-Feng , JIANG Feng , XU Sheng , ZHU Qiao-Ming , ZHOU Guo-Dong
2020, 31(2):321-343. DOI: 10.13328/j.cnki.jos.005868
Abstract:Discourse structure analysis is an important research topic in natural language processing. It not only helps to understand the structure and semantics of a discourse, but also provides strong support for deep applications of natural language processing such as automatic summarization, information extraction, and question answering. At present, discourse structure analysis is mainly concentrated on the micro level, focusing on the relations and structures between sentences or sentence groups, while analysis at the macro level is relatively scarce. Therefore, this study takes discourse structure as the research object and focuses on the construction of a representation schema and corpus resources at the macro level. The study discusses the importance of discourse structure analysis, reviews the research status of discourse structure analysis from three aspects, namely theoretical systems, corpus resources, and computational models, and puts forward a macro-micro unified discourse structure representation framework with the primary-secondary relation as the carrier. Furthermore, the study constructs the logical semantic structure and the functional pragmatic structure at the macro discourse level, respectively. On this basis, a macro Chinese discourse structure corpus consisting of 720 newswire articles is annotated, and the annotation results are analyzed in terms of consistency and statistics.
ZHAO Xiao-Fei , SHI Zhong-Zhi , TIAN Dong-Ping , LIU Jian-Wei
2020, 31(2):344-355. DOI: 10.13328/j.cnki.jos.005614
Abstract:The reasoning performed to verify the correctness of RDFS ontologies is a task with high computational cost, and it becomes more complex when additional constraints are present. This paper presents an approach that extracts an RDFS sub-schema without changing the reasoning results. The approach is based on the analysis of the dependency relationships between constraints. To capture the precise semantics of the RDFS schema, the schema elements and the constraints are first formalized as first-order formulas expressed as disjunctive embedded dependencies, and a constraint dependency graph is established according to the interactions between the constraints. Then, strategies for deleting the edges and nodes that are irrelevant to the reasoning tasks are applied. Finally, the RDFS sub-schema is obtained through a reconstruction process. The proposed approach enables reasoning to be carried out on the extracted small-scale ontologies. The experimental results show that the approach can significantly improve the efficiency of RDFS ontology validation. Compared with the reasoning time, the average time (0.60s) consumed by the extraction process is almost negligible, while the efficiency improvement ranges from 2.00 times to 22.97 times.
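The sketch below illustrates, in a generic way, the pruning idea behind a constraint dependency graph: keep only constraints reachable from those relevant to the reasoning task and drop the rest. The example graph and the seed set are hypothetical and do not reflect the paper's actual dependency construction.

```python
# Minimal sketch of pruning a constraint dependency graph: keep only the
# constraints reachable from those relevant to the current reasoning task.
# The example graph and seed set are illustrative assumptions.
from collections import deque

def relevant_constraints(dependency_graph, seeds):
    """BFS over constraint dependencies; everything unreachable can be dropped."""
    keep, queue = set(seeds), deque(seeds)
    while queue:
        c = queue.popleft()
        for dep in dependency_graph.get(c, ()):
            if dep not in keep:
                keep.add(dep)
                queue.append(dep)
    return keep

if __name__ == "__main__":
    graph = {                      # constraint -> constraints it depends on
        "domain(author)": ["class(Person)"],
        "range(author)": ["class(Document)"],
        "subClassOf(Article,Document)": ["class(Document)"],
        "class(Person)": [], "class(Document)": [], "class(Image)": [],
    }
    task_seeds = {"domain(author)", "subClassOf(Article,Document)"}
    kept = relevant_constraints(graph, task_seeds)
    print("kept:", sorted(kept))
    print("pruned:", sorted(set(graph) - kept))
```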
XIE Cheng-Wang , YU Wei-Wei , BI Ying-Zhou , WANG Shen-Wen , HU Yu-Rong
2020, 31(2):356-373. DOI: 10.13328/j.cnki.jos.005617
Abstract:In the real world there exist many many-objective optimization problems (MaOPs), which severely challenge well-known multi-objective evolutionary algorithms (MOEAs). A many-objective evolutionary algorithm combining decomposition and coevolution (MaOEA/DCE) is presented in this paper. MaOEA/DCE adopts a mixed-level orthogonal experimental design to produce a set of weight vectors evenly distributed in the weight coefficient space, so as to improve the diversity of the initial population. In addition, MaOEA/DCE integrates differential evolution (DE) with an adaptive SBX operator to generate high-quality offspring and enhance the convergence of the evolutionary population. Comparative experiments are conducted between MaOEA/DCE and five other representative MOEAs to examine their IGD+ performance on four MaOPs of DTLZ{1,2,4,5}. The experimental results show that the proposed MaOEA/DCE has an overall performance advantage over the peer MOEAs in terms of convergence, diversity, and robustness.
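Because the comparison above is reported with the IGD+ indicator, here is a minimal sketch of how IGD+ is computed for a minimization problem; the reference set and approximation set are illustrative assumptions.

```python
# Minimal sketch of the IGD+ indicator (smaller is better, minimization).
# The reference set and approximation set below are illustrative assumptions.
import numpy as np

def igd_plus(approx, reference):
    """IGD+: average, over reference points z, of the modified distance
    d+(a, z) = ||max(a - z, 0)||_2 minimized over the approximation set."""
    approx = np.asarray(approx, dtype=float)
    reference = np.asarray(reference, dtype=float)
    dists = []
    for z in reference:
        diff = np.maximum(approx - z, 0.0)          # only "worse" components count
        dists.append(np.sqrt((diff ** 2).sum(axis=1)).min())
    return float(np.mean(dists))

if __name__ == "__main__":
    reference = np.array([[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]])   # sampled true front
    approx = np.array([[0.1, 1.0], [0.6, 0.6], [1.0, 0.1]])      # algorithm's output
    print("IGD+ =", igd_plus(approx, reference))
```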
HUO Xing , ZHANG Yang-Yang , JING Yong-Jun , SHAO Kun
2020, 31(2):374-394. DOI: 10.13328/j.cnki.jos.005619
Abstract:In a MAS with a distributed architecture, Agents achieve task objectives through mutual coordination and collaboration. Since there is no centralized management authority to rely on, it is difficult to judge the reputation of Agents. In addition, traditional reputation evaluation methods based on evaluation and feedback have problems such as insufficient use of feedback evaluations and the absence of a feasible mechanism for ensuring that feedback evaluations are credible. To solve these problems, a comprehensive reputation calculation method is proposed in this study. To counter malicious evaluations submitted by individual users, the proposed algorithm first filters the service evaluation data with the CUSUM (cumulative sum) control chart, then integrates the different dimensions of the evaluation data with the information entropy method, and after that uses the PageRank algorithm to measure the influence of individuals. Finally, it gives a comprehensive reputation model incorporating the truth value of feedback evaluations and individual influence. Simulation results show that the proposed method performs well in improving the accuracy and convergence of reputation computation as well as in resisting malicious attacks.
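To make the entropy-based fusion step concrete, the following minimal sketch computes information-entropy weights over the dimensions of an evaluation matrix and fuses them into a per-agent score. The score matrix is an illustrative assumption; the CUSUM filtering and PageRank influence steps are omitted.

```python
# Minimal sketch of fusing multi-dimensional evaluation data with
# information-entropy weights: dimensions whose scores vary more across
# agents carry more weight. The score matrix is an illustrative assumption.
import numpy as np

def entropy_weights(scores):
    """scores: (n_agents, n_dimensions) matrix of non-negative evaluations."""
    scores = np.asarray(scores, dtype=float)
    p = scores / (scores.sum(axis=0, keepdims=True) + 1e-12)
    n = scores.shape[0]
    entropy = -(p * np.log(p + 1e-12)).sum(axis=0) / np.log(n)
    divergence = 1.0 - entropy                     # low entropy -> informative dimension
    return divergence / divergence.sum()

if __name__ == "__main__":
    scores = np.array([[0.9, 0.8, 0.5],
                       [0.7, 0.8, 0.9],
                       [0.2, 0.8, 0.6]])           # rows: agents, cols: dimensions
    w = entropy_weights(scores)
    reputation = scores @ w                        # weighted fusion per agent
    print("weights:", w.round(3), "reputation:", reputation.round(3))
```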
HE Fu-Lin , LIU Lei , Lü Shuai , NIU Dang-Dang , WANG Qiang
2020, 31(2):395-405. DOI: 10.13328/j.cnki.jos.005642
Abstract:Model counting is the problem of calculating the number of models of a given propositional formula, and it is a generalization of the SAT problem. Model counting is widely used in the field of artificial intelligence, and many practical problems can be reduced to it. At present, two complete solvers, Cachet and sharpSAT, are commonly used for model counting, and both have high solving efficiency; however, their efficiency is unrelated to the number of models. It is reasonable to suppose that incomplete methods are more likely to show an efficiency advantage, and may be more suitable for model counting, when the given propositional formula has few models. Local search is an efficient incomplete method for solving the SAT problem. Cai et al. proposed a strategy called configuration checking (CC) and adopted it in local search, which led to the SWcc algorithm with high solving efficiency. This study puts forward two incomplete methods based on the SWcc algorithm, namely the iteration method and the improved incremental method, both of which are efficient, and presents their concrete implementation. Finally, experimental results on a large number of benchmarks show the advantages of the iteration method and the improved incremental method in terms of runtime when the given conjunctive normal form formula has few models.
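The sketch below illustrates the "iterate and block" idea underlying iterative model counting: repeatedly find a model, then add a blocking clause forbidding it. It assumes the python-sat package and uses a complete solver as the SAT oracle, whereas the paper's methods build on the local-search algorithm SWcc; the CNF instance is a toy example.

```python
# Minimal sketch of counting models by iteration with blocking clauses.
# Assumes the python-sat package (pip install python-sat); a complete
# solver stands in for the SAT oracle used by the paper's local search.
from pysat.solvers import Glucose3

def count_models_by_iteration(clauses, limit=1_000_000):
    count = 0
    with Glucose3(bootstrap_with=clauses) as solver:
        while count < limit and solver.solve():
            model = solver.get_model()
            solver.add_clause([-lit for lit in model])  # block this exact model
            count += 1
    return count

if __name__ == "__main__":
    # (x1 or x2) and (not x1 or x3): 4 models over x1..x3
    cnf = [[1, 2], [-1, 3]]
    print("models:", count_models_by_iteration(cnf))
```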
ZHANG An-Zhen , LI Jian-Zhong , GAO Hong
2020, 31(2):406-420. DOI: 10.13328/j.cnki.jos.005876
Abstract:This work studies the problem of aggregate query processing over incomplete data based on denotational semantics. Incomplete data, also known as missing values, can be classified into two categories: applicable nulls and inapplicable nulls. Existing imputation algorithms cannot guarantee the accuracy of query results after imputation, so this work instead gives interval estimations of aggregate query results. The study extends the relational model under denotational semantics so that it covers all types of incomplete data, and defines a new semantics of aggregate query answers over incomplete data. Reliable answers are interval estimations of the ground-truth query results, which cover the ground-truth results with high probability. For SUM, COUNT, and AVG queries, linear approximate evaluation algorithms are proposed to compute reliable answers. Extensive experiments on real and synthetic datasets verify the effectiveness of the proposed method.
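The following minimal sketch shows the interval-answer idea on toy data: each missing (applicable-null) value is assumed to lie in a known domain interval, which yields deterministic bounds for SUM and AVG. The paper's reliable answers are probabilistic interval estimations; the column and bounds here are assumptions.

```python
# Minimal sketch of interval answers for SUM and AVG over a column with
# missing values: each applicable null is assumed to lie in [lo, hi].
# The data and bounds are illustrative assumptions.

def sum_avg_intervals(values, lo, hi):
    """values: list of numbers or None (applicable null)."""
    known = [v for v in values if v is not None]
    n_missing = len(values) - len(known)
    s_lo = sum(known) + n_missing * lo
    s_hi = sum(known) + n_missing * hi
    n = len(values)
    return (s_lo, s_hi), (s_lo / n, s_hi / n)

if __name__ == "__main__":
    salaries = [3000, None, 4500, None, 5200]       # two missing salaries
    sum_iv, avg_iv = sum_avg_intervals(salaries, lo=2000, hi=8000)
    print("SUM in", sum_iv, "AVG in", avg_iv)
```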
LAI Yi-An , ZHANG Yu-Jie , DU Yu-Lu , MENG Xiang-Wu
2020, 31(2):421-438. DOI: 10.13328/j.cnki.jos.005618
Abstract:The newly emerging event-based social network (EBSN) takes events as its core and combines online relationships with offline activities, promoting the formation of real and effective social relationships among users. However, excessive activity information makes it difficult for users to distinguish and choose events. Context-aware local event recommendation is an effective solution to this information overload problem, but most existing local event recommendation algorithms learn users' preferences for contextual information only indirectly from statistics of historical event participation and ignore the latent correlations among them, which limits recommendation effectiveness. To take full advantage of the latent correlations between users' event preferences and contextual information, the proposed collective contextual relation learning (CCRL) algorithm models the relations among users' participation records and related contextual information such as event organizer, description text, venue, and starting time. A multi-relational Bayesian personalized ranking (MRBPR) algorithm is then adapted for collective contextual relation learning and local event recommendation. Experimental results on a Meetup dataset demonstrate that the proposed algorithm outperforms state-of-the-art local event recommendation algorithms on many metrics.
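As background for the ranking component named above, here is a minimal sketch of a single Bayesian personalized ranking (BPR) update on one (user, attended event, non-attended event) triple with matrix-factorization embeddings. The multi-relational extension over contextual relations is not shown; the dimensions and learning rates are illustrative assumptions.

```python
# Minimal sketch of a BPR stochastic update: raise score(u, i) above
# score(u, j) for attended event i and non-attended event j.
import numpy as np

def bpr_step(U, V, u, i, j, lr=0.05, reg=0.01):
    u_vec = U[u].copy()
    x_uij = u_vec @ (V[i] - V[j])
    coef = 1.0 / (1.0 + np.exp(x_uij))          # = 1 - sigmoid(x_uij)
    U[u] += lr * (coef * (V[i] - V[j]) - reg * U[u])
    V[i] += lr * (coef * u_vec - reg * V[i])
    V[j] += lr * (-coef * u_vec - reg * V[j])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    U = rng.normal(scale=0.1, size=(5, 8))      # user embeddings
    V = rng.normal(scale=0.1, size=(20, 8))     # event embeddings
    for _ in range(200):
        bpr_step(U, V, u=0, i=3, j=11)
    print("score gap:", U[0] @ (V[3] - V[11]))  # positive after training
```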
LUO Yang , SHEN Qing-Ni , WU Zhong-Hai
2020, 31(2):439-454. DOI: 10.13328/j.cnki.jos.005624
Abstract:In order to protect cloud resources, access control mechanisms have to be established in the cloud. However, cloud platforms tend to design their own security policy languages and authorization mechanisms. This leads to two issues: (i) a cloud user has to learn different policy languages to customize permissions for each cloud, and (ii) a cloud service provider has to design and implement an authorization mechanism from scratch, which incurs a high development cost. In this work, a new access control policy specification language called PML is proposed to support the expression of multiple access control models such as BLP, RBAC, and ABAC, as well as important features such as multi-tenancy. An authorization framework called PML-EM is implemented on OpenStack to centralize authorization. PML-EM is independent of policy languages, access control models, and the programming languages that implement the authorization module. Other policies, such as XACML policies and OpenStack policies, can be automatically translated into PML, which facilitates migration between clouds that both support PML-EM. The experimental results indicate that PML-EM improves the flexibility of policy management from a tenant's perspective, with a policy evaluation performance overhead of 4.8% and an invasiveness of about 0.42%.
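To illustrate the kind of check a centralized authorization engine performs, the sketch below evaluates a simple attribute-based (ABAC-style) rule against a request. The rule format and attribute names are hypothetical and are not PML syntax.

```python
# Minimal sketch of evaluating an ABAC-style rule against a request.
# The rule format and attributes are hypothetical, not PML syntax.

def evaluate(rule, request):
    """rule: dict of attribute -> set of allowed values; allow if all match."""
    return all(request.get(attr) in allowed for attr, allowed in rule.items())

if __name__ == "__main__":
    rule = {
        "tenant":   {"tenant-a"},
        "role":     {"admin", "operator"},
        "action":   {"volume:create", "volume:delete"},
        "resource": {"volume"},
    }
    request = {"tenant": "tenant-a", "role": "operator",
               "action": "volume:create", "resource": "volume"}
    print("decision:", "Permit" if evaluate(rule, request) else "Deny")
```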
XIAN He-Qun , LIU Hong-Yan , ZHANG Shu-Guang , HOU Rui-Tao
2020, 31(2):455-470. DOI: 10.13328/j.cnki.jos.005628
Abstract:Data deduplication technology has been widely applied in cloud storage systems. Under the premise of ensuring data privacy, how to effectively perform deduplication in semi-trusted cloud storage environments has become one of the primary issues in cloud computing security. Current schemes rely heavily on an online trusted third party to manage data labels and to keep track of the number of users; the trusted third party plays such a vital role in those schemes that it is indispensable, even at the cost of unsatisfactory efficiency and a potential bottleneck. A verifiable secure data deduplication scheme in cloud storage is proposed that does not require any online trusted third party. A dual-tag scheme based on bilinear mapping is adopted to conduct popularity checks, and the tag is used to retrieve files without leaking any exploitable information. A modified group signature scheme is designed to prevent the cloud server from forging popularity query results, so users can verify the authenticity of query results returned by the cloud server. A multi-layered cryptosystem is adopted in the proposed scheme, in which different encryption strategies are applied according to the popularity of specific data. The correctness and security of the proposed scheme are analyzed and proved. Simulation results show that the proposed scheme is secure and efficient.
ZHOU Chang-Li , CHEN Yong-Hong , TIAN Hui , CAI Shao-Bin
2020, 31(2):471-492. DOI: 10.13328/j.cnki.jos.005679
Abstract:Location privacy and query content privacy are both critical in LBS queries for points of interest (POIs). For continuous queries in road networks, frequent changes of a user's location place a heavy query processing burden on the LBS server; how to release as little of the user's private information as possible while efficiently obtaining accurate query results remains a great challenge in current research. Taking the idea of private information retrieval (PIR), i.e. trusting no entity except the user himself, as the basic assumption, a privacy-preserving method is put forward based on the homomorphic properties of the Paillier cryptosystem. With this method, the user does not need to provide his actual location or query content to the LBS server in a K nearest neighbor POI query, which achieves both privacy preservation in LBS and accurate retrieval of POIs. Meanwhile, by taking the vertices of the road network as generating elements to organize the distribution information of POIs, the method further addresses the inefficiency of most cryptographic query schemes caused by frequent location changes in continuous queries, significantly reducing the frequency of queries issued to the LBS server without decreasing query accuracy. Finally, the proposed method is analyzed in terms of accuracy, security, and efficiency, and extensive experiments verify its effectiveness.
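The additive homomorphic property of Paillier that such schemes rely on can be seen in the minimal from-scratch sketch below: multiplying ciphertexts yields an encryption of the sum of the plaintexts. The tiny hard-coded primes are for illustration only and far too small for real use; the paper's full K-nearest-neighbor query protocol is not shown.

```python
# Minimal from-scratch Paillier sketch showing the additive homomorphism:
# E(m1) * E(m2) mod n^2 decrypts to m1 + m2. Demo primes are insecure.
import math, random

p, q = 293, 433                      # demo primes (insecure size)
n, n2 = p * q, (p * q) ** 2
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
L = lambda x: (x - 1) // n
mu = pow(L(pow(g, lam, n2)), -1, n)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

if __name__ == "__main__":
    c1, c2 = encrypt(20), encrypt(35)
    print(decrypt((c1 * c2) % n2))   # 55: addition happened under encryption
```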
TAO Xin-Min , CHANG Rui , SHEN Wei , WANG Ruo-Tong , LI Chen-Xi
2020, 31(2):493-510. DOI: 10.13328/j.cnki.jos.005639
Abstract:In this study, a novel semi-supervised kernel Fisher discriminant analysis (KFDA) method based on the low density separation geometric distance is proposed. The method employs the low density separation geometric distance as the measure of similarity and thus improves the generalization ability of KFDA through a large number of unlabeled samples. First, the original data are implicitly mapped onto a high-dimensional feature space by a kernel function. Then, both the labeled and unlabeled data are used to capture the geometrical consistency assumption based on the low density separation geometric distance, which is incorporated into the objective function of Fisher discriminant analysis as a regularization term. Finally, the optimal projection matrix is obtained by minimizing the objective function. Experiments on artificial and UCI datasets show that the proposed algorithm achieves a significant improvement in classification performance compared with KFDA and its modified approaches. In addition, comparison results with other methods on face recognition problems demonstrate that the proposed algorithm has higher identification accuracy.
LIU Ming-Hua , WANG Chuan-Sheng , HU Qiang , WANG Chuan-Xu , CUI Xue-Hong
2020, 31(2):511-530. DOI: 10.13328/j.cnki.jos.005656
Abstract:A part-based tracking approach based on multiple collaborative models is proposed to address the problem of losing the object that holistic appearance models suffer from in complex scenarios. The object appearance model is constructed by fusing a generative model based on the local sensitive histogram (LSH) and a discriminative model based on superpixel segmentation. Extracting the illumination-invariant features of the LSH effectively resists the influence of illumination changes on the object model. To compensate for the lack of an effective occlusion handling mechanism in the LSH algorithm, a part-based adaptive model segmentation method is introduced to improve robustness to occlusion. Using relative entropy and mean-shift clustering, the difference confidence value and the foreground-background confidence value of each local part are measured, a dual-weight constraint mechanism and an asynchronous update strategy for the part model are established, and the parts with high confidence are selected to locate the object within the particle filter framework. Experimental results on challenging sequences confirm that the proposed approach outperforms related tracking algorithms in complex scenarios.
GU Guang-Hua , CAO Yu-Yao , LI Gang , ZHAO Yao
2020, 31(2):531-543. DOI: 10.13328/j.cnki.jos.005630
Abstract:The popularity of smart electronic devices and the Internet has made image data explode. In order to manage complex image resources effectively, this study proposes an image hierarchical classification method based on a weighted semantic neighborhood set and a formal concept partial order structure. First, a weighting coefficient over different semantics is adaptively designed from the image semantic correlation scores, and a weighted semantic neighborhood (WSN) set is constructed from the training sets. The semantic labels of an image are generated automatically by judging the word frequency distribution of the images in its semantic neighborhood set. Then, a formal context is built by taking the images as objects and the semantic labels as attributes. On this basis, an efficient hierarchical classification method for complex image datasets is proposed based on the partial order structure. The hierarchical classification method yields the explicit structural relations and subordinate relationships between image categories, which provides an effective idea for the hierarchical classification and management of complex large-scale image data. Three datasets, Corel5k, EspGame, and Iaprtc12, were labeled by the WSN method, and the labeling results demonstrated the integrity of the image semantics and the accuracy of the main semantics. Furthermore, the hierarchical classification method was applied to the Corel5k dataset, and the experimental results showed its significant performance.
WU Xiao-Hui , HE Ye-Ping , MA Heng-Tai , ZHOU Qi-Ming , LIN Shao-Feng
2020, 31(2):544-563. DOI: 10.13328/j.cnki.jos.005979
Abstract:Modern processor optimizations, including out-of-order execution and speculation, are critical for performance. Recent research such as Meltdown and Spectre exploits delayed exception handling and misprediction events to trigger transient execution and leak otherwise inaccessible information through the CPU's microarchitectural state from instructions that are never committed. This type of attack is called a transient execution attack; it differs from the traditional cache side channel attack and is more difficult to mitigate. This work studies the mechanism and implementation of transient execution attacks in depth and summarizes their research status and defenses. First, the optimizations adopted by processors are introduced and the essential causes of transient execution attacks are analyzed. Then, transient execution attacks are systematically categorized based on the primitives that trigger transient execution, in order to reveal the attack surface. Finally, the defenses are summarized with respect to the key steps and components in the attack model, and future research directions in this field are presented.
ZHANG Jie-Xin , PANG Jian-Min , ZHANG Zheng
2020, 31(2):564-577. DOI: 10.13328/j.cnki.jos.005615
Abstract:The Web server with mimic construction is a new Web security defense system based on the principle of mimic defense. It uses heterogeneity, dynamics, redundancy, and other characteristics to block or disrupt network attacks and thereby control the security risk of the system. This study analyzes how heterogeneity can improve the security of the Web server with mimic construction and points out the importance of quantifying heterogeneity. Based on quantification methods for biodiversity, this study defines the heterogeneity of a Web server with mimic construction as the complexity and disparity of its executor set, proposes a quantification method suitable for measuring this heterogeneity, and analyzes the factors that influence it. This study provides a new method for the quantitative assessment of mimic defense in theory, and provides guidance for choosing the redundancy, components, and executors in practice. The experimental results show that the proposed method is more suitable for quantifying the heterogeneity of the Web server with mimic construction than the Shannon-Wiener index and the Simpson index.
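For reference, the two biodiversity baselines mentioned above can be computed as in the minimal sketch below, over the proportions of each variant in an executor set; the example proportions are illustrative assumptions.

```python
# Minimal sketch of the Shannon-Wiener and Simpson diversity indices,
# computed over the proportions of each executor variant.
import math

def shannon_wiener(proportions):
    return -sum(p * math.log(p) for p in proportions if p > 0)

def simpson(proportions):
    return 1.0 - sum(p * p for p in proportions)

if __name__ == "__main__":
    # e.g. three equally deployed executor variants
    proportions = [1 / 3, 1 / 3, 1 / 3]
    print("Shannon-Wiener:", round(shannon_wiener(proportions), 4))
    print("Simpson:", round(simpson(proportions), 4))
```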
HAN Jing , LI Yan-Ping , YU Yong , DING Yong
2020, 31(2):578-596. DOI: 10.13328/j.cnki.jos.005633
Abstract:With the advent of cloud storage, more and more users choose to store large amounts of data on remote cloud servers in order to save local storage resources. In recent years, how to verify the integrity of data stored remotely in the cloud has become a hot topic in academia. Although many cloud auditing protocols have been put forward, most of them are based on the assumption that users (individuals or enterprises) and their public/private keys remain unchanged throughout the use of the cloud storage system, and these schemes cannot update data dynamically in real time. Therefore, this study proposes a lightweight cloud auditing scheme that supports dynamic revocation of users and real-time updating of data. First, the scheme allows users to be revoked dynamically and efficiently (including the updating of public/private keys): multi-use unidirectional proxy re-signature technology is adopted in the revocation stage, so a new user simply needs to calculate a re-signature key instead of downloading the data from the cloud, re-signing it, and uploading it again. Second, the scheme realizes real-time dynamic updating of data (insertion, deletion, and modification) by introducing a virtual index into the identification code of each data block, so that only the identification code of the updated data block changes while the others remain unchanged. Finally, in the re-signing stage the cloud server is able to re-sign on behalf of a new user, and in the auditing stage a third-party audit center can verify the integrity of the cloud data on behalf of the current user, which greatly reduces the computational overhead of users and the communication overhead of the system (hence the scheme is lightweight). The security and performance analyses further show that the proposed scheme is secure and efficient.