YU Dong-Hua, GUO Mao-Zu, LIU Yang, REN Shi-Jun, LIU Xiao-Yan, LIU Guo-Jun
2017, 28(12):3115-3128. DOI: 10.13328/j.cnki.jos.005237
Abstract: This paper investigates speeding up the K-medoids clustering algorithm. First, two acceleration lemmas are given based on the partitioning around medoids (PAM) algorithm and the triangular-inequality-elimination PAM (TPAM) algorithm. Two new acceleration theorems based on distance inequalities between center points are then proposed. Combining the lemmas with the theorems, at the cost of O(n+K²) additional memory, the speed-up partitioning around medoids (SPAM) algorithm is constructed, reducing the time complexity from O(K(n-K)²) to O((n-K)²). Experimental results on both real-world and artificial datasets show that SPAM outperforms the PAM, TPAM, and FKMEDOIDS approaches, running at least 0.828 times faster than PAM.
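The kind of triangle-inequality pruning the lemmas build on can be sketched as follows (a minimal numpy illustration of eliminating medoid candidates during assignment, not the authors' SPAM implementation; all names are hypothetical):

```python
import numpy as np

def assign_with_elimination(X, medoids):
    """Assign each point to its nearest medoid, skipping candidates ruled
    out by the triangle inequality: if d(m, c) >= 2*d(x, m) for the current
    best medoid m, then d(x, c) >= d(x, m) and candidate c can be skipped."""
    M = X[medoids]
    # precompute medoid-medoid distances once (the O(K^2) extra memory)
    inter = np.linalg.norm(M[:, None] - M[None, :], axis=2)
    labels = np.empty(len(X), dtype=int)
    skipped = 0
    for i, x in enumerate(X):
        best, d_best = 0, np.linalg.norm(x - M[0])
        for j in range(1, len(medoids)):
            if inter[best, j] >= 2 * d_best:  # elimination criterion
                skipped += 1
                continue
            d = np.linalg.norm(x - M[j])
            if d < d_best:
                best, d_best = j, d
        labels[i] = best
    return labels, skipped
```

The precomputed medoid-medoid distance table is what trades a little memory for skipped point-medoid distance computations.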
ZHANG Tao, BAI Dong-Hui, LI Hui
2017, 28(12):3129-3145. DOI: 10.13328/j.cnki.jos.005238
Abstract: With the arrival of the parallel computing era, parallel computation of formal concepts has become a hot issue in the field of formal concept analysis. This paper proposes a parallel concept computing algorithm based on the graph characteristics of the attribute topology used to represent a formal context. First, according to parent relations, a bottom-up decomposition of the attribute topology is conducted to generate sub-topologies. Then, concept couplings among the sub-topologies are removed based on the correlations between attribute pairs, ensuring that the sub-topologies are independent during concept computation and avoiding a costly merging operation in a later stage. Finally, all concepts can be computed without repetition by directly accumulating the concept sets computed in the different sub-topologies. Experiments show that the proposed approach not only obtains all concepts without repetition, but also improves computational efficiency in line with the hardware platform and reduces the time required for concept computation.
2017, 28(12):3146-3155. DOI: 10.13328/j.cnki.jos.005240
Abstract: Twin parametric insensitive support vector regression (TPISVR) is a recently proposed machine learning method. Compared with other regression methods, TPISVR has unique advantages in dealing with heteroscedastic noise. Standard TPISVR reduces to solving a pair of quadratic programming problems (QPPs) with inequality constraints in the dual space. However, this approach is limited by time and memory constraints when the number of samples is large. This paper introduces the least squares idea and proposes least squares twin parametric insensitive support vector regression (LSTPISVR), which transforms the two QPPs of TPISVR into linear equations and solves them directly in the original space. Further, a chaotic cuckoo optimization algorithm is introduced for parameter selection of LSTPISVR. Experiments on artificial and UCI datasets show that LSTPISVR not only learns quickly but also shows good generalization performance.
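The core least-squares idea — replacing a constrained QP with one linear solve — can be illustrated with a generic least-squares kernel regressor (a simplified sketch, not the authors' LSTPISVR formulation; the bias term is omitted and all names are hypothetical):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Gaussian RBF kernel matrix between row-sample matrices A and B."""
    d2 = ((A[:, None] - B[None, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def ls_svr_fit(X, y, C=10.0, gamma=1.0):
    """Least-squares fit: the QP is replaced by one linear system
    (K + I/C) alpha = y, solved directly."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + np.eye(len(X)) / C, y)

def ls_svr_predict(X_train, alpha, X_new, gamma=1.0):
    """Predict with the kernel expansion f(x) = sum_i alpha_i k(x_i, x)."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha
```

The single `np.linalg.solve` call is what makes the least-squares variant fast relative to iterative QP solvers.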
YANG Ming-Qi, LI Zhan-Shan, LI Zhe
2017, 28(12):3156-3166. DOI: 10.13328/j.cnki.jos.005242
Abstract: Generalized arc consistency (GAC) is the most widely studied consistency for solving constraint satisfaction problems (CSPs). MDDc, STR2, and STR3 are widely used algorithms for enforcing GAC on table constraints. MDDc compresses a table constraint into a multi-valued decision diagram (MDD); when an MDD has many identical parts, the compression ratio is high and MDDc outperforms the others. STR3, similar to STR2, dynamically reduces the table size by maintaining only the valid tuples; when valid tuples decrease slowly, STR3 outperforms STR2. Observing that finding the valid outgoing edges of MDD nodes and checking and deleting invalid tuples in the dual table are the most time-consuming operations in MDDc and STR3, this study proposes AdaptiveMDDc and AdaptiveSTR, which perform these two operations adaptively so that the lowest-cost strategy is always employed at each backtrack search stage. Benefiting from the low cost of strategy selection and the significant performance differences between strategies at different search stages, AdaptiveMDDc and AdaptiveSTR outperform the original methods, and AdaptiveSTR is over three times faster than STR3 on some instances.
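For readers unfamiliar with simple tabular reduction, the basic STR step the STR family refines can be sketched as follows (a textbook illustration, not the paper's STR2/STR3 data structures):

```python
def str_propagate(table, domains):
    """Simple tabular reduction: drop every tuple containing a value no
    longer in its variable's domain, then shrink each domain to the values
    still supported by some remaining tuple."""
    valid = [t for t in table if all(v in domains[i] for i, v in enumerate(t))]
    new_domains = [set() for _ in domains]
    for t in valid:
        for i, v in enumerate(t):
            new_domains[i].add(v)
    return valid, new_domains
```

STR2 and STR3 avoid rescanning the whole table like this sketch does; that rescan cost is exactly what the adaptive variants try to minimize.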
SU Chang, WANG Xiao-Mei, HUANG Shu-Man, CHEN Yi-Jiang
2017, 28(12):3167-3182. DOI: 10.13328/j.cnki.jos.005247
Abstract: Metaphor comprehension has become an important issue in linguistics, cognitive science, and computer science, and is an unavoidable task in natural language processing. This paper presents a novel metaphor comprehension method that makes full use of global information based on relevance constraints. The method uses implied perspectives to calculate the degree of relevance between the target and source domains. First, a multi-level semantic representation is obtained from the semantic representations of words, the topic features of words, and the topic features of the discourse. Next, the degrees of the relevance relations are calculated and the relevance model is generated, with relevance relations connecting cross-level nodes from different perspectives. Then, using a random walk algorithm, relevance relations are acquired from latent perspectives through iterative computation. Finally, the target attribute with the maximum relevance degree to the target domain is selected as the comprehension result. Experimental results show that the presented method is effective for metaphor comprehension.
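The iterative random-walk step can be sketched generically (random walk with restart on a weighted relevance graph; the restart weight and graph here are hypothetical, not the paper's model):

```python
import numpy as np

def random_walk(W, restart=0.15, iters=100):
    """Random walk with restart: W[i][j] is the relevance weight of edge
    i -> j; returns a stationary score vector over the nodes."""
    W = np.asarray(W, dtype=float)
    P = W / W.sum(axis=1, keepdims=True)  # row-normalize into transition probabilities
    n = len(P)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = restart / n + (1 - restart) * r @ P
    return r
```

The node (here, a candidate target attribute) with the largest stationary score would be selected as the comprehension result.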
LI Yong-Gan, ZHOU Xue-Guang, SUN Yan, ZHANG Huan-Guo
2017, 28(12):3183-3205. DOI: 10.13328/j.cnki.jos.005283
Abstract: This paper studies sentiment analysis in Weibo, focusing on three tasks: emotion sentence identification and classification, emotion tendency classification, and emotion expression extraction. An unsupervised topic sentiment model, UTSM, is proposed based on the LDA Collocation model to facilitate automatic hashtag labeling. A Gibbs sampling implementation is presented, from which an algorithm is derived to categorize emotion tendency automatically. To address the low recall of emotion expression extraction in Weibo, dependency parsing is used to divide dependency models into two categories, subject and object; six dependency models are constructed from evaluation objects and emotion words, and a merging algorithm is proposed to extract emotion expressions accurately. Experimental results indicate that the presented method is innovative and of practical value.
XU Dan, LI Wei, WANG An-Wen, FAN Hao-Nan, GONG Xiao-Qing, CHEN Xiao-Jiang, FANG Ding-Yi
2017, 28(12):3206-3222. DOI: 10.13328/j.cnki.jos.005248
Abstract: Data collection is the most crucial problem in wireless monitoring networks. UAV-based data collection methods have become a trend, as replacing traditional self-organized relay nodes with a UAV reduces the number of relay hops from source to sink and improves network energy efficiency. Current UAV-based data collection schemes, however, focus on maximizing the quantity of data collected with limited energy, without considering data value, and hence use the UAV's energy inefficiently. The challenge in collecting maximum data value with minimum UAV energy consumption is measuring the value of the data: data value is a subjective judgment of the application, and there is no uniform measurement for comparing the value of data collected by different nodes. This paper introduces the first value-driven data collection method, OnValueGet, which collects the most valuable data under an energy constraint. The intuition underlying the design is that nodes with similar data have similar data value. The paper defines and selects the most valuable nodes (called data-critical nodes) by analyzing and comparing the temporal and spatial similarity of their data; the data sensed by data-critical nodes can approximately represent the sensing data of all nodes within a given error. To collect the data of these data-critical nodes, the paper then adopts a greedy algorithm to plan the route of the energy-limited UAV, significantly improving energy efficiency.
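The value-under-budget selection problem can be sketched with a standard greedy knapsack heuristic (an illustration of the style of greedy choice, not OnValueGet's actual route-planning algorithm; values and costs here are hypothetical):

```python
def greedy_collect(nodes, budget):
    """nodes = [(value, cost)]; greedily pick nodes by descending
    value-per-unit-cost ratio until the energy budget is exhausted."""
    order = sorted(range(len(nodes)),
                   key=lambda i: nodes[i][0] / nodes[i][1], reverse=True)
    chosen, total_value, total_cost = [], 0.0, 0.0
    for i in order:
        value, cost = nodes[i]
        if total_cost + cost <= budget:  # only take nodes that still fit
            chosen.append(i)
            total_value += value
            total_cost += cost
    return chosen, total_value
```

In the paper's setting the cost of visiting a node also depends on the route, which is why a route-aware greedy step is needed rather than this per-node ratio alone.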
WANG Chang-Ping, WANG Chao-Kun, WANG Hao, WANG Meng, CHEN Jun
2017, 28(12):3223-3240. DOI: 10.13328/j.cnki.jos.005244
Abstract: Similarity join is one of the hottest topics in the field of data management and has been widely applied in many fields. However, existing similarity join methods cannot meet the increasing demands of the real world. This paper defines the generalized bisimilarity join, a new kind of similarity join that expands the applications of similarity join research by introducing a satisfaction operator over various data types with individual thresholds. Two efficient methods, SJS (sub-join set) and MFV (mapping-filtering-verification), are proposed to solve this problem. Extensive experiments on both real-world and synthetic datasets demonstrate the correctness and effectiveness of the proposed methods.
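The join semantics with per-attribute thresholds can be illustrated with a naive nested-loop baseline (numeric attributes with absolute-difference similarity; this is the problem definition only, not the SJS or MFV algorithms):

```python
def sim_join(R, S, thresholds):
    """Generalized similarity join, naive version: the pair (r, s) matches
    if every attribute pair is within that attribute's own threshold."""
    out = []
    for i, r in enumerate(R):
        for j, s in enumerate(S):
            if all(abs(a - b) <= t for a, b, t in zip(r, s, thresholds)):
                out.append((i, j))
    return out
```

Methods such as filter-verify pipelines exist precisely because this baseline is quadratic in the input sizes.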
2017, 28(12):3241-3256. DOI: 10.13328/j.cnki.jos.005285
Abstract: In deep Web data integration systems, it is desirable for a user to obtain high-quality query results from only a few data sources, making data source selection one of the core technologies of such systems. This paper proposes a method based on correlations and diversities for selecting deep Web data sources using small-scale sampled document summaries. First, considering the correlations between the query and the data sources, a hierarchical subject summary with a probability model of the correlation deviation of the data sources is constructed to discriminate among data sources, and a method is described for building this deviation probability model from manual feedback and correlation measurements of the data sources. Meanwhile, considering the diversities of the data sources, diversity-oriented directed edges are built in the hierarchical subject summary, and an evaluation metric is proposed to measure data source diversity. Treating data source selection based on correlation and diversity as a combinatorial optimization problem, an optimal selection is obtained by solving an optimization function. Experimental results show that the proposed method achieves better selection accuracy when selecting data sources from small samples of documents.
HAN Zhe, ZHANG Xia, LI Ou, ZHANG Ce, ZHANG Da-Long
2017, 28(12):3257-3273. DOI: 10.13328/j.cnki.jos.005246
Abstract: Data gathering algorithms based on compressive sensing (CS) have enormous application potential in wireless sensor networks (WSNs), which have limited energy and large amounts of redundant data. However, most existing studies assume ideal links. This paper shows experimentally that CS reconstruction quality is seriously degraded by lossy links, and proposes a CS data gathering algorithm based on retransmission and time-series correlation prediction (CS-RTSC). Packet loss is modeled as element random loss (ERL) and block random loss (BRL), and a loss-type prediction algorithm based on sliding-window statistics is designed to determine the type of packet loss when losses occur: retransmission recovery is applied for ERL, while a time-series correlation prediction algorithm recovers the losses for BRL. Simulation results indicate that the proposed algorithm effectively reduces the impact of lossy links on CS data gathering: even at a packet loss ratio of 30%, the relative error of the reconstructed signal is only 0.1% higher than under an ideal link.
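One plausible sliding-window statistic for telling the two loss types apart is the longest run of consecutive losses; the sketch below uses that hypothetical statistic and threshold, since the abstract does not specify the paper's actual window statistics:

```python
def classify_loss(window, block_run=3):
    """Classify packet loss in a sliding window (1 = lost, 0 = received):
    a run of >= block_run consecutive losses suggests block random loss
    (BRL); isolated losses suggest element random loss (ERL)."""
    run = longest = 0
    for bit in window:
        run = run + 1 if bit else 0
        longest = max(longest, run)
    if longest == 0:
        return "NONE"
    return "BRL" if longest >= block_run else "ERL"
```

Under CS-RTSC's policy, an "ERL" verdict would trigger retransmission, while "BRL" would trigger time-series prediction instead.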
GONG Lin-Ming, LI Shun-Dong, DOU Jia-Wei, GUO Yi-Min, WANG Dao-Shun
2017, 28(12):3274-3292. DOI: 10.13328/j.cnki.jos.005239
Abstract: In recent years, secure multiparty computation has been one of the research focuses in cryptography, and secure geometric computation is an important branch of it. The problem of securely computing the straight line determined by two private coordinate points has important application prospects in spatial information security. First, a variant of Paillier's homomorphic encryption scheme is put forward in which the base is computed by the sender during encryption, and its indistinguishability under adaptive chosen-plaintext attack is proved. Then, based on this homomorphic encryption scheme, a protocol is designed that securely computes, in the semi-honest model, the straight line determined by two private coordinate points. Moreover, the protocol can be applied to a class of secure multiparty computational geometry problems that reduce to computing a difference quotient of coordinates. This resolves the non-negligible probability of private information leakage in existing coordinate-difference-quotient protocols based on homomorphic encryption.
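For background, the additive homomorphism such protocols rely on can be shown with textbook Paillier encryption at toy parameters (this is the standard scheme with g = n + 1, not the paper's sender-chosen-base variant, and the tiny primes are for illustration only):

```python
import math
import random

def paillier_keygen(p=61, q=53):
    """Toy Paillier key generation (insecure parameter sizes)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1                       # standard choice of base
    mu = pow(lam, -1, n)            # valid because g = n + 1
    return (n, g), (lam, mu)

def encrypt(pk, m):
    """E(m) = g^m * r^n mod n^2 for random r coprime to n."""
    n, g = pk
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)

def decrypt(pk, sk, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    n, _ = pk
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n
```

Multiplying two ciphertexts adds the plaintexts, which is what lets a party combine encrypted coordinates without learning them.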
2017, 28(12):3293-3305. DOI: 10.13328/j.cnki.jos.005314
Abstract: The Radon transform is a useful mathematical tool for shape analysis. It is lossless and makes extracting structural shape features easy. However, it cannot be applied directly to shape recognition because it is sensitive to translation, scaling, and rotation of the shape. Existing Radon-transform-based methods attempt to remove information about the size, position, and orientation of the shape from the Radon transform, but they achieve invariance at the expense of useful shape features. To address this issue, a novel mathematical tool termed the λ-transform is proposed for shape description. The λ-transform uses the relative position information between the parallel lines (encoded in a variable r∈[0,1]) and their integrals over the shape to construct a 2D function of r and the direction angle θ of the lines. This study proves theoretically that the λ-transform is invariant to translation and scaling, and that a rotation only shifts it along the θ direction. It also shows theoretically that the λ-transform effectively preserves the useful information of the Radon transform. These desirable characteristics make the λ-transform outperform other Radon-based methods for shape recognition. The proposed λ-transform is tested on several commonly used shape image datasets, and the experimental results indicate that it achieves better performance than other Radon-based shape descriptors.
OU-YANG Xian-Bin, SHAO Li-Ping, LE Zhi-Fang
2017, 28(12):3306-3346. DOI: 10.13328/j.cnki.jos.005234
Abstract: Conventional meaningful image sharing schemes suffer from shortcomings such as weak authentication, little or no repair ability after attack, and low visual quality of the embedded carriers. To address these problems, this paper proposes a Galois-field self-recovery image sharing scheme with non-equivalent backup and double authentication. The scheme consists of a distributing stage and a recovery stage. The distributing stage has three steps. First, the LL subband of the secret image is obtained by a one-level DWT, and its coefficients are scrambled with a secret key to generate a non-equivalent backup image according to bit-importance groups, where the backup image and the secret image are the same size. Second, each pixel of the secret image and the backup image, together with its 7K-13 authentication bits, is shared into 7-bit distributed information over GF(2^7); each 7-bit unit, with 1 authentication bit generated from the secret key, is embedded into 2×2 blocks of the N distributed carriers by an optimized LSB embedding method. Finally, the secret key used for scrambling the LL subband and generating the authentication bits is shared into N sub-keys by a (K,N)-threshold scheme; the MD5 values of the sub-keys are published to a trusted third party, and the N sub-keys and embedded carriers are distributed to the N participants. The recovery stage has four steps. First, the authenticity of the participants' sub-keys is verified, and the sub-keys are used to recover the secret key. Second, the embedded distributed information, with its 1 authentication bit, is checked by the first authentication; the verified distributed information is used to reconstruct the GF(2^7) interpolation polynomial, and the pixels of the secret image and the backup image, with their 7K-13 authentication bits, are recovered to construct the preliminary secret image, backup image, and authentication image. Third, the backup image and the authentication image are used to reconstruct the LL subband of the secret image and then to build a repair reference image by descrambling and inverse DWT. Finally, the secret-image pixels that fail authentication are self-recovered by polynomial interpolation fitting and pixel replacement from the repair reference image. Theory and experiments show that the proposed method has strong authentication ability, makes full use of the double authentication and adjacent pixels to improve self-recovery, and yields embedded carriers with better visual quality.
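The (K,N)-threshold sharing of the secret key can be illustrated with textbook Shamir secret sharing; the paper's scheme operates over GF(2^7), but the sketch below uses the prime field GF(257) for simplicity:

```python
import random

P = 257  # small prime field for illustration; the paper uses GF(2^7)

def share(secret, k, n):
    """Split `secret` into n shares so that any k of them reconstruct it:
    evaluate a random degree-(k-1) polynomial with f(0) = secret."""
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    return [(x, sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over GF(P)."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num = den = 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret
```

Any K shares determine the polynomial, while K-1 shares reveal nothing about f(0), which is why publishing the sub-keys' MD5 values for verification does not weaken the threshold property.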
2017, 28(12):3347-3357. DOI: 10.13328/j.cnki.jos.005235
Abstract: To solve the problem of over-sparse within-class coefficients and over-dense between-class coefficients in SSC and LSR, this paper proposes a new subspace clustering method based on Euclidean distance using the A2 norm. With Euclidean-distance-based weighting, the coefficient representation obtained by the algorithm maintains the connections among data points from the same subspace while eliminating connections between clusters. Clusters are then produced by spectral clustering on the similarity matrix constructed from this coefficient representation. Experimental results indicate that the presented method improves clustering accuracy.
YANG Sheng-Yuan, CHEN Yao, YI Fei, LIU Xin
2017, 28(12):3358-3366. DOI: 10.13328/j.cnki.jos.005243
Abstract: As the de facto standard for 3D scanning and rapid prototyping, the STL (stereolithography) format is widely used in entertainment, manufacturing, the Internet, and other fields. As 3D models become more complex, their data volume grows ever larger. Complete topological relations are difficult to obtain quickly from an STL file, and STL files contain a large amount of redundant information; these two defects restrict further optimization, processing, and application of STL mesh models, so the STL mesh needs to be reconstructed. Based on a 2-manifold model of the STL triangular surface mesh, this paper proposes a fast mesh reconstruction method. Saturated-vertex deletion is used during reconstruction to reduce the number of vertices that must be compared, and the correlation of STL file data is exploited to improve the efficiency of vertex search and comparison. For a non-closed surface mesh, the algorithm improves reconstruction efficiency while also effectively extracting the boundary information of the mesh model. In addition, the reconstructed surface mesh data file greatly reduces storage space and effectively eliminates redundant data. Experimental results demonstrate the efficiency and robustness of the proposed algorithm.
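The redundancy in raw STL comes from each vertex being stored once per incident triangle; the basic reconstruction step — merging duplicate vertices into an indexed mesh — can be sketched with a hash map (a baseline illustration, not the paper's saturated-vertex-deletion algorithm):

```python
def index_mesh(triangles, tol=1e-7):
    """Rebuild an indexed mesh from an STL-style triangle soup: vertices
    within `tol` of each other are merged via a quantizing hash key,
    removing the redundancy of the raw STL representation."""
    key = lambda v: tuple(round(c / tol) for c in v)
    vertices, lookup, faces = [], {}, []
    for tri in triangles:
        face = []
        for v in tri:
            k = key(v)
            if k not in lookup:            # first time we see this vertex
                lookup[k] = len(vertices)
                vertices.append(v)
            face.append(lookup[k])
        faces.append(tuple(face))
    return vertices, faces
```

The resulting shared vertex indices are also what make adjacency (and hence boundary) queries possible, which raw STL does not support.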
CAI Hua-Qian, ZHANG Ying, HUANG Gang, MEI Hong
2017, 28(12):3367-3384. DOI: 10.13328/j.cnki.jos.005325
Abstract: Mobile devices with 3G/4G networking often waste energy in the so-called "tail time", during which the radio is kept on even though no communication is occurring. Prior work has developed policies that reduce this energy waste by batching network requests, but applying such policies to existing apps in practice is challenging due to a lack of mechanisms. This paper proposes an automatic program transformation approach for scheduling network requests in Android apps, with bytecode transformation of existing apps at its core. Addressing the technical challenges of automatic transformation, the paper implements a transformation system named DelayDroid. Compared with previous work, DelayDroid has two major characteristics: first, the transformation is carried out fully automatically; second, DelayDroid is designed as a practical tool, since it can transform Android apps given only their dex bytecode.
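The batching policy that such transformations inject can be sketched abstractly (DelayDroid itself rewrites Dalvik bytecode; this Python sketch only illustrates the defer-and-flush idea, and the delay bound is hypothetical):

```python
class BatchScheduler:
    """Defer network requests and flush them together, so the radio's
    high-power tail time is paid once per batch instead of per request."""

    def __init__(self, max_delay):
        self.max_delay = max_delay  # longest a request may be deferred
        self.pending = []

    def submit(self, request, now):
        self.pending.append((request, now))

    def due(self, now):
        # flush once the oldest deferred request has waited max_delay
        return bool(self.pending) and now - self.pending[0][1] >= self.max_delay

    def flush(self):
        batch = [req for req, _ in self.pending]
        self.pending = []
        return batch
```

A bytecode rewriter would redirect each app's send call to `submit` and let the scheduler decide when the batch actually goes out.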
WANG Bin, WU Ya-Jing, YANG Xiao-Long, SUN Qi-Fu
2017, 28(12):3385-3398. DOI: 10.13328/j.cnki.jos.005236
Abstract: Current big data processing takes little account of dependencies between data, resulting in low processing efficiency and large amounts of data transfer during task execution. To reduce data transfer and improve processing performance, this paper proposes a data-dependency-driven task scheduling scheme for big data processing, named D3S2. D3S2 consists of two parts: a dependency-aware placement mechanism (DAPM) and a transfer-aware task scheduling mechanism (TASM). DAPM discovers dependencies between data so that strongly related data are clustered and assigned to nodes in the same rack, reducing cross-rack data migration. After data placement, TASM schedules tasks according to the data locality constraint so as to minimize data transfer cost during task execution. DAPM and TASM inform each other's decisions, iterating to adjust the scheduling scheme toward minimum execution cost until an optimal solution is reached. The scheme is verified in a Hadoop environment. Experiments show that, compared with native Hadoop, D3S2 reduces data transfer during job execution and shortens job running time.
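The locality constraint TASM enforces can be sketched with a minimal locality-first assignment (an illustration of the principle, not D3S2's iterative optimization; node names and slot counts are hypothetical):

```python
def schedule(tasks, data_location, slots):
    """Locality-first scheduling: run each task on the node that holds its
    input data if a slot is free, otherwise fall back to any node with
    capacity (which forces a remote data transfer)."""
    assignment, remote = {}, 0
    free = dict(slots)  # node -> remaining free slots
    for task in tasks:
        node = data_location[task]
        if free.get(node, 0) > 0:
            free[node] -= 1          # local execution, no transfer
        else:
            node = max(free, key=free.get)  # least-loaded fallback node
            free[node] -= 1
            remote += 1              # this task will read data remotely
    return assignment, remote
```

Counting `remote` assignments approximates the transfer cost that D3S2's placement and scheduling loop tries to drive down.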