ZhAO Yao , LI Bo , HUANG Xian-Sheng , WEN Ji-Rong , JIANG Gang-Yi , CHANG Dong-Xia
2018, 29(4):897-899. DOI: 10.13328/j.cnki.jos.005417
Abstract:
YANG Wen-Han , LIU Jia-Ying , XIA Si-Feng , GUO Zong-Ming
2018, 29(4):900-913. DOI: 10.13328/j.cnki.jos.005403
Abstract:Single-image super-resolution reconstruction is undercut by the problem of ambiguity: for a given low-resolution (LR) patch, there are several corresponding high-resolution (HR) patches. Learning-based approaches suffer from this hindrance and are only capable of learning the inverse mapping from an LR patch to the mean of these HR patches, resulting in visually blurred results. In order to alleviate the high-frequency loss caused by ambiguity, this paper presents a deep network for image super-resolution that utilizes online retrieved data to compensate for high-frequency details. The method constructs a deep network that predicts the HR reconstruction through three paths: a bypass connection that directly feeds the LR image to the last layer of the network; an internal high-frequency information inference path that regresses the HR image from the input LR image to reconstruct its main structure; and an external high-frequency information compensation path that enhances the results of internal inference based on online retrieved similar images. In the compensation path, to effectively and adaptively extract high-frequency details for the reconstruction of the internal inference, the high-frequency details are transferred under constraints measured by hierarchical features. Compared with previous cloud-based image super-resolution methods, the proposed method is end-to-end trainable. Thus, after training on a large dataset, the proposed method is capable of modeling internal inference and external compensation, and of making a good trade-off between these two terms to obtain the best reconstruction result. The experimental results on image super-resolution demonstrate the superiority of the proposed method not only to conventional data-driven image super-resolution methods but also to recently proposed deep learning approaches, in both subjective and objective evaluations.
HUYAN Kang , FAN Xin , YU Le-Tian , LUO Zhong-Xuan
2018, 29(4):914-925. DOI: 10.13328/j.cnki.jos.005405
Abstract:Facial image super-resolution (SR) generates a high-resolution (HR) facial image from a low-resolution (LR) one. Compared with natural images, facial images are so highly structured that local patches at similar locations across different faces share similar textures. In this paper, a novel graph-based neural network (GNN) regression is proposed to leverage this local structural information for facial image SR. Firstly, the grid representation of an input face image is converted into its corresponding graph representation, and then a shallow neural network is trained for each vertex in the graph in order to regress the SR image. Compared with its grid-based counterpart, the graph representation combines both coordinate affinity and textural similarity. Additionally, the NN weights of a vertex are initialized with the converged weights of its neighbors, resulting in fast convergence during training and accurate regression. Experimental comparison with state-of-the-art SR algorithms, including those based on deep convolutional neural networks (DCNN), on two benchmark face sets demonstrates the effectiveness of the proposed method in terms of both qualitative inspection and quantitative metrics. The proposed GNN is not only able to deal with facial SR, but also has the potential to be applied to data with any irregular topology.
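The grid-to-graph conversion described above can be illustrated with a toy sketch. This is not the paper's implementation: the function and parameter names are hypothetical, and a single scalar stands in for each patch's texture descriptor; the point is only that the joint affinity combines coordinate distance with texture difference when choosing a vertex's neighbors.

```python
import math

def build_graph(patches, coords, sigma_s=1.0, sigma_t=10.0, k=2):
    """Connect each patch to its k nearest neighbors under a joint
    affinity combining coordinate distance and texture difference
    (illustrative sketch; sigma_s/sigma_t are hypothetical parameters)."""
    def affinity(i, j):
        ds = math.dist(coords[i], coords[j])        # coordinate affinity
        dt = abs(patches[i] - patches[j])           # scalar "texture" stand-in
        return math.exp(-ds**2 / sigma_s**2) * math.exp(-dt**2 / sigma_t**2)
    graph = {}
    for i in range(len(patches)):
        others = sorted((j for j in range(len(patches)) if j != i),
                        key=lambda j: -affinity(i, j))
        graph[i] = others[:k]
    return graph

# Patches 0 and 1 are close and texturally similar; patch 3 is spatially
# close to them but texturally very different; patch 2 is far away.
patches = [10.0, 12.0, 50.0, 55.0]
coords = [(0, 0), (0, 1), (5, 5), (0, 2)]
graph = build_graph(patches, coords)
```

Under a pure coordinate grid, vertex 0's nearest neighbor could just as well be the texturally dissimilar vertex 3; the joint affinity ranks the similar vertex 1 first.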
PENG Ya-Li , ZHANG Lu , ZHANG Yu , LIU Shi-Gang , GUO Min
2018, 29(4):926-934. DOI: 10.13328/j.cnki.jos.005407
Abstract:Image super-resolution is a research hotspot in the field of low-level vision. Existing methods based on convolutional neural networks do not optimize image super-resolution as a regression problem: they are weak in learning ability, require too much training time, and leave room for improvement in reconstruction quality. To solve the above problems, this article proposes a method based on a deep deconvolution neural network, which first upsamples the low-resolution image with a deconvolution layer and then uses deep mapping to eliminate the noise and artifacts introduced by the deconvolution layer. Residual learning reduces the network complexity and avoids the degradation caused by network depth. On Set5, Set14 and other datasets, the presented method outperforms FSRCNN in terms of PSNR, SSIM, IFC and visual quality.
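As a reference for the evaluation protocol mentioned above, the PSNR between a reference image and a reconstruction can be computed as follows (a minimal sketch over flat lists of pixel intensities; the toy values are illustrative only):

```python
import math

def psnr(reference, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio between two equally sized images,
    given as flat lists of pixel intensities."""
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Toy example: a 2x2 "image" and a slightly perturbed reconstruction
# (mean squared error of 1 per pixel).
ref = [100, 120, 140, 160]
rec = [101, 119, 141, 159]
value = psnr(ref, rec)   # ≈ 48.13 dB
```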
CHEN Zhao-Wei , CHANG Dong-Xia
2018, 29(4):935-944. DOI: 10.13328/j.cnki.jos.005415
Abstract:As an unsupervised learning technique, clustering has been widely used in practice. However, some mainstream algorithms still suffer from problems such as incomplete noise removal and inaccurate clustering results on datasets with noise. In this paper, an automatic clustering algorithm based on density difference (CDD) is proposed to realize automatic classification of datasets containing noise. The algorithm exploits the density difference between noise data and useful data to remove the noise and classify the data. The useful data are then grouped into different classes through a neighborhood construction procedure. Experimental results demonstrate that the CDD algorithm achieves high performance.
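The idea of separating noise from useful data by local density and then growing classes through neighborhood construction can be sketched as follows. This is an illustrative DBSCAN-like toy, not the authors' exact CDD procedure; `radius` and `min_density` are hypothetical parameters:

```python
import math

def local_density(points, i, radius):
    """Number of other points within `radius` of point i."""
    xi, yi = points[i]
    return sum(1 for j, (x, y) in enumerate(points)
               if j != i and math.hypot(x - xi, y - yi) <= radius)

def density_cluster(points, radius=1.5, min_density=2):
    """Drop low-density points as noise, then grow clusters by linking
    surviving points whose pairwise distance is within `radius`."""
    keep = [i for i in range(len(points))
            if local_density(points, i, radius) >= min_density]
    labels, current = {}, 0
    for seed in keep:
        if seed in labels:
            continue
        current += 1
        stack = [seed]
        while stack:            # neighborhood expansion from the seed
            i = stack.pop()
            if i in labels:
                continue
            labels[i] = current
            xi, yi = points[i]
            stack.extend(j for j in keep if j not in labels and
                         math.hypot(points[j][0] - xi, points[j][1] - yi) <= radius)
    return labels  # noise points simply receive no label

pts = [(0, 0), (0, 1), (1, 0), (1, 1),      # dense cluster A
       (10, 10), (10, 11), (11, 10),        # dense cluster B
       (5, 5)]                              # isolated noise point
labels = density_cluster(pts)
```

The isolated point at (5, 5) has zero neighbors, falls below the density threshold, and is excluded before the neighborhood construction step assigns the remaining points to two clusters.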
YANG Xu , ZHU Zhen-Feng , XU Mei-Xiang , ZHANG Xing-Xing
2018, 29(4):945-956. DOI: 10.13328/j.cnki.jos.005416
Abstract:With the rapid development of information technology, massive amounts of multi-view data are constantly emerging in people's daily life. To cope with this situation, multi-view learning has received much attention in the field of machine learning as a way to promote the ability of data understanding. However, due to difficulties such as high cost and equipment failure in multi-view data collection, part or all of the observed values from one view may be unavailable, which prevents some traditional multi-view learning algorithms from working effectively as expected. This paper focuses on missing view completion for multi-view data and proposes a view-compatibility-based completion method. For each class of multi-view data, a corresponding shared subspace is built by means of supervised learning. With the multiple shared subspaces, a view compatibility discrimination model is developed. Meanwhile, assuming that the reconstruction error of each view of multi-view data in the shared subspace follows an independent identical distribution, an approach is put forward to seek the shared representation of multi-view data with a missing view, so that a preliminary completion of the missing view can be performed. In addition, the multiple linear regression technique is applied to obtain a more accurate completion. Furthermore, the proposed missing view completion method is extended to handle the denoising of noise-polluted multi-view data. The experimental results on datasets including UCI and Coil-20 demonstrate the effectiveness of the proposed missing view completion method for multi-view data.
2018, 29(4):957-972. DOI: 10.13328/j.cnki.jos.005406
Abstract:Homomorphic encryption techniques can be used for the protection of data privacy, since some algebraic operations can be carried out directly on ciphertext data. This is very useful in the field of cloud computing security, for example analyzing and processing encrypted data in the cloud without exposing its content. Addressing privacy protection and data security problems in cloud computing, this paper proposes a robust and reversible image watermarking algorithm in the homomorphic encrypted domain. The algorithm includes five aspects: (1) The original image is divided into a number of non-overlapping blocks and each pixel in a block is encrypted with the Paillier cryptosystem to obtain the encrypted image; (2) The statistical values of the encrypted blocks can be retrieved in the encrypted domain by employing the modular multiplicative inverse (MMI) method and looking up a mapping table. After that, watermark information can be reversibly embedded into the encrypted image by shifting the histogram of the statistical values with the homomorphic property of the Paillier cryptosystem; (3) On the receiver side, the marked histogram of the watermarked and encrypted image can be obtained for extraction of the watermark, and the encrypted image can be restored by inverting the histogram-shifting operations of the embedding phase; (4) The marked histogram can also be obtained from the directly decrypted image, followed by watermark extraction and restoration of the original image; (5) The watermark can still be extracted correctly, to some extent, under attacks on the watermarked and decrypted image such as JPEG/JPEG2000 compression and additive Gaussian noise. The proposed method embeds information bits directly into the encrypted image without preprocessing operations on the original image, and can extract the watermark and restore the encrypted image in the encrypted domain, or the original image in the plaintext domain after decryption. Besides, the watermark is robust to common image processing operations. The experimental results have shown the validity of the proposed scheme.
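The additive homomorphism of the Paillier cryptosystem that the embedding step relies on, namely that multiplying two ciphertexts yields a ciphertext of the plaintext sum, can be demonstrated with a toy instance. The tiny primes below are chosen purely for illustration and are cryptographically insecure; this is not the paper's implementation:

```python
import math, random

# Toy Paillier instance (insecure parameters; real systems use large primes).
p, q = 11, 13
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)   # modular inverse (Python 3.8+)

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

# Homomorphic property exploited by histogram shifting in the encrypted
# domain: the product of ciphertexts decrypts to the sum of plaintexts.
c1, c2 = encrypt(25), encrypt(17)
total = decrypt((c1 * c2) % n2)   # total == 42
```

Because the ciphertext product corresponds to plaintext addition, a histogram bin can be shifted by multiplying each affected ciphertext by an encryption of a constant, without ever decrypting the image.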
ZHU Ying-Ying , CAO Lei , WANG Xu
2018, 29(4):973-986. DOI: 10.13328/j.cnki.jos.005410
Abstract:With the rapid development of multi-device interactive applications, the transmission and processing of screen content images (SCI) is growing every day. Image quality assessment, which underpins many other research topics, has so far mainly focused on traditional natural images; quality assessment specifically for screen content images is therefore becoming very important and urgent. Considering that an image quality assessment database is the basis of objective quality metrics, this paper first constructs a large-scale Immersive Media Laboratory screen content image quality database (IML-SCIQD). The IML-SCIQD database contains 25 reference images and 1250 distorted images covering 10 distortion types. Based on the IML-SCIQD database, the difference in visual perception between pictorial and textual regions is studied. At the same time, inspired by natural scene statistics (NSS) based no-reference (NR) image quality assessment metrics, an NSS-based NR metric for screen content images (NSNRS) is proposed. The NSNRS metric first computes the quality scores of the textual region and the pictorial region, and then combines them into a quality score for the whole screen content image. For performance comparison, the proposed metric is evaluated against 12 state-of-the-art objective image quality assessment metrics, including full-reference, reduced-reference and no-reference algorithms, on the IML-SCIQD database and the SIQAD database. Extensive experiments show that the proposed algorithm outperforms existing representative no-reference techniques, and that the new metric is comparable with full-reference metrics over the whole database.
ZHANG Yi-Wei , ZHANG Wei-Ming , YU Neng-Hai
2018, 29(4):987-1001. DOI: 10.13328/j.cnki.jos.005411
Abstract:Nowadays, the steganalysis of digital images mainly focuses on designing steganalysis features to improve the detection accuracy of the universal blind detection (UBD) model. However, such a model is built independently of the images under test, which makes high-precision detection difficult. Based on large-scale training resources, this article studies the influence of steganography on image features in order to uncover the important relationship between steganalysis and image features, and further proposes a steganalysis method that selects a specialized training set for each test sample. Experiments are organized around the classical JPEG steganography algorithm nsF5 and mainstream JPEG steganalysis features such as CC-PEV, CC-Chen, CF*, DCTR and GFR. The results show that the accuracy of this method is higher than that of other similar methods.
HU Hao-Hui , NI Rong-Rong , ZHAO Yao
2018, 29(4):1002-1016. DOI: 10.13328/j.cnki.jos.005413
Abstract:In order to deal with images forged by content-aware resizing, a content-aware resizing detection algorithm based on statistical characteristics of the probability map is proposed in this paper. The method uses the probability map to reflect whether an image has been processed by a content-aware resizing operation. In addition, the proposed integral projection and local statistical characteristics are used to detect the tampered image. Trained with a classifier, the method can then identify content-aware resizing forgery more effectively. The experimental results reveal that the proposed detection algorithm can distinguish between original and manipulated images with high detection accuracy.
GUO Wen , YOU Si-Si , ZHANG Tian-Zhu , XU Chang-Sheng
2018, 29(4):1017-1028. DOI: 10.13328/j.cnki.jos.005402
Abstract:The spatio-temporal context (STC) tracking algorithm can effectively track an object in real time using the structural information contained in the context around the object. However, the algorithm exploits only a single grayscale feature, which limits the discriminative power of the object representation. Moreover, it cannot re-initialize after tracking drift caused by occlusion. Aiming at these weaknesses of the spatio-temporal context algorithm, a novel low-rank redetection based multiple feature fusion STC tracking algorithm is proposed in this paper. Firstly, multiple fused features are extracted to construct richer spatio-temporal context information, which improves the effectiveness of the object representation by taking full advantage of the feature information around the object. Then, a simple and effective matrix decomposition method is used to give a low-rank expression of the tracking history, which is embedded into an online detector; this maintains the structural stability of the tracking algorithm and solves the relocation problem after tracking failure. Experimental results on a series of tracking benchmarks show that the proposed algorithm has better tracking precision and robustness than several state-of-the-art methods, and it also has good real-time performance.
BAI Cong , HUANG Ling , CHEN Jia-Nan , PAN Xiang , CHEN Sheng-Yong
2018, 29(4):1029-1038. DOI: 10.13328/j.cnki.jos.005404
Abstract:For more accurate image classification, features from different levels should be extracted from images. Deep learning is used more and more in large-scale image classification. This paper proposes a deep learning framework based on a deep convolutional neural network that can be applied to large-scale image classification. The proposed framework modifies the architecture and internal structure of the classical deep convolutional neural network AlexNet to improve the feature representation ability of the network. Furthermore, the framework learns image features and binary hash codes simultaneously by introducing a hidden layer into the fully connected layers. The proposal is validated through a series of experiments on three commonly used databases, showing significant improvement. Lastly, the effects of different optimization methods are analyzed.
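The retrieval use of such a binary hash layer can be sketched as follows. The activation values below are hypothetical stand-ins for a trained network's hidden-layer outputs; the sketch only shows how thresholded activations become hash codes compared by Hamming distance:

```python
def to_hash(activations, threshold=0.5):
    """Binarize hidden-layer activations (e.g. sigmoid outputs) into a hash code."""
    return tuple(1 if a > threshold else 0 for a in activations)

def hamming(a, b):
    """Number of differing bits between two hash codes."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical hash-layer activations for three database images and a query.
database = {
    "img_a": to_hash([0.9, 0.8, 0.1, 0.2]),
    "img_b": to_hash([0.9, 0.7, 0.2, 0.9]),
    "img_c": to_hash([0.1, 0.2, 0.8, 0.9]),
}
query = to_hash([0.8, 0.9, 0.3, 0.1])
# Rank database images by Hamming distance to the query code.
ranked = sorted(database, key=lambda k: hamming(database[k], query))
```

Because the codes are short bit strings, ranking by Hamming distance is far cheaper than comparing real-valued feature vectors, which is the usual motivation for learning hashes jointly with features.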
DING Ming-Yu , NIU Yu-Lei , LU Zhi-Wu , WEN Ji-Rong
2018, 29(4):1039-1048. DOI: 10.13328/j.cnki.jos.005408
Abstract:Improvements in computing performance have made deep learning possible. As one of the important research directions in the field of computer vision, object detection has been combined with deep learning methods and is widely used in all walks of life. Limited by the complexity of the network and the design of the detection algorithm, the speed and precision of object detection form a trade-off. At present, the rapid development of electronic commerce has produced a large number of pictures containing product parameters, and traditional methods struggle to extract the parameter information from these pictures. This paper presents a method that combines a deep learning detection algorithm with traditional OCR technology to maintain detection speed while greatly improving recognition accuracy. The paper focuses on the following problems: the detection model, training on specific data, image preprocessing and character recognition. First, existing object detection algorithms are compared and their advantages and disadvantages are assessed. While the YOLO model is used for detection, some improvements are proposed to overcome its shortcomings, and an object detection model is designed to detect product parameters in images. Finally, Tesseract is used for character recognition. The experimental results show that the new system is efficient and effective in parameter recognition. At the end of the paper, the innovations and limitations of the presented method are discussed.
WU Li-Fang , HE Jiao-Yu , JIAN Meng , ZOU Yun-Zhen , ZHAO Tie-Song
2018, 29(4):1049-1059. DOI: 10.13328/j.cnki.jos.005409
Abstract:Dust, pollutants and aerosol particles in the air pose a significant challenge to atmospheric prediction, and the segmentation of millimeter-wave radar cloud images has become key to dealing with the problem. This paper presents a superpixel-analysis-based cloud image segmentation method using fully convolutional networks (FCN) and convolutional neural networks (CNN), named FCN-CNN. Firstly, superpixel analysis is performed to cluster the neighborhood of each pixel in the cloud image. Then the cloud image is fed to FCNs with different strides, namely FCN-32s and FCN-8s. The "non-cloud" area in the FCN-32s result must be part of the "non-cloud" area of the cloud image, while the "cloud" area in the FCN-8s result must be part of the "cloud" area of the cloud image. The remaining uncertain region of the cloud image is further estimated by a CNN. For efficiency, several key pixels are selected in each superpixel to represent the characteristics of the superpixel region, and these key pixels are classified by the CNN as "cloud" or "non-cloud". The experimental results illustrate that while the accuracy of FCN-CNN is almost equivalent to MR-CNN and SP-CNN, it is 880 times faster than MR-CNN and 1.657 times faster than SP-CNN.
CHEN Shi-Zhe , WANG Shuai , JIN Qin
2018, 29(4):1060-1070. DOI: 10.13328/j.cnki.jos.005412
Abstract:Automatic emotion recognition is a challenging task with a wide range of applications. This paper addresses the problem of emotion recognition in multi-cultural conditions. Different multi-modal features are extracted from the audio and visual modalities, and recognition performance is compared between hand-crafted features and features automatically learned by deep neural networks. Multimodal feature fusion is also explored to combine the different modalities. The CHEAVD Chinese and AFEW English multimodal emotion datasets are utilized to evaluate the proposed methods. The importance of the culture factor for emotion recognition is demonstrated through cross-culture experiments, and three different strategies are then developed to improve recognition performance in the multi-cultural environment: selecting the corresponding emotion model for each culture, jointly training with multi-cultural datasets, and embedding features from multi-cultural datasets into the same emotion space. The embedding strategy separates the cultural influence from the original features and generates more discriminative emotion features, resulting in the best performance for both acoustic and multimodal emotion recognition.
XIE Ning , ZHAO Ting-Ting , YANG Yang , WEI Qin , Heng Tao SHEN
2018, 29(4):1071-1084. DOI: 10.13328/j.cnki.jos.005414
Abstract:Among various traditional art forms, brush stroke drawing is one of the most widely used styles in modern computer graphics tools such as GIMP, Photoshop and Painter. In this paper, an AI-aided art authoring (A4) system for non-photorealistic rendering is developed that allows users to automatically generate brush stroke paintings in a specific artist's style. Within a reinforcement learning framework for brush stroke generation, the first contribution of this paper is the application of a regularized policy gradient method, which is better suited to the stroke generation task. The other contribution is learning artists' drawing styles from video-captured stroke data by inverse reinforcement learning. Experiments demonstrate that the presented system can successfully learn artists' styles and render pictures with consistent and smooth brush strokes.
2018, 29(4):1085-1093. DOI: 10.13328/j.cnki.jos.005536
Abstract:With the rapid development of quantum hardware, people tend to believe that special-purpose quantum computers with more than 100 qubits will be available in 5 to 10 years. It is conceivable that, once this becomes a reality, the development of quantum software will be crucial in harnessing the power of quantum computers. However, due to the distinctive features of quantum mechanics, such as the no-cloning property of quantum information and the nonlocal effect of entanglement, developing correct and efficient quantum programs and communication protocols is a challenging issue. Formal verification methods, particularly model checking techniques, have proven effective in classical software design and system modelling. Therefore, formal verification of quantum software has received more and more attention recently. This article reviews recent research findings on the verification of both sequential quantum programs and quantum communication protocols, with the focus placed on the work of the two authors' research groups. Future directions and challenges in this area are also discussed.
DAI Fei , ZHAO Wen-Zhuo , YANG Yun , MO Qi , LI Tong , ZHOU Hua
2018, 29(4):1094-1114. DOI: 10.13328/j.cnki.jos.005280
Abstract:The Business Process Modelling Notation 2.0 (BPMN 2.0) choreography is a de facto standard for capturing the interactions between business processes. Flow-oriented BPMN 2.0 choreographies can exhibit a range of semantic errors in the control flow. The ability to check the semantic correctness of choreographies is thus a desirable feature for modelling tools based on BPMN 2.0 choreographies. However, the semantic analysis of BPMN 2.0 choreographies is hindered by the lack of a formal semantic definition of its constructs, and of corresponding analysis techniques, in the BPMN 2.0 standard specification. This paper defines a formal semantics of BPMN 2.0 choreographies in terms of a mapping to WF-nets. The defined semantics can be used to analyze the structural and control-flow errors of BPMN 2.0 choreographies with Petri net analysis techniques. The proposed mapping and semantic analysis have been implemented as a tool. The experimental results show that this formalization can identify semantic errors in choreographies from the BPM AI process model library.
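The token-game semantics of the target WF-nets can be sketched as follows: a minimal Petri-net firing rule applied to a two-transition net with one source place and one sink place. This illustrates the kind of reachability reasoning used for control-flow analysis, not the paper's actual BPMN 2.0 mapping:

```python
# Minimal Petri-net firing semantics over markings (place -> token count).

def enabled(marking, transition):
    """A transition is enabled when every input place holds a token."""
    return all(marking.get(p, 0) >= 1 for p in transition["in"])

def fire(marking, transition):
    """Fire an enabled transition: consume input tokens, produce output tokens."""
    m = dict(marking)
    for p in transition["in"]:
        m[p] -= 1
    for p in transition["out"]:
        m[p] = m.get(p, 0) + 1
    return m

# Toy WF-net for a two-message exchange: source place i, sink place o.
net = {
    "send":    {"in": ["i"],  "out": ["p1"]},
    "receive": {"in": ["p1"], "out": ["o"]},
}
marking = {"i": 1}
for name in ["send", "receive"]:
    assert enabled(marking, net[name])   # this toy net never deadlocks
    marking = fire(marking, net[name])
```

In WF-net terms, reaching the marking with exactly one token on the sink place `o` and none elsewhere is the proper-completion condition that soundness checking verifies.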
WANG Zhen-Huang , CHEN Si-Ming , YUAN Xiao-Ru
2018, 29(4):1115-1130. DOI: 10.13328/j.cnki.jos.005261
Abstract:With the development and increasing impact of social media (e.g. microblogs), it is critical to analyze the topics of microblogs. Topic modeling can extract topics from text data. However, it is a challenging task on microblog data, due to the short content, heavy noise and limited amount of information in each microblog message. This article proposes a visual analytics system for microblog topic modeling. The system enables visual exploration and analysis of microblog topic modeling results through multiple linked views and interactions, and considers user behaviors and time effects in the topic modeling process, so that users can analyze microblog topics from multiple perspectives. The system also supports interactive topic editing to improve the accuracy and reliability of the topic modeling results. A case study confirms that the described system can effectively help users analyze Sina Weibo content interactively.
DING Shi-Fei , ZHANG Jian , SHI Zhong-Zhi
2018, 29(4):1131-1142. DOI: 10.13328/j.cnki.jos.005263
Abstract:Deep learning models built on the restricted Boltzmann machine (RBM), a probabilistic graphical model, include the deep belief net (DBN) and the deep Boltzmann machine (DBM). Overfitting problems commonly exist in neural networks and RBM models. In order to alleviate the overfitting problem, this paper introduces weight random variables into the conventional RBM model and then builds weight-uncertainty deep models based on maximum likelihood estimation. The experimental section verifies the effectiveness of the weight-uncertainty RBM. In order to improve image recognition ability, the paper further introduces the spike-and-slab RBM (ssRBM) into the weight-uncertainty RBM and builds the corresponding deep models. The experiments show that the deep models based on weight random variables are effective.
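The conditional sampling at the core of a binary RBM can be sketched as follows. The weights below are fixed toy values; in the weight-uncertainty variant described above, each weight would itself be a random variable rather than a point estimate:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_hidden(v, W, b_h):
    """One half of a Gibbs step in a binary RBM:
    p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j])."""
    probs = [sigmoid(b + sum(vi * W[i][j] for i, vi in enumerate(v)))
             for j, b in enumerate(b_h)]
    return [1 if random.random() < p else 0 for p in probs], probs

# Tiny RBM: 3 visible units, 2 hidden units, fixed point-estimate weights.
W = [[2.0, -2.0],
     [2.0, -2.0],
     [2.0, -2.0]]
b_h = [-1.0, 1.0]
h, probs = sample_hidden([1, 1, 0], W, b_h)
```

The symmetric step, sampling visible units given hidden ones, uses the same formula with `W` transposed; alternating the two steps yields the Gibbs chain used in training.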
XIE Cheng-Wang , XIAO Chi , DING Li-Xin , XIA Xue-Wen , ZHU Jian-Yong , ZHANG Fei-Long
2018, 29(4):1143-1162. DOI: 10.13328/j.cnki.jos.005275
Abstract:It is necessary to develop novel multi-objective optimization algorithms to cope with the complicated multi-objective optimization problems that are emerging in reality and becoming increasingly hard. In this paper, the basic firefly algorithm is extended to the realm of multi-objective optimization, and a hybrid multi-objective firefly algorithm (HMOFA) is proposed. Firstly, an initialization approach based on mixed-level orthogonal experimental design with quantification of the continuous search space is used to generate an evenly distributed initial population in the decision space. Secondly, elites in the external archive are randomly selected to guide the movement of the fireflies during evolution. Finally, an archive pruning strategy based on the three-point shortest path is used to maintain the diversity of the external archive. HMOFA is compared with five other peer algorithms in terms of hypervolume on seventeen benchmark multi-objective test instances, and the experimental results show that HMOFA has overall advantages in convergence, diversity and robustness.
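Hypervolume, the comparison indicator used above, measures the objective-space region dominated by a solution set relative to a reference point. A minimal two-objective (minimization) sketch, assuming the input front is already non-dominated:

```python
def hypervolume_2d(front, ref):
    """Hypervolume (area) dominated by a 2-D Pareto front of a
    minimization problem, relative to reference point `ref`.
    Assumes `front` contains only mutually non-dominated points."""
    # Sort by the first objective; for a non-dominated front the second
    # objective then decreases monotonically, so the dominated region
    # decomposes into disjoint horizontal slabs.
    pts = sorted(front)
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        area += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return area

# Three non-dominated points against reference point (5, 5):
front = [(1.0, 4.0), (2.0, 2.0), (3.0, 1.0)]
hv = hypervolume_2d(front, ref=(5.0, 5.0))   # 4 + 6 + 2 = 12
```

A larger hypervolume indicates a front that is both closer to the true Pareto front and better spread, which is why a single scalar suffices to compare algorithms on both convergence and diversity.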
GONG Wei-Hua , CHEN Yan-Qiang , PEI Xiao-Bing , YANG Liang-Huai
2018, 29(4):1163-1176. DOI: 10.13328/j.cnki.jos.005269
Abstract:Detecting high-quality community structures in location-based social networks (LBSN) plays a significant role in studying and analyzing this novel type of composite network comprehensively. However, most existing community detection methods for social networks cannot combine the correlations of the multi-typed heterogeneous relations in an LBSN. To address the issue, this paper proposes a co-clustering method for mining users' communities with multi-dimensional relationships, called Multi-BVD. Firstly, an objective function for community clustering is given that fuses the multi-modal entities and their multi-dimensional relationships embedded in the users' social network and the geo-tagged location network. Then, to minimize this function, the Lagrange multiplier method is applied to derive the iterative updating rules of the matrix variables, so that the optimal user communities can be determined by decomposing block matrices. Simulation results show that Multi-BVD can find community structures with geographical characteristics more effectively and accurately in location-based social networks. Moreover, the mined non-overlapping communities are more cohesive in both social relationships and geo-tagged interests, and better embody the correlations between users' communities and semantic geo-tagged location clusters.
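Multiplicative updating rules of the kind obtained from Lagrange multiplier conditions in matrix-factorization clustering can be sketched with the classical nonnegative matrix factorization (NMF) updates. This is an illustrative analogue, not the Multi-BVD derivation itself, and the block-structured matrix `X` below is a toy stand-in for a relation matrix with two "communities":

```python
# Classical NMF multiplicative updates (Frobenius objective):
#   H <- H * (W^T X) / (W^T W H),   W <- W * (X H^T) / (W H H^T)
# These keep the factors nonnegative and never increase the error.

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frob_err(X, W, H):
    R = matmul(W, H)
    return sum((X[i][j] - R[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X[0])))

def nmf_step(X, W, H, eps=1e-9):
    WtX = matmul(transpose(W), X)
    WtWH = matmul(transpose(W), matmul(W, H))
    H = [[H[i][j] * WtX[i][j] / (WtWH[i][j] + eps) for j in range(len(H[0]))]
         for i in range(len(H))]
    XHt = matmul(X, transpose(H))
    WHHt = matmul(W, matmul(H, transpose(H)))
    W = [[W[i][j] * XHt[i][j] / (WHHt[i][j] + eps) for j in range(len(W[0]))]
         for i in range(len(W))]
    return W, H

X = [[5.0, 4.0, 0.0], [4.0, 5.0, 0.0], [0.0, 0.0, 5.0]]  # two block "communities"
W = [[0.5, 0.5], [0.6, 0.4], [0.4, 0.6]]
H = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
e0 = frob_err(X, W, H)
for _ in range(50):
    W, H = nmf_step(X, W, H)
e1 = frob_err(X, W, H)   # monotonically non-increasing under these updates
```

After convergence, the rows of `W` serve as soft cluster memberships, which is the role the user-community indicator matrices play in the co-clustering formulation above.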