JIANG Yue-Hui, ZHANG Qian, WANG Bin, SHEN Hui-Zhong, HUANG Ji-Feng, YAN Tao
Abstract: To address the low accuracy of large-pose face alignment algorithms, this paper designs and implements a new hierarchical, parallel, multi-scale Inception-ResNet network for large-pose face alignment. First, a four-class Hourglass network model is constructed. The model takes images directly as input and performs face alignment in an end-to-end manner. Second, the network internally uses preset parameters for sampling and feature extraction. Finally, the corresponding facial feature points are output directly. A two-dimensional coordinate plot of the image landmarks and the equivalent face size are extracted, and the proposed method is tested on the AFLW2000-3D dataset. Experimental results show that the normalized mean error (NME) of this method is 4.41% on arbitrary unconstrained two-dimensional face images. Compared with traditional methods, the frontal-pose face images output by this method have high visual quality and fidelity.
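As an illustration of the reported metric, the following is a minimal sketch of a normalized mean error computation. The abstract does not state the normalizer; the square root of the ground-truth bounding-box area used here is an assumption (a common convention for AFLW2000-3D), and the function names `normalized_mean_error` and `bbox_size` are hypothetical.

```python
import math


def bbox_size(points):
    """Face-size normalizer: sqrt(width * height) of the landmark bounding box.

    Assumption: the abstract does not specify the normalizer it uses.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return math.sqrt((max(xs) - min(xs)) * (max(ys) - min(ys)))


def normalized_mean_error(pred, truth, face_size):
    """Mean Euclidean distance between landmark pairs, divided by face_size."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    total = sum(math.dist(p, t) for p, t in zip(pred, truth))
    return total / (len(pred) * face_size)


# Toy data: four ground-truth landmarks, each prediction off by (3, 4) pixels.
truth = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
pred = [(x + 3.0, y + 4.0) for x, y in truth]
nme = normalized_mean_error(pred, truth, bbox_size(truth))
print(f"NME = {nme:.2%}")
```

In this toy case every landmark is 5 pixels off on a face of size 100, so the NME is 5%.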
ZHANG Shu, DU Cong-Yang, WU Jin-Jian, SHI Guang-Ming, XIE Xue-Mei
Abstract: Event cameras, inspired by biological vision, have attracted wide research attention. They break with the conventional data acquisition used in computer vision, address key weaknesses of RGB imaging, and offer advantages that conventional 2D image sensors cannot match: removal of redundant information, fast sensing, high dynamic range, and low power consumption. However, their asynchronous event data cannot be directly processed by existing computer vision pipelines. Therefore, this paper classifies the event stream using a key-event-based classification method. The method detects corner events, which carry important information, and extracts features only from those corner events. This retains the important features of the event stream and condenses feature extraction, while effectively reducing the computation spent on other events. The method is validated on the recognition of preset gestures, achieving an accuracy of 97.86%.
LIU Bo-Yu, WU Ling-Da, HAO Hong-Xing
Abstract: Sparse coding has been widely used in complex-valued image denoising. Block sparse coding, proposed in recent years, offers further advantages in noise filtering and noise reduction because it makes full use of the similarity of patches within the same block. This paper studies a sparse denoising algorithm for complex-valued images that groups patches by K-means clustering. By improving the clustering algorithm, the effectiveness of K-means grouping for block sparse coding is verified. An online complex dictionary training algorithm is used to obtain the coding dictionary quickly, and block sparse coding of the image is performed with the group orthogonal matching pursuit algorithm. By inducing similarity among the codes within each block, the coding of noise in the block is effectively suppressed and the denoising of the complex-valued image is improved. To verify the effectiveness of the proposed algorithm, the denoising of simulated and real interferometric synthetic aperture radar (InSAR) images is analyzed quantitatively, showing that the proposed algorithm improves the peak signal-to-noise ratio (PSNR) over the previous block sparse coding algorithm. Finally, a real InSAR image is denoised, which further verifies the denoising ability of the proposed algorithm on real noise.
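For readers unfamiliar with the evaluation metric, here is a minimal sketch of PSNR for complex-valued samples. The abstract does not define its peak value or error measure; taking the peak as the largest magnitude in the reference and the error as the mean squared magnitude of the difference are assumptions, and `complex_psnr` is a hypothetical name.

```python
import math


def complex_psnr(reference, estimate):
    """PSNR in dB between two equally long sequences of complex samples.

    Assumptions: peak = max |reference|, MSE = mean |reference - estimate|^2.
    """
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and equally long")
    mse = sum(abs(r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    peak = max(abs(r) for r in reference)
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy data: unit-magnitude samples, each perturbed by 0.1 along the real axis.
clean = [1 + 0j, 0 + 1j, -1 + 0j, 0 - 1j]
noisy = [c + 0.1 for c in clean]
print(f"PSNR = {complex_psnr(clean, noisy):.2f} dB")
```

With a peak of 1 and an MSE of 0.01, the toy example evaluates to about 20 dB.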
WU Long, LI Ta, WANG Li, YAN Yong-Hong
Abstract: To further exploit near-field speech data for improving far-field speech recognition, this paper proposes an approach that integrates knowledge distillation with a generative adversarial network. A multi-task learning structure is first proposed to jointly train the acoustic model with feature mapping. To enhance acoustic modeling, the acoustic model trained on far-field data (the student model) is guided by an acoustic model trained on near-field data (the teacher model). This training process makes the student model mimic the behavior of the teacher model by minimizing the Kullback-Leibler divergence. To improve speech enhancement, an additional discriminator network is introduced to distinguish the enhanced features from real clean ones. The distribution of the enhanced features is pushed further toward that of the clean features through this adversarial multi-task training. Evaluated on AMI single distant microphone data, the method achieves relative reductions of 5.6% in non-overlapped word error rate (WER) and 4.7% in overlapped WER over the baseline model. Evaluated on AMI multi-channel distant microphone data, the method achieves relative reductions of 6.2% in non-overlapped WER and 4.1% in overlapped WER over the baseline model. Evaluated on the TIMIT data, the method achieves a 7.2% WER reduction. To better demonstrate the effect of the generative adversarial network on speech enhancement, the enhanced features are visualized, which verifies the effectiveness of the method.
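The teacher-student objective described above can be sketched for a single frame as follows. This is only an illustration under stated assumptions: the direction KL(teacher || student) is the usual choice in knowledge distillation but is not specified in the abstract, and the names `softmax`, `kl_divergence`, and the example logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


# Hypothetical per-frame logits over three acoustic states.
teacher_logits = [2.0, 1.0, 0.1]   # model trained on near-field data
student_logits = [1.5, 1.2, 0.3]   # model trained on far-field data

# Distillation loss for this frame (assumed direction: teacher || student).
loss = kl_divergence(softmax(teacher_logits), softmax(student_logits))
print(f"distillation loss = {loss:.4f}")
```

The loss is zero when the student reproduces the teacher's posterior exactly and grows as the two distributions diverge, which is what drives the student to mimic the teacher.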
CHENG Shi-Wei, HU Yi-Lin, SUN Yu-Jie
Abstract: This paper studies the characteristics of visual attention behavior during reading. Several visualizations, including an eye movement heatmap, doughnut chart, node-link graph, and word cloud, were designed to present eye movement data and text themes for analyzing reading behavior characteristics and document structure. A prototype visual reading-assistance system was developed to record the eye movement data of expert users (such as teachers) and to share the resulting visualizations with novice users (such as students). User study results showed that the average scores of the experimental group on objective and subjective questions increased by 31.8% and 55.0%, respectively, and that the total reading and answering time was reduced by 9.7%. These results indicate that the system can help readers improve reading efficiency, quickly grasp the focus of an article, and better understand its content, which supports the effectiveness and feasibility of the approach.
TONG Qing-Shan, ZHANG Zong-Qi, HUANG Jin, TIAN Feng, LIU Jie, DAI Guo-Zhong
Abstract: The pen-based user interface, which relies on touch technology, is one of the Post-WIMP interfaces. It discards the physical keyboard and mouse, changing the method of human-computer interaction to some extent. Although sketch drawing software and recognition software are constantly emerging, there is no mature development tool for pen-based interface design. Based on the PGIS interaction paradigm and the scenario design method, this paper develops a tool named SDT that allows hybrid input of graphics and sketches based on pen-based interaction primitives. First, following the software engineering principle of high cohesion and low coupling, the "Separation-Fusion" design method is proposed, and the overall architecture of the system is put forward accordingly. Second, the essential technologies are elaborated from three aspects: the user interface description language, pen-based interaction primitives and mono-case, and hybrid input. Third, a complete example application is built with SDT, which demonstrates the usability and feasibility of the system. Finally, the advantages and effectiveness of the tool are verified by two evaluation experiments.