Chinese Sentence-Level Lip Reading Based on End-to-End Model

doi:10.13328/j.cnki.jos.005709


Volume 31, Issue 6, 2020, pp. 1747-1760

Author:
ZHANG Xiao-Bing, GONG Hai-Gang, YANG Fan, DAI Xi-Li
Affiliation:
School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
CLC Number: TP18
Fund Project: National Natural Science Foundation of China (61572113)


Abstract:

In recent years, with the wide application of deep learning, lip reading recognition technology has developed rapidly. Unlike traditional methods, deep-learning-based lip reading methods usually use a neural network model for both feature extraction and comprehension. Based on the characteristics of the Chinese language, a two-step end-to-end architecture is implemented in which two deep neural network modules perform picture-to-pinyin (P2P) and pinyin-to-hanzi (P2CC) recognition, respectively. After the two modules are trained to convergence, they are jointly optimized to improve overall performance. Because Chinese lip reading datasets are lacking, six months of daily news broadcasts are collected from China Central Television (CCTV) and semi-automatically labelled to form CCTVDS, a 20.95 GB dataset with 14 975 samples. In addition, a supplementary dataset with 269 558 samples is collected for the pre-training of P2CC. In experiments trained on CCTVDS, the proposed model, ChLipNet, achieves 45.7% sentence-level accuracy and 58.5% pinyin-level accuracy. ChLipNet not only accelerates training and reduces overfitting, but also overcomes syntactic ambiguity in the recognition of the Chinese language.
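The two-step structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: in the paper both modules are deep neural networks, whereas here `toy_p2p` and `toy_p2cc` are hypothetical lookup tables, and the frame labels `f1` to `f4` are invented stand-ins for lip-region images. The sketch only shows how the output of the P2P stage feeds the P2CC stage, and how an exact-match sentence-level accuracy could be computed. The joint optimization step is not modelled.

```python
from typing import Callable, List, Sequence, Tuple

Pinyin = List[str]


def toy_p2p(frames: Sequence[str]) -> Pinyin:
    """Hypothetical stand-in for the picture-to-pinyin (P2P) module.

    Maps each invented frame label to one pinyin syllable.
    """
    frame_to_pinyin = {"f1": "ni", "f2": "hao", "f3": "zhong", "f4": "guo"}
    return [frame_to_pinyin[f] for f in frames]


def toy_p2cc(pinyin: Pinyin) -> str:
    """Hypothetical stand-in for the pinyin-to-hanzi (P2CC) module.

    Looks up the whole pinyin sequence, so the characters are chosen
    from sentence-level context rather than syllable by syllable.
    """
    lexicon = {("ni", "hao"): "你好", ("zhong", "guo"): "中国"}
    return lexicon.get(tuple(pinyin), "")


def recognize(
    frames: Sequence[str],
    p2p: Callable[[Sequence[str]], Pinyin] = toy_p2p,
    p2cc: Callable[[Pinyin], str] = toy_p2cc,
) -> Tuple[Pinyin, str]:
    """Run the two stages in sequence: frames -> pinyin -> hanzi."""
    pinyin = p2p(frames)
    return pinyin, p2cc(pinyin)


def sentence_accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Fraction of sentences predicted exactly right (an assumed metric definition)."""
    if not references:
        return 0.0
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)
```

Calling `recognize(["f1", "f2"])` runs both toy stages and returns the intermediate pinyin sequence together with the final character string, which makes it possible to score the two levels separately, as the abstract does.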

Key words: Chinese lip reading recognition; deep learning; characteristics of Chinese language; data collecting and preprocessing; end-to-end model

Citation:

ZHANG Xiao-Bing, GONG Hai-Gang, YANG Fan, DAI Xi-Li. Chinese Sentence-Level Lip Reading Based on End-to-End Model. Journal of Software (软件学报), 2020, 31(6): 1747-1760


History

Received: May 10, 2018
Revised: September 4, 2018
Online: June 4, 2020
Published: June 6, 2020

