Chinese Sentence-Level Lip Reading Based on End-to-End Model

doi:10.13328/j.cnki.jos.005709

2020, Vol. 31, No. 6, pp. 1747-1760

Authors: Zhang Xiaobing, Gong Haigang, Yang Fan, Dai Xili
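About the authors: Zhang Xiaobing (born 1992), female, from Luoyang, Henan, PhD candidate; research interests: deep learning and visual processing. Gong Haigang (born 1975), male, PhD, associate professor, doctoral supervisor, CCF professional member; research interests: computer networks and system security, cloud computing and big data processing, and deep learning. Yang Fan (born 1993), female, master's degree, CCF student member; research interest: deep learning. Dai Xili (born 1990), male, PhD; research interests: machine vision, machine learning, and deep learning.
Corresponding author: Dai Xili, E-mail: daixili_cs@163.com
CLC number: TP18
Funding: National Natural Science Foundation of China (61572113)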


Abstract:

In recent years, with the wide application of deep learning, lip reading technology has developed rapidly. Unlike traditional methods, deep-learning-based lip reading models typically use neural networks for both feature extraction and feature comprehension. Based on the characteristics of Chinese lip reading, the recognition process is divided into two stages: picture-to-pinyin (P2P) recognition and pinyin-to-hanzi (P2CC) recognition. A separate sub-network is designed for each stage; once both sub-networks have been trained to convergence, they are combined and jointly optimized end to end. Because no Chinese lip reading dataset was available, six months of daily news broadcasts were collected from the official China Central Television (CCTV) website and semi-automatically labelled to form CCTVDS, a 20.95 GB dataset with 14 975 samples. In addition, 269 558 pinyin-to-hanzi samples were collected to pre-train the P2CC module. Experimental results on CCTVDS show that the proposed ChLipNet achieves 45.7% sentence-level accuracy and 58.5% pinyin-sequence accuracy. Moreover, ChLipNet accelerates training, reduces overfitting, and overcomes the ambiguity inherent in recognizing Chinese.
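
The two-stage decomposition described in the abstract can be illustrated with a minimal sketch. This is not ChLipNet: the real P2P and P2CC modules are neural networks whose details are not given on this page. Every name below (`TOY_VISEME_TO_PINYIN`, `TOY_HOMOPHONES`, `TOY_BIGRAM`, `picture_to_pinyin`, `pinyin_to_hanzi`, `lip_read`) is hypothetical, the lookup tables are made-up toy data, and a simple bigram search stands in for the learned P2CC model. The sketch only shows how chaining a picture-to-pinyin stage with a context-aware pinyin-to-hanzi stage can resolve homophone ambiguity.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy stand-in for the P2P stage: each "frame feature" label
# maps to one pinyin syllable. A real P2P module would be a neural network.
TOY_VISEME_TO_PINYIN: Dict[str, str] = {"v1": "zhong", "v2": "guo", "v3": "ren"}

# Hypothetical homophone lexicon: one pinyin syllable, several candidate hanzi.
TOY_HOMOPHONES: Dict[str, List[str]] = {
    "zhong": ["中", "种", "重"],
    "guo": ["国", "过", "果"],
    "ren": ["人", "任"],
}

# Hypothetical bigram scores used as context; unseen pairs score 0.0.
TOY_BIGRAM: Dict[Tuple[str, str], float] = {
    ("中", "国"): 5.0,
    ("国", "人"): 3.0,
    ("过", "人"): 1.0,
}


def picture_to_pinyin(frames: Sequence[str]) -> List[str]:
    """Stage 1 (P2P stand-in): map each frame label to a pinyin syllable."""
    return [TOY_VISEME_TO_PINYIN[f] for f in frames]


def pinyin_to_hanzi(pinyin: Sequence[str]) -> str:
    """Stage 2 (P2CC stand-in): pick the best-scoring hanzi sequence.

    Uses a Viterbi-style search: for every candidate character, keep only
    the best-scoring path that ends in that character.
    """
    if not pinyin:
        return ""
    best = {c: (0.0, c) for c in TOY_HOMOPHONES[pinyin[0]]}
    for syllable in pinyin[1:]:
        nxt = {}
        for cand in TOY_HOMOPHONES[syllable]:
            score, text = max(
                ((s + TOY_BIGRAM.get((prev, cand), 0.0), t)
                 for prev, (s, t) in best.items()),
                key=lambda item: item[0],
            )
            nxt[cand] = (score, text + cand)
        best = nxt
    return max(best.values(), key=lambda item: item[0])[1]


def lip_read(frames: Sequence[str]) -> str:
    """Chain the two stages: frames -> pinyin -> hanzi."""
    return pinyin_to_hanzi(picture_to_pinyin(frames))


if __name__ == "__main__":
    print(lip_read(["v1", "v2", "v3"]))
```

In this toy setup each syllable has two or three candidate characters, and only the context scores across neighbouring characters select one sequence. In the approach the abstract describes, both stages are trained sub-networks that are afterwards optimized jointly rather than fixed lookup tables.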

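Cite this article

Zhang Xiaobing, Gong Haigang, Yang Fan, Dai Xili. Chinese Sentence-Level Lip Reading Based on End-to-End Model. Journal of Software (软件学报), 2020, 31(6): 1747-1760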

History

Received: 2018-05-10
Last revised: 2018-09-04
Published online: 2020-06-04
Publication date: 2020-06-06
