A Speech Driven Face Animation System Based on Machine Learning

Journal of Software, 2003, Vol. 14, No. 2, pp. 215-221

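Authors:
Chen Yiqiang (陈益强), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Gao Wen (高文), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080; Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, Heilongjiang 150001
Wang Zhaoqi (王兆其), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080
Jiang Dalong (姜大龙), Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080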

                    
Funding: Supported by the National Natural Science Foundation of China under Grant No. 60103007 and the National High-Tech Research and Development Plan of China under Grant No. 2001AA114160.

Abstract (Chinese version):

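Synchronizing speech with lip movements and facial expressions is one of the difficult problems in face animation. This paper combines clustering and machine learning methods to learn the synchronization relationship between the speech signal and lip movements and facial expressions, and applies the result in a speech-driven face animation system based on the MPEG-4 standard. Using a large-scale synchronized audio-visual database, unsupervised clustering is used to discover basic patterns that effectively characterize facial motion, and a neural network is trained to map prosody-bearing speech features directly to these basic facial-motion patterns. This not only avoids the limited robustness of speech recognition, but also yields learning results that can drive the face mesh directly. Finally, two methods for evaluating the speech-driven face animation system are presented: a quantitative one and a qualitative one. Experimental results show that machine-learning-based speech-driven face animation effectively solves the problem of audio-video synchronization and enhances the realism of the animation. Moreover, because the MPEG-4-based learning results are independent of the face model, they can drive a variety of face models, including real video, 2D cartoon characters, and 3D virtual faces.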

Keywords: machine learning; face animation; speech driven
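
The pattern-discovery step can be sketched as clustering per-frame FAP vectors and keeping the cluster centres as a codebook of basic facial-motion patterns. This is a minimal sketch under stated assumptions, not the authors' method. The abstract says only that an unsupervised clustering algorithm was used, so plain k-means (Lloyd's algorithm) with farthest-point initialization is a swapped-in choice. The function names `init_farthest`, `kmeans` and `assign`, the number of patterns `k`, and the synthetic three-dimensional FAP data are all hypothetical.

```python
import numpy as np


def assign(faps, centroids):
    """Return the index of the nearest pattern (Euclidean) for every frame."""
    d = ((faps[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def init_farthest(faps, k, seed=0):
    """Hypothetical helper: pick a random first frame, then repeatedly pick the
    frame farthest from all centres chosen so far."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(faps)))]
    for _ in range(1, k):
        # Squared distance from every frame to its nearest chosen centre.
        d = ((faps[:, None, :] - faps[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d.argmax()))
    return faps[idx].copy()


def kmeans(faps, k, n_iter=100, seed=0):
    """Cluster FAP frames (frames x n_faps) into k basic motion patterns
    (assumed k-means; the paper's clustering algorithm is not specified here)."""
    centroids = init_farthest(faps, k, seed)
    for _ in range(n_iter):
        labels = assign(faps, centroids)
        # Move each centre to the mean of its frames; keep it if it has none.
        new = np.array([
            faps[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign(faps, centroids)


# Synthetic stand-in for recorded FAP data: three well-separated mouth shapes.
rng = np.random.default_rng(1)
shapes = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
faps = np.vstack([s + 0.1 * rng.standard_normal((20, 3)) for s in shapes])
patterns, labels = kmeans(faps, k=3)
```

Each row of `patterns` is one basic pattern. In a system of the kind the abstract describes, an index into such a codebook is what the speech-to-face mapping would need to predict.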

Abstract:

Lip synchronization is the key issue in speech driven face animation systems. In this paper, clustering and machine learning methods are combined to estimate face animation parameters from audio sequences, and the learning results are then applied to an MPEG-4 based speech driven face animation system. Based on a large recorded audio-visual database, an unsupervised clustering algorithm is proposed to obtain basic face animation parameter patterns that describe the characteristics of facial motion. An artificial neural network (ANN) is trained to map the cepstral coefficients of an individual's natural speech directly to these face animation parameter patterns. This avoids the potential limitations of speech recognition, and the network output can drive the articulation of the synthetic face directly. Two approaches to evaluation are also proposed: quantitative and qualitative. The performance of the system shows that the proposed learning algorithm is suitable and greatly improves the realism of face animation during speech. Moreover, because the learning is MPEG-4 based, its results can drive many different kinds of animation, ranging from video-realistic image warps to 3D cartoon characters.

Key words: machine learning; facial animation; speech driven
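
The speech-to-pattern mapping can be sketched as a small classifier. It takes one frame of acoustic features, predicts the index of a basic face-motion pattern, and looks up that pattern's FAP values in a codebook to drive the face. The abstract does not give the network topology, feature set or training procedure, so everything below is an assumption:

- a one-hidden-layer tanh network with a softmax output, trained by full-batch gradient descent on cross-entropy;
- synthetic 12-dimensional vectors standing in for cepstral coefficients (the prosodic features mentioned in the Chinese abstract are omitted);
- a hard-coded three-pattern codebook;
- RMSE over FAP values as one possible quantitative measure.

The names `train_mlp`, `predict_patterns` and `rmse` are hypothetical.

```python
import numpy as np


def train_mlp(x, y, n_classes, n_hidden=16, lr=0.1, epochs=500, seed=0):
    """Train a one-hidden-layer tanh MLP with a softmax output (assumed design).

    x: (frames, n_features) acoustic features, e.g. cepstral coefficients.
    y: (frames,) integer index of the face-motion pattern for each frame.
    """
    rng = np.random.default_rng(seed)
    w1 = 0.1 * rng.standard_normal((x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = 0.1 * rng.standard_normal((n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)
        logits = h @ w2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        g = (p - onehot) / len(x)  # gradient of mean cross-entropy w.r.t. logits
        gw2, gb2 = h.T @ g, g.sum(axis=0)
        gh = (g @ w2.T) * (1.0 - h ** 2)  # back-propagate through tanh
        gw1, gb1 = x.T @ gh, gh.sum(axis=0)
        w1 -= lr * gw1
        b1 -= lr * gb1
        w2 -= lr * gw2
        b2 -= lr * gb2
    return w1, b1, w2, b2


def predict_patterns(params, x):
    """Most likely pattern index for each feature frame."""
    w1, b1, w2, b2 = params
    return (np.tanh(x @ w1 + b1) @ w2 + b2).argmax(axis=1)


def rmse(a, b):
    """Root-mean-square error between two FAP arrays of the same shape."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic data: each pattern has its own mean feature vector plus noise.
rng = np.random.default_rng(2)
codebook = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
class_means = 2.0 * rng.standard_normal((3, 12))
y = np.repeat(np.arange(3), 40)
x = class_means[y] + 0.5 * rng.standard_normal((len(y), 12))
x = (x - x.mean(axis=0)) / x.std(axis=0)  # per-coefficient normalization

params = train_mlp(x, y, n_classes=3)
pred = predict_patterns(params, x)
accuracy = float((pred == y).mean())
fap_error = rmse(codebook[pred], codebook[y])  # decoded FAPs vs. true-pattern FAPs
```

Predicting an index into a small set of patterns, rather than regressing raw FAP values, follows the abstract's idea of mapping speech to basic motion patterns. A real evaluation would use held-out recordings rather than the training frames scored here.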

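Cite this article

Chen Yiqiang, Gao Wen, Wang Zhaoqi, Jiang Dalong. A speech driven face animation system based on machine learning. Journal of Software (软件学报), 2003, 14(2): 215-221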

History

Received: 2001-06-04
Last revised: 2001-08-01