Stable Boundary-Based Non-Uniform Unit Selection in Speech Synthesis

Authors: WANG Xin, WU Zhi-Yong, CAI Lian-Hong

Affiliation: Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems (Graduate School at Shenzhen, Tsinghua University), Shenzhen 518055, China; Tsinghua National Laboratory for Information Science and Technology (Tsinghua University), Beijing 100084, China

Fund Project: National Natural Science Foundation of China (60805008, 61003094, 61375027, 61370023)

Abstract:

Speech synthesis is an important medium for spoken human-computer interaction, and unit selection has long been a central research topic in concatenative synthesis. Building on the traditional cost-function-based unit selection method, this paper applies the diphone stable-segment boundary model to words and syllables. It then uses a multi-layer non-uniform unit selection algorithm over the three unit models (word, syllable, and diphone), with a Viterbi search, to select the best unit sequence from the corpus and concatenate it into the final waveform. The algorithm has two benefits. First, its layered, unified non-uniform selection strategy prefers larger units with better prosodic characteristics and acoustic continuity whenever possible. This markedly reduces the number of concatenation points and keeps joins that are prone to coarticulation or to segmentation (labeling) errors inside the larger units. Second, cutting units at stable segments changes the traditional unit boundary type, so words and syllables inherit the good joining properties of diphone stable-segment boundaries, which improves the continuity and naturalness of the synthesized speech. Evaluation results show that, compared with the traditional diphone-based unit selection approach, the method significantly improves the synthesized speech, with gains reported in both naturalness and intelligibility.

Key words: speech synthesis; TTS; diphone; stable boundary type; multi-layer non-uniform unit selection
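
To make the cost-function-based selection mentioned in the abstract concrete, the following is a minimal sketch of a single-layer Viterbi search over candidate units. It is not the paper's multi-layer algorithm: the `Unit` fields, the pitch-only `target_cost` and `join_cost`, and the rule that units adjacent in the corpus join at zero cost are all illustrative assumptions. The zero-cost rule only hints at why longer contiguous stretches of recorded speech tend to win.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Unit:
    """A candidate unit from the corpus (all fields are hypothetical)."""
    label: str        # unit identity, e.g. a diphone name
    pitch: float      # toy acoustic feature used by both costs
    corpus_pos: int   # index of the unit in the recorded corpus


def target_cost(unit: Unit, wanted_pitch: float) -> float:
    """How far the candidate is from the requested specification."""
    return abs(unit.pitch - wanted_pitch)


def join_cost(left: Unit, right: Unit) -> float:
    """Mismatch at the concatenation point between two chosen units."""
    # Assumption: units recorded back to back need no real join.
    if right.corpus_pos == left.corpus_pos + 1:
        return 0.0
    return abs(left.pitch - right.pitch)


def select_units(
    candidates: Sequence[Sequence[Unit]],
    wanted_pitch: Sequence[float],
) -> Tuple[List[Unit], float]:
    """Return the minimum-cost unit sequence and its total cost."""
    # best[i][j] = (cheapest cost of a path ending in candidates[i][j],
    #               index of its predecessor in slot i - 1)
    best = [[(target_cost(u, wanted_pitch[0]), -1) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for unit in candidates[i]:
            cost, prev = min(
                (best[i - 1][k][0] + join_cost(p, unit), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(unit, wanted_pitch[i]), prev))
        best.append(row)

    # Backtrack from the cheapest final state.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


if __name__ == "__main__":
    slots = [
        [Unit("a-b", 118.0, 10), Unit("a-b", 120.0, 50)],
        [Unit("b-c", 125.0, 11), Unit("b-c", 140.0, 90)],
    ]
    chosen, cost = select_units(slots, [120.0, 125.0])
    print([u.corpus_pos for u in chosen], cost)
```

In this toy input the first slot's second candidate matches the target pitch exactly, yet the search picks the first candidate because it is contiguous in the corpus with the best candidate for the second slot, so the pair needs no join.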

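Cite this article

WANG Xin, WU Zhi-Yong, CAI Lian-Hong. Stable boundary-based non-uniform unit selection in speech synthesis. Journal of Software (软件学报), 2014, 25(S2): 63-69.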

History

Received: 2013-06-15
Last revised: 2013-08-21
Published online: 2015-01-29
