Stable Boundary-Based Non-Uniform Unit Selection in Speech Synthesis

Journal of Software, Volume 25, Issue S2, 2014, pp. 63-69

Authors: WANG Xin, WU Zhi-Yong, CAI Lian-Hong

Affiliations (all authors): Tsinghua-CUHK Joint Research Center for Media Sciences, Technologies and Systems (Graduate School at Shenzhen, Tsinghua University), Shenzhen 518055, China; Tsinghua National Laboratory for Information Science and Technology (Tsinghua University), Beijing 100084, China

Abstract:

Speech synthesis plays an important role in human-computer interaction. Building on the traditional cost-function-based unit selection method, this paper proposes an approach that incorporates the diphone's stable boundary model into word and syllable units, and uses a multi-layer Viterbi algorithm to select the best path through the corpus for generating the final waveform. The proposed multi-layer non-uniform unit selection algorithm offers two benefits. First, it can choose longer prosodic units with the correct acoustic characteristics, which reduces the number of concatenation points and keeps potential coarticulation effects and badly labeled phones inside the longer units. Second, it revises the traditional unit boundary type to take advantage of the diphone's stable joins, improving continuity and naturalness at concatenation boundaries. Evaluation results show that synthetic speech produced with this approach improves substantially in both naturalness and intelligibility over the traditional diphone-based unit selection approach.
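To make the cost-function-based search concrete, the following is a minimal single-layer sketch of Viterbi unit selection, not the paper's multi-layer algorithm. The function name `viterbi_select`, the representation of a unit as an `(utterance_id, position)` pair, and the toy cost values are all assumptions made for illustration. The toy join cost is zero when two units are adjacent in the same corpus utterance, which is one common way a search ends up preferring longer contiguous stretches and therefore fewer concatenation points.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

U = TypeVar("U")


def viterbi_select(
    candidates: Sequence[Sequence[U]],
    target_cost: Callable[[int, U], float],
    join_cost: Callable[[U, U], float],
) -> List[U]:
    """Pick one candidate per target position, minimising the sum of
    target costs and join costs along the whole sequence."""
    if not candidates or any(len(c) == 0 for c in candidates):
        raise ValueError("every target position needs at least one candidate")

    # best[j]: cheapest total cost of a path ending in candidate j
    # of the position currently being processed.
    best = [target_cost(0, u) for u in candidates[0]]
    back: List[List[int]] = []  # back[t-1][j]: best predecessor index for position t

    for t in range(1, len(candidates)):
        new_best, pointers = [], []
        for u in candidates[t]:
            costs = [best[i] + join_cost(p, u)
                     for i, p in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + target_cost(t, u))
            pointers.append(i_min)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [candidates[t][j] for t, j in enumerate(path)]


# Toy data (hypothetical): a unit is (utterance_id, position_in_utterance).
Unit = Tuple[str, int]
candidates: List[List[Unit]] = [
    [("a", 0), ("b", 5)],
    [("a", 1), ("c", 2)],
    [("b", 7), ("a", 2)],
]
toy_target = {("a", 0): 0.2, ("b", 5): 0.1, ("a", 1): 0.2,
              ("c", 2): 0.1, ("b", 7): 0.1, ("a", 2): 0.2}


def toy_target_cost(t: int, u: Unit) -> float:
    return toy_target[u]


def toy_join_cost(prev: Unit, cur: Unit) -> float:
    # Units that were adjacent in the recorded corpus join for free.
    adjacent = prev[0] == cur[0] and prev[1] + 1 == cur[1]
    return 0.0 if adjacent else 1.0


selected = viterbi_select(candidates, toy_target_cost, toy_join_cost)
print(selected)
```

In this toy setup the contiguous stretch from utterance "a" is selected even though each of its units has a slightly higher target cost, because it avoids both join penalties. A multi-layer variant as described in the abstract would additionally search over word-, syllable-, and diphone-level candidates; that part is not sketched here.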

Key words: speech synthesis; TTS; diphone; stable boundary type; multi-layer non-uniform unit selection

Citation:

WANG Xin, WU Zhi-Yong, CAI Lian-Hong. Stable boundary-based non-uniform unit selection in speech synthesis. Journal of Software, 2014, 25(S2): 63-69.

History

Received: June 15, 2013
Revised: August 21, 2013
Online: January 29, 2015

