Abstract: Tailoring the voice font, that is, pruning redundant synthesis instances, is an important issue for scalable corpus-based text-to-speech (TTS) systems. However, pruning redundant synthesis instances usually results in the loss of non-uniform units. To solve this problem, the concept of virtual non-uniform units is proposed. Based on this concept and the synthesis frequency of each instance, an algorithm named StaRp-VPA is constructed to make the TTS system scalable to the target hardware. In experiments, naturalness as measured by Mean Opinion Score (MOS) remains almost unchanged when fewer than 50% of the instances are pruned, and the MOS does not degrade severely when the reduction rate exceeds 50%.