Blizzard 2008: Experiments on Unit Size for Unit Selection Speech Synthesis

Bibliographic Details
Main Authors: E. Veera Raghavendra, Srinivas Desai, B. Yegnanarayana, Alan W Black, Kishore Prahallad
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.145.3050
http://www.cs.cmu.edu/~awb/papers/bc2008/IIIT-H_D.pdf
Description
Summary: This paper describes the techniques and approaches developed at IIIT Hyderabad for building synthetic voices in the Blizzard 2008 speech synthesis challenge. We submitted three voices: an English full voice, an English ARCTIC voice, and a Mandarin voice. Our system is identified as D. In building the three voices, our approach has been to experiment with and exploit syllable-like large units for concatenative synthesis. In spite of the large database supplied in Blizzard 2008, we find that a backoff strategy is essential when using syllable-like units. In this paper, we propose a novel technique of approximate matching of syllables as a backoff technique for building voices. Index Terms: speech synthesis, unit size, tonal unit, prominence