Automatic Building of Synthetic Voices from Audio Books

Bibliographic Details
Main Authors: Kishore Prahallad, Mosur Ravishankar, Tanja Schultz
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2010
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.207.9765
http://www.lti.cs.cmu.edu/Research/Thesis/sunkeswari,%20kishore.pdf
Description
Summary: Current state-of-the-art text-to-speech systems produce intelligible utterances but lack the prosody of natural speech. This is due to poor models of prosody built from single-sentence recordings such as CMU ARCTIC. Building better models of prosody requires prosodically rich speech databases, but developing such databases takes a large amount of effort and time. An alternative is to exploit story-style monologues (long speech files) in audio books. These monologues already contain rich prosody, including varied intonation contours, pitch accents, and phrasing patterns. Audio books are therefore excellent candidates for building prosodic models and natural-sounding synthetic voices. Processing such audio books poses several challenges, including segmentation of long speech files, detection of mispronunciations, and extraction and evaluation of representations of prosody.