Automatic Building of Synthetic Voices from Audio Books

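Current state-of-the-art text-to-speech systems produce intelligible utterances, but lack the prosody of natural speech. This is due to poor models of prosody built from single-sentence recordings such as CMU ARCTIC. Building better models of prosody involves development of prosodically rich speech databases. However, development of such speech databases requires a large amount of effort and time. An alternative is to exploit story-style monologues (long speech files) in audio books. These monologues already encapsulate rich prosody, including varied intonation contours, pitch accents and phrasing patterns. Thus, audio books are excellent candidates for building prosodic models and natural-sounding synthetic voices. The processing of such audio books poses several challenges, including segmentation of long speech files, detection of mispronunciations, and extraction and evaluation of representations of prosody.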


Bibliographic Details
Main Authors:	Kishore Prahallad, Mosur Ravishankar, Tanja Schultz
Other Authors:	The Pennsylvania State University CiteSeerX Archives
Format:	Text
Language:	English
Published:	2010
Subjects:	Speech synthesis audio books voice conversion speaker-specific Arctic
Online Access:	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.207.9765 http://www.lti.cs.cmu.edu/Research/Thesis/sunkeswari,%20kishore.pdf

id	ftciteseerx:oai:CiteSeerX.psu:10.1.1.207.9765
record_format	openpolar
institution	Open Polar
collection	Unknown
op_collection_id	ftciteseerx
language	English
topic	Speech synthesis audio books voice conversion speaker-specific
topic_facet	Speech synthesis audio books voice conversion speaker-specific
description	Current state-of-the-art text-to-speech systems produce intelligible utterances, but lack the prosody of natural speech. This is due to poor models of prosody built from single-sentence recordings such as CMU ARCTIC. Building better models of prosody involves development of prosodically rich speech databases. However, development of such speech databases requires a large amount of effort and time. An alternative is to exploit story-style monologues (long speech files) in audio books. These monologues already encapsulate rich prosody, including varied intonation contours, pitch accents and phrasing patterns. Thus, audio books are excellent candidates for building prosodic models and natural-sounding synthetic voices. The processing of such audio books poses several challenges, including segmentation of long speech files, detection of mispronunciations, and extraction and evaluation of representations of prosody.
author2	The Pennsylvania State University CiteSeerX Archives
format	Text
author	Kishore Prahallad Mosur Ravishankar Tanja Schultz
author_facet	Kishore Prahallad Mosur Ravishankar Tanja Schultz
author_sort	Kishore Prahallad
title	Automatic Building of Synthetic Voices from Audio Books
title_sort	automatic building of synthetic voices from audio books
publishDate	2010
url	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.207.9765 http://www.lti.cs.cmu.edu/Research/Thesis/sunkeswari,%20kishore.pdf
geographic	Arctic
geographic_facet	Arctic
genre	Arctic
genre_facet	Arctic
op_source	http://www.lti.cs.cmu.edu/Research/Thesis/sunkeswari,%20kishore.pdf
op_relation	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.207.9765 http://www.lti.cs.cmu.edu/Research/Thesis/sunkeswari,%20kishore.pdf
op_rights	Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_	1766334387847168000
