A Probabilistic Approach to Unit Selection for Corpus-based Speech Synthesis

In this paper, we present a novel statistical approach to corpus-based speech synthesis. Unit selection is directed by probabilistic models for F0 contour, duration, and spectral characteristics of the synthesis units. The F0 targets for units are modeled by statistical additive models, and duration...

Bibliographic Details
Main Authors: Shinsuke Sakai, Han Shu
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.8712
http://www.festvox.org/blizzard/bc2005/IS052272.PDF
id ftciteseerx:oai:CiteSeerX.psu:10.1.1.214.8712
record_format openpolar
institution Open Polar
collection Unknown
op_collection_id ftciteseerx
language English
description In this paper, we present a novel statistical approach to corpus-based speech synthesis. Unit selection is directed by probabilistic models for the F0 contour, duration, and spectral characteristics of the synthesis units. The F0 targets for units are modeled by statistical additive models, and duration targets are modeled by regression trees. Spectral targets for a unit are modeled by Gaussian mixtures on MFCC-based features. The goodness of concatenation of two units is modeled by conditional Gaussian models on MFCC-based features. Although the system is in an early stage of development, we implemented an English speech synthesizer with the CMU Arctic corpora and confirmed the effectiveness of this new framework.
author2 The Pennsylvania State University CiteSeerX Archives
format Text
author Shinsuke Sakai
Han Shu
title A Probabilistic Approach to Unit Selection for Corpus-based Speech Synthesis
title_sort probabilistic approach to unit selection for corpus-based speech synthesis
url http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.8712
http://www.festvox.org/blizzard/bc2005/IS052272.PDF
op_source http://www.festvox.org/blizzard/bc2005/IS052272.PDF
op_relation http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.214.8712
http://www.festvox.org/blizzard/bc2005/IS052272.PDF
op_rights Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_ 1766332231457964032