Using 5 ms segments in concatenative speech synthesis

Bibliographic Details
Main Authors: Toshio Hirai, Seiichi Tenpaku
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2004
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.385.292
http://isca-speech.org/archive_open/archive_papers/ssw5/ssw5_037.pdf
Description
Summary: A concatenative speech synthesis system has greater potential to generate natural speech when it uses a larger number of short speech segments, since the variety of possible concatenations increases. In this paper, we propose the use of very short speech segments (5 ms, one pitch period at a pitch of 200 Hz) for concatenative speech synthesis. The proposed method is applied to the CMU ARCTIC speech database, and 100 sentences are synthesized. Although the synthesized speech maintains the speaker’s identity and is natural enough, it also contains some noise caused by inappropriate unit selection, and the formant transitions are awkward in some vowel regions.
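
The segment length quoted in the summary follows from simple arithmetic: one period of a 200 Hz pitch lasts 1/200 s = 5 ms. The short Python sketch below works through that arithmetic and shows what cutting a waveform into 5 ms units might look like. It is illustrative only and is not the authors' implementation. The 16 kHz sampling rate, the fixed-length non-overlapping split, and all function names are assumptions introduced here, not details stated in this record.

```python
def pitch_period_ms(f0_hz: float) -> float:
    """Duration of one pitch period in milliseconds for a given pitch (Hz)."""
    return 1000.0 / f0_hz


def samples_per_segment(segment_ms: float, sample_rate_hz: int) -> int:
    """Number of waveform samples covered by one segment."""
    return round(segment_ms * sample_rate_hz / 1000.0)


def split_into_segments(samples, segment_len):
    """Cut a sample sequence into consecutive, non-overlapping segments.

    Assumption: fixed-length segments; a trailing partial segment is dropped.
    """
    last_start = len(samples) - segment_len
    return [samples[i:i + segment_len]
            for i in range(0, last_start + 1, segment_len)]


# Assumed sampling rate (hypothetical for this sketch): 16 kHz.
SAMPLE_RATE_HZ = 16000

period_ms = pitch_period_ms(200.0)                         # one period of 200 Hz
seg_len = samples_per_segment(period_ms, SAMPLE_RATE_HZ)   # samples per 5 ms unit
segments = split_into_segments(list(range(200)), seg_len)  # toy "waveform"
```

Under these assumptions each 5 ms unit spans 80 samples, so one second of speech yields 200 candidate units, which gives a sense of why the number of possible concatenations grows so quickly with short segments.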