Unsupervised Speech Morphing between Utterances of any Speakers


Bibliographic Details
Main Author: Hartmut R. Pfitzinger
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2004
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736
http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf
Description
Summary: A new approach to speech morphing is presented that avoids the extraction of fundamental and formant frequencies as well as the detection of phone or syllable boundaries. All prominent spectral and temporal features of the source and target utterances are automatically related and interpolated. The method consists of three main parts: LPC-based source-filter decomposition, separate interpolation, and composition of the morphed speech signal. The paper focuses on the alignment and interpolation problems on three speech signal layers: the timing structure at the phone and syllable level, the shape of the frequency spectrum including formants and other spectral properties, and the micro-timing of the source signal. In particular, source signal alignment and interpolation are described, since they are most crucial for the resulting quality of the modified speech signal. The new morphing procedure was applied to utterances taken from the freely available CMU ARCTIC speech corpus and assessed by a perceptual MOS experiment. Preliminary
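
Illustrative sketch (not taken from the paper): the three parts named in the summary (LPC-based source-filter decomposition, separate interpolation, composition) can be illustrated on a single pair of equal-length frames. Everything below is an assumption made for illustration: the function names, the LPC order of 8, the choice to interpolate reflection coefficients, and the sample-wise blending of the two residuals. In particular, the sketch performs none of the temporal, spectral, or source-signal alignment that the paper focuses on.

```python
import math
import random


def autocorr(x, max_lag):
    """Autocorrelation of a finite frame for lags 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, ks): the inverse filter A(z) = 1 + a[1] z^-1 + ...
    and the reflection coefficients ks.
    """
    a, err, ks = [1.0], r[0], []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
    return a, ks


def reflection_to_lpc(ks):
    """Step-up recursion: reflection coefficients -> LPC coefficients."""
    a = [1.0]
    for k in ks:
        i = len(a)
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
    return a


def inverse_filter(x, a):
    """Source (residual) signal: e[n] = sum_j a[j] * x[n - j]."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(min(p, n) + 1))
            for n in range(len(x))]


def synthesis_filter(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j]
                            for j in range(1, min(p, n) + 1)))
    return y


def morph_frame(src, tgt, alpha, order=8):
    """Hypothetical one-frame morph; alpha=0 gives src, alpha=1 gives tgt."""
    # 1) Decomposition into filter and source signal.
    a_s, k_s = levinson(autocorr(src, order), order)
    a_t, k_t = levinson(autocorr(tgt, order), order)
    e_s, e_t = inverse_filter(src, a_s), inverse_filter(tgt, a_t)
    # 2) Separate interpolation (assumption: reflection coefficients,
    #    because a convex blend keeps |k| < 1 and the filter stable;
    #    the residuals are blended sample by sample, without alignment).
    k_m = [(1 - alpha) * u + alpha * v for u, v in zip(k_s, k_t)]
    e_m = [(1 - alpha) * u + alpha * v for u, v in zip(e_s, e_t)]
    # 3) Composition of the morphed frame.
    return synthesis_filter(e_m, reflection_to_lpc(k_m))


# Synthetic test frames (not speech data), with a fixed seed.
rng = random.Random(0)
src = [math.sin(0.3 * n) + 0.5 * math.sin(0.9 * n) + 0.1 * rng.uniform(-1, 1)
       for n in range(200)]
tgt = [math.sin(0.5 * n) + 0.3 * math.sin(1.4 * n) + 0.1 * rng.uniform(-1, 1)
       for n in range(200)]
halfway = morph_frame(src, tgt, 0.5)
```

With alpha set to 0 or 1, the sketch reproduces the source or target frame up to rounding error, which gives a simple check that decomposition and composition are inverses of each other. Intermediate values produce a blend whose quality is not comparable to the method described, since blending unaligned residuals is exactly the problem the paper identifies as most crucial.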