Unsupervised Speech Morphing between Utterances of any Speakers

Bibliographic Details
Main Author: Hartmut R. Pfitzinger
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Published: 2004
Subjects:
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736
http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf
Record ID: ftciteseerx:oai:CiteSeerX.psu:10.1.1.61.8736
Description: A new approach to speech morphing is presented that avoids extracting the fundamental and formant frequencies as well as detecting phone or syllable boundaries. All prominent spectral and temporal features of the source and target utterances are automatically related and interpolated. The method consists of three main parts: LPC-based source-filter decomposition, separate interpolation, and composition of the morphed speech signal. The paper focuses on the alignment and interpolation problems at three layers of the speech signal: the timing structure at the phone and syllable level, the shape of the frequency spectrum (including formants and other spectral properties), and the micro-timing of the source signal. In particular, the alignment and interpolation of the source signal are described, since they are the most crucial factor for the quality of the resulting modified speech signal. The new morphing procedure was applied to utterances from the freely available CMU ARCTIC speech corpus and assessed in a perceptual MOS experiment. Preliminary...
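The description names three stages: LPC-based source-filter decomposition, separate interpolation, and composition. The Python sketch below illustrates those stages for a single pair of frames only. It is not Pfitzinger's algorithm. It assumes the two frames are already time-aligned and of equal length, so none of the paper's phone/syllable or micro-timing alignment is reproduced. It interpolates the vocal-tract filters through their reflection coefficients, which is an assumed, commonly used choice, and it simply cross-fades the two LPC residuals. All function names (`levinson`, `step_up`, `morph_frame`, ...) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def hamming(n: int) -> List[float]:
    """Hamming window, used only for the autocorrelation analysis."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorr(x: Sequence[float], order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of a Hamming-windowed frame."""
    w = [xi * wi for xi, wi in zip(x, hamming(len(x)))]
    n = len(w)
    return [sum(w[i] * w[i + lag] for i in range(n - lag)) for lag in range(order + 1)]


def levinson(r: Sequence[float], order: int) -> Tuple[List[float], List[float]]:
    """Levinson-Durbin recursion.

    Returns the inverse-filter polynomial a (a[0] = 1, A(z) = sum a[i] z^-i)
    and the reflection coefficients k[1..order].
    """
    a = [1.0] + [0.0] * order
    ks: List[float] = []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        ks.append(k)
        err *= 1.0 - k * k
    return a, ks


def step_up(ks: Sequence[float]) -> List[float]:
    """Rebuild the polynomial a from reflection coefficients (same recursion)."""
    a = [1.0]
    for m, k in enumerate(ks, start=1):
        prev = a + [0.0]
        a = [prev[i] + k * prev[m - i] for i in range(m + 1)]
    return a


def inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Source estimate: residual e[n] = x[n] + sum a[i] x[n-i] (zero initial state)."""
    p = len(a) - 1
    return [x[n] + sum(a[i] * x[n - i] for i in range(1, p + 1) if n - i >= 0)
            for n in range(len(x))]


def synthesis_filter(e: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole filter 1/A(z): y[n] = e[n] - sum a[i] y[n-i] (zero initial state)."""
    p = len(a) - 1
    y: List[float] = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[i] * y[n - i] for i in range(1, p + 1) if n - i >= 0))
    return y


def morph_frame(src: Sequence[float], tgt: Sequence[float],
                alpha: float, order: int = 10) -> List[float]:
    """Morph one frame pair; alpha = 0 gives the source, alpha = 1 the target."""
    # ASSUMPTION: frames are already time-aligned and equally long; the
    # paper's alignment on the timing and micro-timing layers is not shown.
    assert len(src) == len(tgt)
    a_s, k_s = levinson(autocorr(src, order), order)
    a_t, k_t = levinson(autocorr(tgt, order), order)
    # 1) Source-filter decomposition by LPC inverse filtering.
    e_s = inverse_filter(src, a_s)
    e_t = inverse_filter(tgt, a_t)
    # 2) Separate interpolation: filters via reflection coefficients,
    #    residuals by a plain sample-wise cross-fade (a simplification).
    k_m = [(1 - alpha) * ks + alpha * kt for ks, kt in zip(k_s, k_t)]
    e_m = [(1 - alpha) * es + alpha * et for es, et in zip(e_s, e_t)]
    # 3) Composition: drive the interpolated filter with the mixed residual.
    return synthesis_filter(e_m, step_up(k_m))
```

Because the biased autocorrelation of a nonzero frame yields reflection coefficients with |k| < 1, and a convex combination of such values keeps |k| < 1, the interpolated synthesis filter stays stable for any alpha between 0 and 1. Interpolating the direct-form coefficients a[i] would not guarantee this. At alpha = 0 and alpha = 1 the sketch returns the original source and target frames up to rounding.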
Rights: Metadata may be used without restrictions as long as the oai identifier remains attached to it.