Unsupervised Speech Morphing between Utterances of any Speakers

A new approach to speech morphing is presented which avoids the extraction of fundamental and formant frequencies as well as the detection of phone or syllable boundaries. All prominent spectral and temporal features of the source and target utterances are automatically related and interpolated. The...

Bibliographic Details
Main Author:	Hartmut R. Pfitzinger
Other Authors:	The Pennsylvania State University CiteSeerX Archives
Format:	Text
Language:	English
Published:	2004
Subjects:	Arctic
Online Access:	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736 http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf

id	ftciteseerx:oai:CiteSeerX.psu:10.1.1.61.8736
record_format	openpolar
institution	Open Polar
collection	Unknown
op_collection_id	ftciteseerx
language	English
description	A new approach to speech morphing is presented which avoids the extraction of fundamental and formant frequencies as well as the detection of phone or syllable boundaries. All prominent spectral and temporal features of the source and target utterances are automatically related and interpolated. The method consists of three main parts: LPC-based source-filter decomposition, separate interpolation, and composition of the morphed speech signal. The paper focuses on the alignment and interpolation problems on three speech signal layers: the timing structure at the phone and syllable level, the shape of the frequency spectrum including formants and other spectral properties, and the micro-timing of the source signal. In particular, the alignment and interpolation of the source signal are described, since they are most crucial for the resulting quality of the modified speech signal. The new morphing procedure was applied to utterances taken from the freely available CMU ARCTIC speech corpus and assessed by a perceptual MOS experiment. Preliminary
author2	The Pennsylvania State University CiteSeerX Archives
format	Text
author	Hartmut R. Pfitzinger
title	Unsupervised Speech Morphing between Utterances of any Speakers
publishDate	2004
url	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736 http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf
geographic	Arctic
genre	Arctic
op_source	http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf
op_relation	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736 http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf
op_rights	Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_	1766335153387339776
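
Illustrative note on the description field: the abstract names an LPC-based source-filter decomposition followed by a composition step, but this record does not give the paper's actual algorithm. The sketch below is therefore only a generic, textbook instance of that idea for a single frame, under stated assumptions: the autocorrelation method with the Levinson-Durbin recursion, a fixed predictor order, and a synthetic test frame. All function names (synthetic_frame, autocorr, levinson_durbin, inverse_filter, synthesis_filter) are hypothetical, and the alignment and interpolation stages that the paper focuses on are not shown.

```python
# Generic single-frame LPC source-filter decomposition and recomposition.
# This is NOT the paper's implementation; it only illustrates the idea.

def synthetic_frame(n=200, period=40):
    """Assumed test signal: an impulse train through a stable 2-pole resonator."""
    y = []
    for i in range(n):
        pulse = 1.0 if i % period == 0 else 0.0
        y1 = y[i - 1] if i >= 1 else 0.0
        y2 = y[i - 2] if i >= 2 else 0.0
        y.append(1.3 * y1 - 0.8 * y2 + pulse)
    return y

def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no window applied here)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return the prediction-error filter A(z) = [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # remaining prediction error
    return a

def inverse_filter(x, a):
    """Source (residual) estimate: e[n] = sum_j a[j] * x[n - j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """Composition: drive the all-pole filter 1/A(z) with the source signal."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j]
                            for j in range(1, len(a)) if n - j >= 0))
    return y

frame = synthetic_frame()
coeffs = levinson_durbin(autocorr(frame, 2), 2)   # filter part
source = inverse_filter(frame, coeffs)            # source part
rebuilt = synthesis_filter(source, coeffs)        # composition
print("max reconstruction error:",
      max(abs(p - q) for p, q in zip(frame, rebuilt)))
```

In a morphing setting, the filter and source parts of two utterances would each be aligned and interpolated before the composition step. Interpolating raw predictor coefficients can produce an unstable filter, so a practical system would normally use a different parameterisation; which one the paper uses is not stated in this record.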
