Unsupervised Speech Morphing between Utterances of any Speakers
A new approach to speech morphing is presented which avoids the extraction of fundamental and formant frequencies as well as the detection of phone or syllable boundaries. All prominent spectral and temporal features of the source and target utterances are automatically related and interpolated. The...
Main Author: | Hartmut R. Pfitzinger |
---|---|
Other Authors: | The Pennsylvania State University CiteSeerX Archives |
Format: | Text |
Language: | English |
Published: | 2004 |
Subjects: | |
Online Access: | http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736 http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf |
id |
ftciteseerx:oai:CiteSeerX.psu:10.1.1.61.8736 |
---|---|
record_format |
openpolar |
spelling |
ftciteseerx:oai:CiteSeerX.psu:10.1.1.61.8736 2023-05-15T15:03:17+02:00 Unsupervised Speech Morphing between Utterances of any Speakers Hartmut R. Pfitzinger The Pennsylvania State University CiteSeerX Archives 2004 application/pdf http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736 http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf en eng http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736 http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf Metadata may be used without restrictions as long as the oai identifier remains attached to it. http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf text 2004 ftciteseerx 2016-01-08T14:27:24Z A new approach to speech morphing is presented which avoids the extraction of fundamental and formant frequencies as well as the detection of phone or syllable boundaries. All prominent spectral and temporal features of the source and target utterances are automatically related and interpolated. The method consists of three main parts: LPC-based source-filter decomposition, separate interpolation, and composition of the morphed speech signal. The paper focuses on the alignment and interpolation problems on three speech signal layers: the timing structure on a phone- and syllable-level, the shape of the frequency spectrum including formants and other spectral properties, and the micro-timing of the source signal. Particularly, the source signal alignment and interpolation is described since it is most crucial for the resulting quality of the modified speech signal. The new morphing procedure was applied to utterances taken from the freely available CMU ARCTIC speech corpus and assessed by a perceptual MOS experiment. Preliminary Text Arctic Unknown Arctic |
institution |
Open Polar |
collection |
Unknown |
op_collection_id |
ftciteseerx |
language |
English |
description |
A new approach to speech morphing is presented which avoids the extraction of fundamental and formant frequencies as well as the detection of phone or syllable boundaries. All prominent spectral and temporal features of the source and target utterances are automatically related and interpolated. The method consists of three main parts: LPC-based source-filter decomposition, separate interpolation, and composition of the morphed speech signal. The paper focuses on the alignment and interpolation problems on three speech signal layers: the timing structure on a phone- and syllable-level, the shape of the frequency spectrum including formants and other spectral properties, and the micro-timing of the source signal. Particularly, the source signal alignment and interpolation is described since it is most crucial for the resulting quality of the modified speech signal. The new morphing procedure was applied to utterances taken from the freely available CMU ARCTIC speech corpus and assessed by a perceptual MOS experiment. Preliminary |
author2 |
The Pennsylvania State University CiteSeerX Archives |
format |
Text |
author |
Hartmut R. Pfitzinger |
spellingShingle |
Hartmut R. Pfitzinger Unsupervised Speech Morphing between Utterances of any Speakers |
author_facet |
Hartmut R. Pfitzinger |
author_sort |
Hartmut R. Pfitzinger |
title |
Unsupervised Speech Morphing between Utterances of any Speakers |
title_short |
Unsupervised Speech Morphing between Utterances of any Speakers |
title_full |
Unsupervised Speech Morphing between Utterances of any Speakers |
title_fullStr |
Unsupervised Speech Morphing between Utterances of any Speakers |
title_full_unstemmed |
Unsupervised Speech Morphing between Utterances of any Speakers |
title_sort |
unsupervised speech morphing between utterances of any speakers |
publishDate |
2004 |
url |
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736 http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf |
geographic |
Arctic |
geographic_facet |
Arctic |
genre |
Arctic |
genre_facet |
Arctic |
op_source |
http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf |
op_relation |
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8736 http://www.phonetik.uni-muenchen.de/~hpt/pub/Pfitzinger_SST04.pdf |
op_rights |
Metadata may be used without restrictions as long as the oai identifier remains attached to it. |
_version_ |
1766335153387339776 |
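
The description above names three parts: LPC-based source-filter decomposition, separate interpolation, and composition of the morphed signal. The following is a minimal single-frame sketch of that idea, not the author's implementation. It assumes the two frames are already time-aligned and of equal length, and it omits the timing, spectral, and source micro-timing alignment that the paper focuses on. Interpolating reflection coefficients and cross-fading the residuals sample by sample are assumptions chosen here for simplicity, and all function names (`autocorr`, `levinson`, `reflection_to_lpc`, `inverse_filter`, `synthesis_filter`, `morph_frame`, `make_frame`) are hypothetical.

```python
import math

def autocorr(frame, order):
    """Autocorrelation values r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(order + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion.
    Returns (a, ks): a[0] == 1 are the LPC coefficients of A(z),
    ks are the reflection coefficients."""
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err if err > 0 else 0.0
        ks.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, ks

def reflection_to_lpc(ks):
    """Step-up recursion: reflection coefficients back to LPC coefficients."""
    a = [1.0]
    for i, k in enumerate(ks, start=1):
        prev = a + [0.0]
        a = [prev[j] + k * prev[i - j] for j in range(i + 1)]
    return a

def inverse_filter(x, a):
    """Decomposition: residual (source) e[n] = sum_j a[j] * x[n - j]."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

def synthesis_filter(e, a):
    """Composition: y[n] = e[n] - sum_{j>=1} a[j] * y[n - j]."""
    y = []
    for n in range(len(e)):
        y.append(e[n] - sum(a[j] * y[n - j]
                            for j in range(1, len(a)) if n - j >= 0))
    return y

def morph_frame(src, tgt, alpha, order=8):
    """Morph one pre-aligned frame pair; alpha=0 gives src, alpha=1 gives tgt."""
    a_s, k_s = levinson(autocorr(src, order), order)
    a_t, k_t = levinson(autocorr(tgt, order), order)
    e_s = inverse_filter(src, a_s)
    e_t = inverse_filter(tgt, a_t)
    # Assumption: linear interpolation of reflection coefficients, which
    # keeps every |k| < 1 and therefore keeps the synthesis filter stable.
    k_m = [(1.0 - alpha) * ks + alpha * kt for ks, kt in zip(k_s, k_t)]
    # Assumption: plain sample-wise cross-fade of the two residuals,
    # standing in for the source alignment described in the paper.
    e_m = [(1.0 - alpha) * es + alpha * et for es, et in zip(e_s, e_t)]
    return synthesis_filter(e_m, reflection_to_lpc(k_m))

def make_frame(f1, f2, n=80):
    """Deterministic toy frame: two sinusoids plus a small periodic ripple."""
    return [math.sin(f1 * i) + 0.5 * math.sin(f2 * i)
            + 0.1 * (((i * 7919) % 13) - 6) / 6.0 for i in range(n)]

if __name__ == "__main__":
    src = make_frame(0.3, 1.1)
    tgt = make_frame(0.5, 1.7)
    halfway = morph_frame(src, tgt, 0.5, order=4)
    print(len(halfway))
```

Reflection coefficients are used for the filter interpolation because any convex combination of two stable sets stays stable, which direct interpolation of LPC coefficients does not guarantee. A real system would process overlapping windowed frames and align the two utterances before interpolating.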