Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion

In voice conversion, frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syl...
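As a rough illustration of the frame-level baseline mentioned above, the sketch below applies mean and variance normalization to an F0 track. It is not taken from the paper: working in the log-F0 domain, marking unvoiced frames with 0, and the names log_f0_stats and convert_f0 are all assumptions made for this example, and the F0 values are toy numbers.

```python
import math
from statistics import mean, pstdev


def log_f0_stats(f0_hz):
    """Return (mean, std) of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_hz if f > 0]
    return mean(logs), pstdev(logs)


def convert_f0(f0_hz, src_stats, tgt_stats):
    """Map each voiced frame so its log-F0 matches the target mean and std.

    Assumption: unvoiced frames are marked with 0 and are passed through.
    """
    mu_s, sd_s = src_stats
    mu_t, sd_t = tgt_stats
    converted = []
    for f in f0_hz:
        if f <= 0:
            converted.append(0.0)  # keep unvoiced frames unvoiced
        else:
            z = (math.log(f) - mu_s) / sd_s  # normalize with source statistics
            converted.append(math.exp(mu_t + sd_t * z))  # rescale to target
    return converted


# Toy F0 tracks in Hz (hypothetical values); the two speakers' data are
# not aligned, since only global statistics are needed.
source_f0 = [110.0, 0.0, 120.0, 130.0, 0.0, 125.0]
target_f0 = [200.0, 220.0, 0.0, 240.0, 210.0]

src_stats = log_f0_stats(source_f0)
tgt_stats = log_f0_stats(target_f0)
converted = convert_f0(source_f0, src_stats, tgt_stats)
```

Because only per-speaker statistics are estimated, no frame alignment between the two speakers is required, which is why this kind of conversion is text-independent and works with non-parallel data. Each frame is converted in isolation, so the shape of the pitch contour is only shifted and scaled.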

Bibliographic Details
Main Authors:	Zhi-zheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li
Other Authors:	The Pennsylvania State University CiteSeerX Archives
Format:	Text
Language:	English
Subjects:	Voice conversion; F0 transformation; GMM; histogram; Arctic
Online Access:	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.6294 http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/IS2010_ProsodyConversion.pdf

id	ftciteseerx:oai:CiteSeerX.psu:10.1.1.178.6294
record_format	openpolar
spelling	ftciteseerx:oai:CiteSeerX.psu:10.1.1.178.6294 2023-05-15T15:02:38+02:00 Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion Zhi-zheng Wu Tomi Kinnunen Eng Siong Chng Haizhou Li The Pennsylvania State University CiteSeerX Archives application/pdf http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.6294 http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/IS2010_ProsodyConversion.pdf en eng http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.6294 http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/IS2010_ProsodyConversion.pdf Metadata may be used without restrictions as long as the oai identifier remains attached to it. http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/IS2010_ProsodyConversion.pdf Index Terms Voice conversion F0 transformation GMM histogram text ftciteseerx 2016-01-07T16:21:01Z In voice conversion, frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independence of the frame-level conversion while yielding high-quality conversion. We achieve these goals by (1) introducing a text-independent tri-frame alignment method, (2) including delta features of F0 into Gaussian mixture model (GMM) conversion and (3) reducing the well-known GMM oversmoothing effect by F0 histogram equalization. Our objective and subjective experiments on the CMU Arctic corpus indicate improvements over both the mean/variance normalization and the baseline GMM conversion. Text Arctic Unknown Arctic
institution	Open Polar
collection	Unknown
op_collection_id	ftciteseerx
language	English
topic	Index Terms Voice conversion F0 transformation GMM histogram
spellingShingle	Index Terms Voice conversion F0 transformation GMM histogram Zhi-zheng Wu Tomi Kinnunen Eng Siong Chng Haizhou Li Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion
topic_facet	Index Terms Voice conversion F0 transformation GMM histogram
description	In voice conversion, frame-level mean and variance normalization is typically used for fundamental frequency (F0) transformation, which is text-independent and requires no parallel training data. Some advanced methods transform pitch contours instead, but require either parallel training data or syllabic annotations. We propose a method which retains the simplicity and text-independence of the frame-level conversion while yielding high-quality conversion. We achieve these goals by (1) introducing a text-independent tri-frame alignment method, (2) including delta features of F0 into Gaussian mixture model (GMM) conversion and (3) reducing the well-known GMM oversmoothing effect by F0 histogram equalization. Our objective and subjective experiments on the CMU Arctic corpus indicate improvements over both the mean/variance normalization and the baseline GMM conversion.
author2	The Pennsylvania State University CiteSeerX Archives
format	Text
author	Zhi-zheng Wu Tomi Kinnunen Eng Siong Chng Haizhou Li
author_facet	Zhi-zheng Wu Tomi Kinnunen Eng Siong Chng Haizhou Li
author_sort	Zhi-zheng Wu
title	Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion
title_short	Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion
title_full	Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion
title_fullStr	Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion
title_full_unstemmed	Text-Independent F0 Transformation with Non-Parallel Data for Voice Conversion
title_sort	text-independent f0 transformation with non-parallel data for voice conversion
url	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.6294 http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/IS2010_ProsodyConversion.pdf
geographic	Arctic
geographic_facet	Arctic
genre	Arctic
genre_facet	Arctic
op_source	http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/IS2010_ProsodyConversion.pdf
op_relation	http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.178.6294 http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/IS2010_ProsodyConversion.pdf
op_rights	Metadata may be used without restrictions as long as the oai identifier remains attached to it.
_version_	1766334558071947264

Similar Items