Mixture of Factor Analyzers Using Priors from Non-Parallel Speech for Voice Conversion

Bibliographic Details
Main Authors: Zhizheng Wu, Tomi Kinnunen, Eng Siong Chng, Haizhou Li
Other Authors: The Pennsylvania State University CiteSeerX Archives
Format: Text
Language: English
Online Access: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.304.6118
http://cs.joensuu.fi/pages/tkinnu/webpage/pdf/TMFA_IEEESPL_2012.pdf
Description
Summary: A robust voice conversion function relies on a large amount of parallel training data, which is difficult to collect in practice. To tackle the sparse parallel training data problem in voice conversion, this paper describes a mixture of factor analyzers method that integrates prior knowledge from non-parallel speech into the training of the conversion function. Experiments on the CMU ARCTIC corpus show that the proposed method improves the quality and similarity of converted speech. Both objective and subjective evaluations show that the proposed method outperforms the baseline GMM method. Index Terms: Voice conversion, prior knowledge, factor analysis, mixture of factor analyzers.