Non-Parallel Voice Conversion Using Weighted Generative Adversarial Networks

In this paper, we suggest a novel way to train a Generative Adversarial Network (GAN) for the purpose of non-parallel, many-to-many voice conversion. The goal of voice conversion (VC) is to transform speech from a source speaker to that of a target speaker without changing the phonetic contents. Based on i...


Bibliographic Details
Published in:	Interspeech 2019
Main Authors:	Paul, Dipjyoti; Pantazis, Yannis; Stylianou, Yannis
Format:	Conference Object
Language:	English
Published:	Zenodo 2019
Subjects:	Voice conversion, generative adversarial networks, training algorithm, Arctic
Online Access:	https://doi.org/10.21437/Interspeech.2019-2869

Description
Summary:	In this paper, we suggest a novel way to train a Generative Adversarial Network (GAN) for the purpose of non-parallel, many-to-many voice conversion. The goal of voice conversion (VC) is to transform speech from a source speaker to that of a target speaker without changing the phonetic contents. Based on ideas from Game Theory, we suggest multiplying the gradient of the Generator with suitable weights. Weights are calculated so that they increase the power of fake samples that fool the Discriminator, resulting in a stronger Generator. Motivated by a recently presented GAN-based approach for VC, StarGAN-VC, we suggest a variation of StarGAN, referred to as Weighted StarGAN (WeStarGAN). The experiments are conducted on the standard CMU ARCTIC database. The WeStarGAN-VC approach achieves significantly better relative performance and is clearly preferred over the recently proposed StarGAN-VC method in terms of subjective speech quality and speaker similarity, with 75% and 65% preference scores, respectively.
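
The weighting idea in the summary can be illustrated with a minimal sketch. The summary does not give the weight formula, so everything specific here is an assumption made for illustration: the exponential weights, the `eta` parameter, and the function names `sample_weights` and `weighted_generator_loss` are hypothetical and are not taken from the paper. The sketch only shows the general mechanism: each fake sample's loss term (and therefore its gradient contribution) is scaled by a weight that grows with the Discriminator's score for that sample.

```python
import numpy as np

def sample_weights(d_fake, eta=1.0):
    """Hypothetical per-sample weights (an assumption, not the paper's formula).

    d_fake holds Discriminator outputs in (0, 1) for a batch of fake samples.
    Samples that fool the Discriminator more (higher score) get larger weight.
    Weights are normalized to sum to 1; eta = 0 gives uniform weights.
    """
    d_fake = np.asarray(d_fake, dtype=float)
    logits = eta * d_fake
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

def weighted_generator_loss(d_fake, eta=1.0, eps=1e-12):
    """Weighted non-saturating Generator loss: sum_i w_i * (-log D(G(z_i))).

    Treating the weights as constants during backpropagation is equivalent
    to multiplying each sample's Generator gradient by w_i.
    """
    d_fake = np.asarray(d_fake, dtype=float)
    w = sample_weights(d_fake, eta)
    per_sample = -np.log(np.clip(d_fake, eps, 1.0))
    return float(np.sum(w * per_sample))

# Example batch: Discriminator scores for three fake samples.
scores = np.array([0.1, 0.5, 0.9])
print(sample_weights(scores))
print(weighted_generator_loss(scores))
```

With `eta=0.0` the weights are uniform and the loss reduces to the ordinary batch-mean Generator loss, which makes the weighted variant easy to compare against an unweighted baseline.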


Similar Items