Summary: | In this paper, we propose a novel way to train a Generative Adversarial Network (GAN) for non-parallel, many-to-many voice conversion. The goal of voice conversion (VC) is to transform speech from a source speaker to that of a target speaker without changing the phonetic content. Based on ideas from game theory, we propose multiplying the gradient of the Generator by suitable weights. The weights are calculated so that they increase the power of fake samples that fool the Discriminator, resulting in a stronger Generator. Motivated by a recently presented GAN-based approach for VC, StarGAN-VC, we propose a variation of StarGAN, referred to as Weighted StarGAN (WeStarGAN). The experiments are conducted on the standard CMU ARCTIC database. The WeStarGAN-VC approach achieves significantly better relative performance and is clearly preferred over the recently proposed StarGAN-VC method in terms of subjective speech quality and speaker similarity, with 75% and 65% preference scores, respectively.
|