Optimizing voice conversion network with cycle consistency loss of speaker identity


Bibliographic Details
Main Authors:	Du, Hongqiang, Tian, Xiaohai, Xie, Lei, Li, Haizhou
Format:	Article in Journal/Newspaper
Language:	unknown
Published:	arXiv 2020
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; Arctic
Online Access:	https://dx.doi.org/10.48550/arxiv.2011.08548 https://arxiv.org/abs/2011.08548

Description
Summary:	We propose a novel training scheme that optimizes a voice conversion network with a speaker identity loss function. The training scheme minimizes not only the frame-level spectral loss but also a speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as the reference speech at the utterance level. While the proposed training scheme is applicable to any voice conversion network, we formulate the study under the average model voice conversion framework in this paper. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora confirm that the proposed method outperforms baseline methods in terms of speaker similarity.
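
The summary names two training terms (a frame-level spectral loss and an utterance-level speaker identity loss) but does not give their exact form. The following is a minimal sketch of how such a combined objective could look, not the authors' implementation. Everything specific in it is an assumption made for illustration: the mean absolute error for the spectral term, the cosine distance for the identity term, the `weight` parameter, and the hypothetical `mean_pool_embedding` function, which merely stands in for a trained speaker encoder.

```python
import math


def spectral_loss(converted, target):
    """Frame-level term: mean absolute error over all frames and bins (assumed form)."""
    total = 0.0
    count = 0
    for conv_frame, tgt_frame in zip(converted, target):
        for c, t in zip(conv_frame, tgt_frame):
            total += abs(c - t)
            count += 1
    return total / count


def mean_pool_embedding(frames):
    """Hypothetical stand-in for a speaker encoder: average the frames over time."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def speaker_identity_loss(converted, reference, embed=mean_pool_embedding):
    """Utterance-level term: 1 - cosine similarity of the two embeddings (assumed form)."""
    a = embed(converted)
    b = embed(reference)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return 1.0 - dot / norm


def total_loss(converted, target, reference, weight=1.0):
    """Combined objective: spectral term plus weighted speaker identity term."""
    return spectral_loss(converted, target) + weight * speaker_identity_loss(
        converted, reference
    )
```

In this sketch the converted features are passed back through the embedding function and compared with the embedding of the reference speech, which is one way to read the cycle consistency constraint described in the summary. In a real system the inputs would be tensors and both terms would be differentiable so that gradients reach the conversion network.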

Similar Items