Optimizing voice conversion network with cycle consistency loss of speaker identity

We propose a novel training scheme to optimize a voice conversion network with a speaker identity loss function. The training scheme minimizes not only the frame-level spectral loss but also a speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the...


Bibliographic Details
Main Authors: Du, Hongqiang; Tian, Xiaohai; Xie, Lei; Li, Haizhou
Format: Text
Language: unknown
Published: 2020
Subjects: Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing
Online Access: http://arxiv.org/abs/2011.08548
id ftarxivpreprints:oai:arXiv.org:2011.08548
record_format openpolar
spelling ftarxivpreprints:oai:arXiv.org:2011.08548 2023-09-05T13:17:15+02:00 Optimizing voice conversion network with cycle consistency loss of speaker identity Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou 2020-11-17 http://arxiv.org/abs/2011.08548 unknown http://arxiv.org/abs/2011.08548 Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing text 2020 ftarxivpreprints 2023-08-16T16:12:11Z We propose a novel training scheme to optimize voice conversion network with a speaker identity loss function. The training scheme not only minimizes frame-level spectral loss, but also speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as reference speech at utterance level. While the proposed training scheme is applicable to any voice conversion networks, we formulate the study under the average model voice conversion framework in this paper. Experiments conducted on CMU-ARCTIC and CSTR-VCTK corpus confirm that the proposed method outperforms baseline methods in terms of speaker similarity. Text Arctic ArXiv.org (Cornell University Library) Arctic
institution Open Polar
collection ArXiv.org (Cornell University Library)
op_collection_id ftarxivpreprints
language unknown
topic Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
spellingShingle Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
Du, Hongqiang
Tian, Xiaohai
Xie, Lei
Li, Haizhou
Optimizing voice conversion network with cycle consistency loss of speaker identity
topic_facet Computer Science - Sound
Electrical Engineering and Systems Science - Audio and Speech Processing
description We propose a novel training scheme to optimize a voice conversion network with a speaker identity loss function. The training scheme minimizes not only the frame-level spectral loss but also a speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as the reference speech at the utterance level. While the proposed training scheme is applicable to any voice conversion network, we formulate the study under the average model voice conversion framework in this paper. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora confirm that the proposed method outperforms baseline methods in terms of speaker similarity.
format Text
author Du, Hongqiang
Tian, Xiaohai
Xie, Lei
Li, Haizhou
author_facet Du, Hongqiang
Tian, Xiaohai
Xie, Lei
Li, Haizhou
author_sort Du, Hongqiang
title Optimizing voice conversion network with cycle consistency loss of speaker identity
title_short Optimizing voice conversion network with cycle consistency loss of speaker identity
title_full Optimizing voice conversion network with cycle consistency loss of speaker identity
title_fullStr Optimizing voice conversion network with cycle consistency loss of speaker identity
title_full_unstemmed Optimizing voice conversion network with cycle consistency loss of speaker identity
title_sort optimizing voice conversion network with cycle consistency loss of speaker identity
publishDate 2020
url http://arxiv.org/abs/2011.08548
geographic Arctic
geographic_facet Arctic
genre Arctic
genre_facet Arctic
op_relation http://arxiv.org/abs/2011.08548
_version_ 1776198500347805696