Optimizing voice conversion network with cycle consistency loss of speaker identity
We propose a novel training scheme to optimize a voice conversion network with a speaker identity loss function. The training scheme minimizes not only the frame-level spectral loss but also the speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the...
Main Authors: | Du, Hongqiang; Tian, Xiaohai; Xie, Lei; Li, Haizhou |
---|---|
Format: | Text |
Language: | unknown |
Published: | 2020 |
Subjects: | Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing |
Online Access: | http://arxiv.org/abs/2011.08548 |
id |
ftarxivpreprints:oai:arXiv.org:2011.08548 |
---|---|
record_format |
openpolar |
spelling |
ftarxivpreprints:oai:arXiv.org:2011.08548 2023-09-05T13:17:15+02:00 Optimizing voice conversion network with cycle consistency loss of speaker identity Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou 2020-11-17 http://arxiv.org/abs/2011.08548 unknown http://arxiv.org/abs/2011.08548 Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing text 2020 ftarxivpreprints 2023-08-16T16:12:11Z We propose a novel training scheme to optimize voice conversion network with a speaker identity loss function. The training scheme not only minimizes frame-level spectral loss, but also speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as reference speech at utterance level. While the proposed training scheme is applicable to any voice conversion networks, we formulate the study under the average model voice conversion framework in this paper. Experiments conducted on CMU-ARCTIC and CSTR-VCTK corpus confirm that the proposed method outperforms baseline methods in terms of speaker similarity. Text Arctic ArXiv.org (Cornell University Library) Arctic |
institution |
Open Polar |
collection |
ArXiv.org (Cornell University Library) |
op_collection_id |
ftarxivpreprints |
language |
unknown |
topic |
Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing |
spellingShingle |
Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou Optimizing voice conversion network with cycle consistency loss of speaker identity |
topic_facet |
Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing |
description |
We propose a novel training scheme to optimize a voice conversion network with a speaker identity loss function. The training scheme minimizes not only the frame-level spectral loss but also the speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as the reference speech at the utterance level. While the proposed training scheme is applicable to any voice conversion network, we formulate the study under the average model voice conversion framework in this paper. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora confirm that the proposed method outperforms baseline methods in terms of speaker similarity. |
format |
Text |
author |
Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou |
author_facet |
Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou |
author_sort |
Du, Hongqiang |
title |
Optimizing voice conversion network with cycle consistency loss of speaker identity |
title_sort |
optimizing voice conversion network with cycle consistency loss of speaker identity |
publishDate |
2020 |
url |
http://arxiv.org/abs/2011.08548 |
geographic |
Arctic |
geographic_facet |
Arctic |
genre |
Arctic |
genre_facet |
Arctic |
op_relation |
http://arxiv.org/abs/2011.08548 |
_version_ |
1776198500347805696 |
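
The abstract in this record describes a training objective that combines a frame-level spectral loss with an utterance-level speaker identity loss. The following is a minimal sketch of how such a combined objective might be computed, not the authors' implementation. Everything named here is an assumption made for illustration: the mean-pooling `utterance_embedding` stands in for a trained speaker encoder, the cosine-distance form of `speaker_identity_loss` and the `identity_weight` parameter are hypothetical choices, and the mean absolute error used for `spectral_loss` is one plausible frame-level measure among several.

```python
import math
from typing import List, Sequence

# One utterance: a sequence of frames, each a sequence of feature values.
Frames = Sequence[Sequence[float]]


def spectral_loss(converted: Frames, target: Frames) -> float:
    """Frame-level loss: mean absolute error over aligned frames (assumed form)."""
    total = 0.0
    count = 0
    for conv_frame, tgt_frame in zip(converted, target):
        for c, t in zip(conv_frame, tgt_frame):
            total += abs(c - t)
            count += 1
    return total / count


def utterance_embedding(frames: Frames) -> List[float]:
    """Hypothetical speaker encoder: average the frames over time.

    A real system would use a trained speaker encoder network here.
    """
    n = len(frames)
    dim = len(frames[0])
    return [sum(frame[d] for frame in frames) / n for d in range(dim)]


def speaker_identity_loss(converted: Frames, reference: Frames) -> float:
    """Utterance-level loss: cosine distance between embeddings (assumed form)."""
    a = utterance_embedding(converted)
    b = utterance_embedding(reference)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Guard against division by zero for all-zero embeddings.
    return 1.0 - dot / max(norm_a * norm_b, 1e-12)


def total_loss(
    converted: Frames,
    target: Frames,
    reference: Frames,
    identity_weight: float = 1.0,
) -> float:
    """Spectral loss plus weighted speaker identity loss."""
    identity = speaker_identity_loss(converted, reference)
    return spectral_loss(converted, target) + identity_weight * identity
```

In this sketch the "cycle" consists of passing the converted speech back through the speaker encoder and comparing the result with the embedding of the reference speech. Because the comparison is made on a single pooled embedding per utterance, the reference does not need to have the same number of frames as the converted output.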