Optimizing voice conversion network with cycle consistency loss of speaker identity

We propose a novel training scheme to optimize a voice conversion network with a speaker identity loss function. The training scheme minimizes not only frame-level spectral loss but also speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the...

Bibliographic Details
Main Authors:	Du, Hongqiang; Tian, Xiaohai; Xie, Lei; Li, Haizhou
Format:	Text
Language:	unknown
Published:	2020
Subjects:	Computer Science - Sound; Electrical Engineering and Systems Science - Audio and Speech Processing; Arctic
Online Access:	http://arxiv.org/abs/2011.08548

id	ftarxivpreprints:oai:arXiv.org:2011.08548
record_format	openpolar
spelling	ftarxivpreprints:oai:arXiv.org:2011.08548 2023-09-05T13:17:15+02:00 Optimizing voice conversion network with cycle consistency loss of speaker identity Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou 2020-11-17 http://arxiv.org/abs/2011.08548 unknown http://arxiv.org/abs/2011.08548 Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing text 2020 ftarxivpreprints 2023-08-16T16:12:11Z We propose a novel training scheme to optimize voice conversion network with a speaker identity loss function. The training scheme not only minimizes frame-level spectral loss, but also speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as reference speech at utterance level. While the proposed training scheme is applicable to any voice conversion networks, we formulate the study under the average model voice conversion framework in this paper. Experiments conducted on CMU-ARCTIC and CSTR-VCTK corpus confirm that the proposed method outperforms baseline methods in terms of speaker similarity. Text Arctic ArXiv.org (Cornell University Library) Arctic
institution	Open Polar
collection	ArXiv.org (Cornell University Library)
op_collection_id	ftarxivpreprints
language	unknown
topic	Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing
spellingShingle	Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou Optimizing voice conversion network with cycle consistency loss of speaker identity
topic_facet	Computer Science - Sound Electrical Engineering and Systems Science - Audio and Speech Processing
description	We propose a novel training scheme to optimize a voice conversion network with a speaker identity loss function. The training scheme minimizes not only frame-level spectral loss but also speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as the reference speech at the utterance level. While the proposed training scheme is applicable to any voice conversion network, we formulate the study under the average model voice conversion framework in this paper. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora confirm that the proposed method outperforms baseline methods in terms of speaker similarity.
format	Text
author	Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou
author_facet	Du, Hongqiang Tian, Xiaohai Xie, Lei Li, Haizhou
author_sort	Du, Hongqiang
title	Optimizing voice conversion network with cycle consistency loss of speaker identity
title_short	Optimizing voice conversion network with cycle consistency loss of speaker identity
title_full	Optimizing voice conversion network with cycle consistency loss of speaker identity
title_fullStr	Optimizing voice conversion network with cycle consistency loss of speaker identity
title_full_unstemmed	Optimizing voice conversion network with cycle consistency loss of speaker identity
title_sort	optimizing voice conversion network with cycle consistency loss of speaker identity
publishDate	2020
url	http://arxiv.org/abs/2011.08548
geographic	Arctic
geographic_facet	Arctic
genre	Arctic
genre_facet	Arctic
op_relation	http://arxiv.org/abs/2011.08548
_version_	1776198500347805696
