
CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion

Abstract: 

Non-parallel voice conversion (VC) is a technique for learning the mapping from source to target speech without relying on parallel data. This is an important task, but it has been challenging because the training conditions are unfavorable. Recently, CycleGAN-VC has provided a breakthrough and performed comparably to a parallel VC method without relying on any extra data, modules, or time alignment procedures. However, there is still a large gap between the real target and converted speech, and bridging this gap remains a challenge. To reduce the gap, we propose CycleGAN-VC2, which is an improved version of CycleGAN-VC incorporating three new techniques: an improved objective (two-step adversarial losses), an improved generator (2-1-2D CNN), and an improved discriminator (PatchGAN). We evaluated our method on a non-parallel VC task and analyzed the effect of each technique in detail. An objective evaluation showed that these techniques help bring the converted feature sequence closer to the target in terms of both global and local structures, which we assess using Mel-cepstral distortion and modulation spectra distance, respectively. A subjective evaluation showed that CycleGAN-VC2 outperforms CycleGAN-VC in terms of naturalness and similarity for every speaker pair, including intra-gender and inter-gender pairs. The converted speech samples are provided at http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/cyclegan-vc2/i....
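
The abstract names Mel-cepstral distortion (MCD) as the measure of global structure. As a rough illustration, the commonly used MCD definition can be sketched as below. This is a minimal sketch, not the authors' evaluation code: the function name `mel_cepstral_distortion` is hypothetical, the two sequences are assumed to be already time-aligned with equal shape, and the caller is assumed to have dropped the 0th (energy) coefficient if that is desired. The feature dimensionality and alignment procedure used in the paper are not stated on this page.

```python
import numpy as np


def mel_cepstral_distortion(target, converted):
    """Mean Mel-cepstral distortion in dB between two feature sequences.

    Both inputs are (frames, dims) arrays of Mel-cepstral coefficients.
    Assumption: the sequences are already time-aligned, so frame t of one
    corresponds to frame t of the other.
    """
    target = np.asarray(target, dtype=np.float64)
    converted = np.asarray(converted, dtype=np.float64)
    if target.ndim != 2 or target.shape != converted.shape:
        raise ValueError("inputs must be 2-D, time-aligned, and equal in shape")

    diff = target - converted
    # Standard per-frame MCD: (10 / ln 10) * sqrt(2 * sum_d diff_d^2)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    # Average over frames to get a single score for the utterance.
    return float(np.mean(per_frame))
```

Lower values mean the converted features are closer to the target; identical sequences give a distortion of 0.0.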


Paper Details

Authors: Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo
Submitted On: 10 May 2019 - 2:59am
Short Link:
Type: Poster
Event:
Presenter's Name: Takuhiro Kaneko
Paper Code: SLP-P18.8
Document Year: 2019
Cite

Document Files

Kaneko_CycleGAN-VC2_ICASSP_2019_poster.pdf


[1] Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Nobukatsu Hojo, "CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4278. Accessed: Jun. 19, 2019.
@article{4278-19,
url = {http://sigport.org/4278},
author = {Takuhiro Kaneko and Hirokazu Kameoka and Kou Tanaka and Nobukatsu Hojo},
publisher = {IEEE SigPort},
title = {CycleGAN-VC2: Improved CycleGAN-based Non-parallel Voice Conversion},
year = {2019}
}