Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features

Citation Author(s):
Takuma Okamoto, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Submitted by:
Takuma Okamoto
Last updated:
10 May 2019 - 9:39pm
Presenter's Name:
Takuma Okamoto

This paper examines four approaches to improving real-time neural vocoders that use simple acoustic features (SAF), constructed from the fundamental frequency and mel-cepstra, instead of mel-spectrograms. The investigations are: 1) the effectiveness of single Gaussian (SG) autoregressive (AR) WaveNet and FFTNet vocoders with SAF; 2) the feasibility of training an SG parallel WaveNet vocoder and synthesizing with it using SAF; 3) the impact of noise shaping on SG AR neural vocoders; and 4) the efficacy of bandwidth extension, in which SG AR neural vocoders synthesize speech waveforms at a sampling frequency of 24 kHz from SAF computed for 16 kHz speech. Experimental results indicate that SG AR WaveNet and real-time SG AR FFTNet vocoders with noise shaping using SAF achieve sufficient synthesis quality, and that bandwidth extension is effective. Moreover, a real-time SG parallel WaveNet vocoder can also be trained using SAF.
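To make the terms above concrete, here is a minimal sketch, not the authors' implementation, of how a single Gaussian AR vocoder could be driven by frame-level SAF. Everything in it is an assumption for illustration: the SAF layout (`saf_frame`: log F0, a voiced flag, then mel-cepstra), the 24th-order mel-cepstrum, the 5 ms frame shift, the lower bound `MIN_LOG_SCALE`, and the hypothetical `toy_predict`, which stands in for a trained WaveNet or FFTNet. The sketch shows two ideas: each output sample is drawn from one Gaussian whose mean and log standard deviation the network predicts (trained with the Gaussian negative log-likelihood), and frame-level conditioning is tied to the frame shift rather than to the output sampling rate, so the same frames can condition 24 kHz generation by repeating each frame for more samples.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Assumed lower bound on the predicted log standard deviation (a common stabilizer).
MIN_LOG_SCALE = -7.0


def saf_frame(f0_hz: float, mcep: Sequence[float]) -> List[float]:
    """Hypothetical SAF layout: [log F0 (0 if unvoiced), voiced flag, mel-cepstra...]."""
    voiced = f0_hz > 0.0
    return [math.log(f0_hz) if voiced else 0.0, 1.0 if voiced else 0.0, *mcep]


def upsample(frames: Sequence[List[float]], sample_rate: int,
             frame_shift_ms: int) -> List[List[float]]:
    """Repeat each frame vector once per output sample it covers."""
    hop = sample_rate * frame_shift_ms // 1000  # e.g. 80 at 16 kHz, 120 at 24 kHz
    return [frame for frame in frames for _ in range(hop)]


def gaussian_nll(x: float, mean: float, log_scale: float) -> float:
    """Training loss: negative log-likelihood of x under N(mean, exp(log_scale)^2)."""
    log_scale = max(log_scale, MIN_LOG_SCALE)
    z = (x - mean) / math.exp(log_scale)
    return 0.5 * math.log(2.0 * math.pi) + log_scale + 0.5 * z * z


# A predictor maps (previous samples, current conditioning vector) -> (mean, log_scale).
Predictor = Callable[[List[float], List[float]], Tuple[float, float]]


def generate(predict: Predictor, cond: Sequence[List[float]],
             rng: random.Random) -> List[float]:
    """AR single-Gaussian sampling: one output sample per conditioning vector."""
    wave: List[float] = []
    for c in cond:
        mean, log_scale = predict(wave, c)
        log_scale = max(log_scale, MIN_LOG_SCALE)
        wave.append(mean + math.exp(log_scale) * rng.gauss(0.0, 1.0))
    return wave


def toy_predict(history: List[float], cond: List[float]) -> Tuple[float, float]:
    # Hypothetical stand-in for a trained network; it ignores `cond` entirely.
    prev = history[-1] if history else 0.0
    return 0.9 * prev, -3.0


# Two 5 ms frames (one voiced, one unvoiced) drive 24 kHz generation.
frames = [saf_frame(120.0, [0.1] * 24), saf_frame(0.0, [0.0] * 24)]
cond24k = upsample(frames, 24000, 5)
wave = generate(toy_predict, cond24k, random.Random(0))
print(len(cond24k), len(wave))  # 240 240
```

In this sketch, moving from 16 kHz to 24 kHz output changes only the repetition factor (80 versus 120 samples per frame); a real vocoder performing bandwidth extension would additionally have to generate the 8 to 12 kHz band that 16 kHz features do not describe.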

Additionally, demo samples synthesized by WaveRNN and WaveGlow vocoders with SAF will be presented in the poster session.
Paper Code: SLP-P20.13
Session: Speech Synthesis II
Time: Friday, May 17, 08:30 - 10:30
