Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features

This paper examines four approaches to improving real-time neural vocoders that use simple acoustic features (SAF), constructed from fundamental frequency and mel-cepstra rather than mel-spectrograms. The investigations are: 1) the effectiveness of single Gaussian (SG) autoregressive (AR) WaveNet and FFTNet vocoders with SAF, 2) the feasibility of training an SG parallel WaveNet vocoder and synthesizing with it using SAF, 3) the impact of noise shaping on SG AR neural vocoders, and 4) the efficacy of bandwidth extension, in which SG AR neural vocoders synthesize speech waveforms at a sampling frequency of 24 kHz from SAF extracted at 16 kHz. The experimental results indicate that SG AR WaveNet and real-time SG AR FFTNet vocoders with noise shaping using SAF can achieve sufficient synthesis quality with a bandwidth extension effect. Moreover, a real-time SG parallel WaveNet vocoder can also be trained using SAF.
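
As a rough illustration of what a single Gaussian (SG) output layer involves, the sketch below computes the per-sample negative log-likelihood and draws a sample from the predicted distribution. It assumes the network emits a mean and a log standard deviation for each waveform sample; this parameterization and the function names gaussian_nll and draw_sample are hypothetical choices for illustration and are not taken from the paper.

```python
import math
import random


def gaussian_nll(x, mean, log_std):
    """Negative log-likelihood of x under N(mean, exp(log_std)**2).

    Assumption: the vocoder predicts (mean, log_std) per waveform sample.
    """
    var = math.exp(2.0 * log_std)
    return 0.5 * math.log(2.0 * math.pi) + log_std + (x - mean) ** 2 / (2.0 * var)


def draw_sample(mean, log_std, rng):
    """Draw one waveform sample from the predicted Gaussian."""
    return mean + math.exp(log_std) * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Loss for a target sample of 0.1 when the model predicts N(0, 0.5**2).
    print(gaussian_nll(0.1, 0.0, math.log(0.5)))
    print(draw_sample(0.0, math.log(0.5), rng))
```

In an autoregressive vocoder of this kind, training would minimize the sum of such per-sample losses, and synthesis would call the sampling step once per output sample.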

https://ieeexplore.ieee.org/document/8682320

Additionally, demo samples synthesized by WaveRNN and WaveGlow vocoders with SAF will be provided at the poster session.
Paper Code: SLP-P20.13
Session: Speech Synthesis II
Time: Friday, May 17, 08:30 - 10:30

icassp_2019_okamoto_1.pdf
