Investigations of real-time Gaussian FFTNet and parallel WaveNet neural vocoders with simple acoustic features
- Citation Author(s):
- Submitted by: Takuma Okamoto
- Last updated: 10 May 2019 - 9:39pm
- Document Type: Poster
- Document Year: 2019
- Event:
- Presenters: Takuma Okamoto
- Paper Code: SLP-P20.13
- Categories:
This paper examines four approaches to improving real-time neural vocoders with simple acoustic features (SAF), which are constructed from fundamental frequency and mel-cepstra rather than mel-spectrograms. The investigations are: 1) the effectiveness of single Gaussian (SG) autoregressive (AR) WaveNet and FFTNet vocoders with SAF; 2) the feasibility of training and synthesis with an SG parallel WaveNet vocoder using SAF; 3) the impact of noise shaping on SG AR neural vocoders; and 4) the efficacy of bandwidth extension, in which SG AR neural vocoders synthesize speech waveforms at a sampling frequency of 24 kHz from SAF for a sampling frequency of 16 kHz. The experimental results indicate that SG AR WaveNet and real-time SG AR FFTNet vocoders with noise shaping using SAF can achieve sufficient synthesis quality together with a bandwidth extension effect. Moreover, a real-time SG parallel WaveNet vocoder can also be trained using SAF.
https://ieeexplore.ieee.org/document/8682320
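As a rough illustration of what a single Gaussian output means for an AR vocoder, the sketch below assumes the network emits a mean and a log standard deviation for each waveform sample, is trained by minimizing the Gaussian negative log-likelihood, and draws each new sample from that Gaussian at synthesis time. The function names and the scalar interface are hypothetical and are not taken from the paper; a real vocoder would compute these quantities over tensors inside a neural network.

```python
import math
import random


def gaussian_nll(x, mean, log_std):
    """Negative log-likelihood of x under N(mean, exp(log_std)**2).

    Hypothetical helper: stands in for the per-sample training loss of a
    single Gaussian output layer.
    """
    var = math.exp(2.0 * log_std)
    return 0.5 * math.log(2.0 * math.pi) + log_std + 0.5 * (x - mean) ** 2 / var


def sample_gaussian(mean, log_std, rng):
    """Draw one waveform sample from the predicted Gaussian (assumed interface)."""
    return mean + math.exp(log_std) * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Pretend the network predicted these parameters for the next sample.
    mean, log_std = 0.1, -3.0
    x = sample_gaussian(mean, log_std, rng)
    print("sample:", x)
    print("loss at target 0.12:", gaussian_nll(0.12, mean, log_std))
```

Predicting a log standard deviation rather than the standard deviation itself keeps the scale positive without any explicit constraint, which is one common design choice for this kind of output layer.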
Additionally, demo samples synthesized by WaveRNN and WaveGlow vocoders with SAF will be provided in the poster session.
Session: Speech Synthesis II
Time: Friday, May 17, 08:30 - 10:30