An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features
- Submitted by:
- Takuma Okamoto
- Last updated:
- 24 April 2018 - 2:34am
- Document Type:
- Poster
- Document Year:
- 2018
- Presenters:
- Takuma Okamoto
- Paper Code:
- SP-P14.2
Although WaveNet vocoders operating at sampling frequencies of 16 and 24 kHz can synthesize more natural-sounding speech waveforms than conventional vocoders, it is difficult to extend the sampling frequency directly to 48 kHz, which would cover the entire human audible frequency range and enable higher-quality synthesis, because the model becomes too large to train on a consumer GPU. To realize a 48 kHz WaveNet vocoder that can be trained on a consumer GPU, this paper introduces a subband WaveNet architecture into a speaker-dependent WaveNet vocoder and proposes a subband WaveNet vocoder. In experiments, each conditional subband WaveNet, operating at a sampling frequency of 8 kHz, was successfully trained on a consumer GPU. Subjective evaluations with a Japanese male speech corpus indicate that the proposed subband WaveNet vocoder with 36-dimensional simple acoustic features significantly outperformed conventional source-filter model-based vocoders, including STRAIGHT with 86-dimensional features.
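The abstract does not specify the filterbank, but the numbers it gives (a 48 kHz fullband signal, subband WaveNets at 8 kHz) imply 48 / 8 = 6 subbands if the split is uniform and critically sampled. The sketch below only works through that arithmetic under exactly those assumptions; `SubbandConfig` and `uniform_subband_config` are hypothetical names, not part of the paper or any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubbandConfig:
    num_subbands: int                               # M: one subband WaveNet per band
    subband_rate_hz: int                            # sampling frequency of each decimated band
    band_edges_hz: tuple[tuple[float, float], ...]  # (low, high) edges of each band


def uniform_subband_config(fullband_rate_hz: int, subband_rate_hz: int) -> SubbandConfig:
    """Hypothetical helper assuming a uniform, critically sampled filterbank.

    With M equal-width bands, each decimated by M, every band is sampled at
    fullband_rate / M and spans (fullband_rate / 2) / M Hz, which equals the
    Nyquist frequency of the decimated band.
    """
    if fullband_rate_hz % subband_rate_hz != 0:
        raise ValueError("fullband rate must be an integer multiple of the subband rate")
    m = fullband_rate_hz // subband_rate_hz
    width_hz = subband_rate_hz / 2  # bandwidth of each band = subband Nyquist frequency
    edges = tuple((k * width_hz, (k + 1) * width_hz) for k in range(m))
    return SubbandConfig(m, subband_rate_hz, edges)


if __name__ == "__main__":
    cfg = uniform_subband_config(48_000, 8_000)
    print(cfg.num_subbands)        # 6
    print(cfg.band_edges_hz[-1])   # (20000.0, 24000.0)
```

Under this assumption, the six bands tile 0 to 24 kHz in 4 kHz steps, and the total sample rate across all bands still equals 48 kHz. Each individual model, however, only has to generate 8,000 samples per second rather than 48,000.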