An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features

Abstract: 

Although a WaveNet vocoder can synthesize more natural-sounding speech waveforms than conventional vocoders at sampling frequencies of 16 and 24 kHz, it is difficult to directly extend the sampling frequency to 48 kHz, which would cover the entire human audible frequency range for higher-quality synthesis, because the model becomes too large to train on a consumer GPU. To realize a 48 kHz WaveNet vocoder that can be trained on a consumer GPU, this paper introduces a subband WaveNet architecture into a speaker-dependent WaveNet vocoder and proposes a subband WaveNet vocoder. In experiments, each conditional subband WaveNet, operating at a sampling frequency of 8 kHz, was trained successfully on a consumer GPU. The results of subjective evaluations with a Japanese male speech corpus indicate that the proposed subband WaveNet vocoder with simple 36-dimensional acoustic features significantly outperformed conventional source-filter model-based vocoders, including STRAIGHT with 86-dimensional features.
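
The sampling-rate arithmetic implied by the abstract can be sketched as follows. This is only an illustration, not the authors' implementation: the abstract states the 48 kHz full-band rate and the 8 kHz subband rate, but not the number of subbands or the filterbank design, so the sketch covers only the decimation factor. The receptive-field size of 3000 samples and both function names are hypothetical, chosen to show why a fixed receptive field spans more time at the lower rate.

```python
# Sampling rates stated in the abstract.
FULLBAND_HZ = 48_000
SUBBAND_HZ = 8_000

# HYPOTHETICAL receptive field size (in samples), for illustration only.
RECEPTIVE_FIELD_SAMPLES = 3000


def decimation_factor(fullband_hz: int, subband_hz: int) -> int:
    """Integer factor by which each subband signal is downsampled."""
    if fullband_hz % subband_hz != 0:
        raise ValueError("subband rate must divide the full-band rate")
    return fullband_hz // subband_hz


def receptive_field_ms(num_samples: int, sample_rate_hz: int) -> float:
    """Duration in milliseconds covered by num_samples at the given rate."""
    return 1000.0 * num_samples / sample_rate_hz


factor = decimation_factor(FULLBAND_HZ, SUBBAND_HZ)
fullband_ms = receptive_field_ms(RECEPTIVE_FIELD_SAMPLES, FULLBAND_HZ)
subband_ms = receptive_field_ms(RECEPTIVE_FIELD_SAMPLES, SUBBAND_HZ)

print(f"decimation factor: {factor}")
print(f"receptive field at 48 kHz: {fullband_ms} ms")
print(f"receptive field at 8 kHz: {subband_ms} ms")
```

Under these assumptions, each subband model processes one sixth as many samples per second as a full-band 48 kHz model, and the same receptive field covers six times as much signal duration.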

Paper Details

Authors: Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Submitted On: 24 April 2018 - 2:34am
Short Link:
Type: Poster
Event:
Presenter's Name: Takuma Okamoto
Paper Code: SP-P14.2
Document Year: 2018

Document Files

ICASSP_2018_subband_WaveNet_vocoder.pdf


[1] Takuma Okamoto, Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, "An investigation of subband WaveNet vocoder covering entire audible frequency range with limited acoustic features", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3162. Accessed: Dec. 18, 2018.