
An Investigation of Noise Shaping with Perceptual Weighting for WaveNet-based Speech Generation

Citation Author(s):
Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Submitted by:
Tomoki Toda
Last updated:
15 April 2018 - 1:02am
Document Type:
Poster
Document Year:
2018
Event:
Presenters:
Kentaro Tachibana and Tomoki Toda
Paper Code:
SP-P14.4
 

We propose a noise shaping method to improve the sound quality of speech signals generated by WaveNet, a convolutional neural network (CNN) that predicts a waveform sample sequence as a discrete symbol sequence. Speech signals generated by WaveNet often suffer from noise that has two sources: the quantization error introduced by representing waveform samples as discrete symbols, and the prediction error of the CNN. We analyze this noise and show that 1) because the prediction error is much larger than the quantization error, the contribution of the quantization error to the noise is practically negligible, and 2) the noise tends to cause large spectral distortion in the high-frequency band. To alleviate the adverse effect of this noise on the generated speech, the proposed method applies a perceptual weighting filter to WaveNet, which makes it possible to exploit the frequency masking properties of the human auditory system. Objective and subjective evaluations demonstrate that the proposed method significantly improves the sound quality of the generated speech signals.
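The general idea of noise shaping around a discrete-symbol waveform model can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the poster: the first-order filter in `weight` stands in for the perceptual weighting filter, 8-bit mu-law stands in for the discrete symbol representation, and the quantizer alone plays the role of the noise source (no network is involved, so the larger prediction error is not modeled). The function names and the coefficient `a` are hypothetical.

```python
import math

MU = 255  # assumed 8-bit mu-law; the abstract only says "discrete symbols"


def mu_law_encode(x, mu=MU):
    """Map a sample in [-1, 1] to an integer symbol in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip out-of-range samples
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((compressed + 1.0) / 2.0 * mu))


def mu_law_decode(symbol, mu=MU):
    """Map an integer symbol in [0, mu] back to a sample in [-1, 1]."""
    compressed = 2.0 * symbol / mu - 1.0
    magnitude = math.expm1(abs(compressed) * math.log1p(mu)) / mu
    return math.copysign(magnitude, compressed)


def weight(signal, a=0.9):
    """Hypothetical stand-in weighting filter: y[n] = x[n] - a * x[n-1]."""
    out, prev = [], 0.0
    for s in signal:
        out.append(s - a * prev)
        prev = s
    return out


def unweight(signal, a=0.9):
    """Exact inverse of weight(): x[n] = y[n] + a * x[n-1]."""
    out, prev = [], 0.0
    for s in signal:
        prev = s + a * prev
        out.append(prev)
    return out


def generate_with_shaping(signal, a=0.9):
    """Weight, quantize (the only noise source here), then inverse-filter."""
    scale = 1.0 + a  # worst-case gain of weight(), keeps samples in [-1, 1]
    symbols = [mu_law_encode(w / scale) for w in weight(signal, a)]
    decoded = [mu_law_decode(s) * scale for s in symbols]
    return unweight(decoded, a)


# Low-frequency test tone as a toy "speech" signal.
tone = [0.5 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(200)]
shaped = generate_with_shaping(tone)
```

Because the noise is added in the weighted domain and the inverse filter is applied afterwards, the noise in the output carries the spectral shape of the inverse filter instead of staying flat. In this toy setup that pushes noise energy toward low frequencies and away from the high-frequency band.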
