An Investigation of Noise Shaping with Perceptual Weighting for WaveNet-based Speech Generation
- Citation Author(s):
- Submitted by: Tomoki Toda
- Last updated: 15 April 2018 - 1:02am
- Document Type: Poster
- Document Year: 2018
- Event:
- Presenters: Kentaro Tachibana and Tomoki Toda
- Paper Code: SP-P14.4
- Categories:
We propose a noise shaping method to improve the sound quality of speech signals generated by WaveNet, a convolutional neural network (CNN) that predicts a waveform sample sequence as a discrete symbol sequence. Speech generated by WaveNet often contains noise from two sources: the quantization error introduced by representing waveform samples as discrete symbols, and the prediction error of the CNN. We analyze this noise and show that 1) because the prediction error is much larger than the quantization error, the effect of the quantization error is practically negligible, and 2) the noise tends to cause large spectral distortion in the high-frequency band. To alleviate these effects, the proposed method applies a perceptual weighting filter to WaveNet, which makes it possible to exploit the frequency masking properties of the human auditory system. We conducted objective and subjective evaluations and demonstrated that the proposed method significantly improved the sound quality of the generated speech signals.
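The idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the form of the perceptual weighting filter, so the code assumes a hypothetical first-order filter W(z) = 1 - a z^-1 with a = 0.8, 8-bit mu-law companding for the discrete symbols, a 200 Hz sinusoid as a toy stand-in for speech, and additive Gaussian noise as a stand-in for the CNN's prediction error. The weighting filter is applied before the (simulated) model and its inverse after generation, so noise that is roughly flat in the weighted domain ends up concentrated at low frequencies in the output.

```python
import math
import random

MU = 255  # assumption: 8-bit mu-law companding (256 discrete symbols)

def mulaw_encode(x, mu=MU):
    """Quantize a sample in [-1, 1] to an integer symbol in [0, mu]."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1.0) / 2.0 * mu))

def mulaw_decode(symbol, mu=MU):
    """Map an integer symbol in [0, mu] back to a sample in [-1, 1]."""
    y = 2.0 * symbol / mu - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def weight(x, a):
    """Hypothetical weighting filter W(z) = 1 - a z^-1 (applied before modeling)."""
    out, prev = [], 0.0
    for v in x:
        out.append(v - a * prev)
        prev = v
    return out

def inverse_weight(y, a):
    """Inverse filter 1 / W(z) (applied to the generated waveform)."""
    out, prev = [], 0.0
    for v in y:
        prev = v + a * prev
        out.append(prev)
    return out

def lag1_autocorr(e):
    """Normalized lag-1 autocorrelation: near 0 for spectrally flat noise,
    positive when noise energy is concentrated at low frequencies."""
    mean = sum(e) / len(e)
    c = [v - mean for v in e]
    return sum(p * q for p, q in zip(c, c[1:])) / sum(v * v for v in c)

random.seed(0)
A = 0.8      # hypothetical filter coefficient, chosen for illustration
FS = 16000   # assumed sampling rate in Hz
speech = [0.4 * math.sin(2 * math.pi * 200 * n / FS) for n in range(FS)]

# Signal the model would be trained to predict (weighted domain).
target = weight(speech, A)

# Stand-in for generation: target plus Gaussian "prediction error",
# then mu-law quantization and decoding.
generated = [mulaw_decode(mulaw_encode(t + random.gauss(0.0, 0.01)))
             for t in target]

# Undo the weighting to return to the speech domain.
output = inverse_weight(generated, A)

noise_weighted = [g - t for g, t in zip(generated, target)]
noise_output = [o - s for o, s in zip(output, speech)]

r_flat = lag1_autocorr(noise_weighted)
r_shaped = lag1_autocorr(noise_output)
print("lag-1 autocorrelation, weighted domain:", r_flat)
print("lag-1 autocorrelation, after inverse filter:", r_shaped)
```

In this toy setup the inverse filter redistributes the noise toward low frequencies, where speech energy is typically higher and masking is more likely, rather than removing it. The sketch does not normalize filter gain, so total noise power is not preserved. A realistic weighting filter would be derived from the speech spectrum itself.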