An Investigation of Noise Shaping with Perceptual Weighting for WaveNet-based Speech Generation

We propose a noise shaping method to improve the sound quality of speech signals generated by WaveNet, a convolutional neural network (CNN) that predicts a waveform sample sequence as a sequence of discrete symbols. Speech generated by WaveNet often suffers from noise that has two sources: the quantization error introduced by representing waveform samples as discrete symbols, and the prediction error of the CNN. We analyze this noise and show that 1) because the prediction error is much larger than the quantization error, the contribution of the quantization error is practically negligible, and 2) the noise tends to cause large spectral distortion in the high-frequency band. To alleviate these adverse effects, the proposed method applies a perceptual weighting filter to WaveNet, which makes it possible to exploit the frequency masking properties of the human auditory system. Objective and subjective evaluations demonstrate that the proposed method significantly improves the sound quality of the generated speech signals.
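
The general idea of noise shaping described above can be illustrated with a small sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the 256-level mu-law quantizer (the abstract only says "discrete symbols"), the fixed first-order filter W(z) = 1 - a z^-1 standing in for a real perceptual weighting filter, the coefficient a = 0.5, the white Gaussian noise standing in for the CNN prediction error, and the synthetic two-tone test signal. The signal is weighted before it is turned into symbols, and the inverse filter 1/W(z) is applied after decoding, so that the error introduced in between is reshaped in frequency.

```python
import numpy as np

MU = 255  # assumption: 256-level mu-law companding (not stated in the abstract)


def mulaw_encode(x, mu=MU):
    """Map samples in [-1, 1] to integer symbols in 0..mu."""
    x = np.clip(x, -1.0, 1.0)
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    return np.round((y + 1.0) / 2.0 * mu).astype(np.int64)


def mulaw_decode(q, mu=MU):
    """Map integer symbols in 0..mu back to samples in [-1, 1]."""
    y = 2.0 * q.astype(np.float64) / mu - 1.0
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu


def weight(x, a):
    """Hypothetical weighting filter W(z) = 1 - a z^-1."""
    y = x.copy()
    y[1:] -= a * x[:-1]
    return y


def unweight(y, a):
    """Inverse filter 1 / W(z), applied to the decoded waveform."""
    x = np.empty_like(y)
    prev = 0.0
    for n, v in enumerate(y):
        prev = v + a * prev
        x[n] = prev
    return x


def generate(x, a, pred_err):
    """Toy stand-in for generation: weight, add error, quantize, decode, unweight."""
    symbols = mulaw_encode(weight(x, a) + pred_err)
    return unweight(mulaw_decode(symbols), a)


def high_band_energy(e):
    """Energy of e in the upper half of the spectrum (above fs / 4)."""
    spec = np.abs(np.fft.rfft(e)) ** 2
    return spec[len(spec) // 2:].sum()


fs = 16000
t = np.arange(fs) / fs
# Synthetic low-frequency signal used in place of real speech.
x = 0.3 * np.sin(2 * np.pi * 200 * t) + 0.2 * np.sin(2 * np.pi * 450 * t)

rng = np.random.default_rng(0)
pred_err = 0.02 * rng.standard_normal(len(x))  # stand-in for prediction error

plain_hi = high_band_energy(generate(x, 0.0, pred_err) - x)   # a = 0: no shaping
shaped_hi = high_band_energy(generate(x, 0.5, pred_err) - x)  # a = 0.5: shaped
print("high-band error energy, unshaped:", plain_hi)
print("high-band error energy, shaped:  ", shaped_hi)
```

In this toy setting the shaped run should show less error energy in the upper band, because 1/W(z) attenuates high frequencies and moves the error toward the low band where the signal energy sits. A practical perceptual weighting filter would presumably follow the spectral envelope of speech rather than use a fixed first-order response.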

ICASSP2018_NS.pdf
