

An Investigation of Noise Shaping with Perceptual Weighting for WaveNet-based Speech Generation

Abstract: 

We propose a noise shaping method to improve the sound quality of speech signals generated by WaveNet, a convolutional neural network (CNN) that predicts a waveform sample sequence as a sequence of discrete symbols. Speech generated by WaveNet often suffers from noise with two sources: the quantization error introduced by representing waveform samples as discrete symbols, and the prediction error of the CNN. We analyze this noise and show that 1) because the prediction error is much larger than the quantization error, the contribution of the quantization error to the noise is practically negligible, and 2) the noise tends to cause large spectral distortion in the high-frequency band. To alleviate the adverse effect of this noise on the generated speech, the proposed noise shaping method applies a perceptual weighting filter to WaveNet, which makes it possible to exploit the frequency masking properties of the human auditory system. We conducted objective and subjective evaluations to investigate the effectiveness of the proposed method and demonstrated that it significantly improved the sound quality of the generated speech signals.
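The abstract does not specify the quantizer or the form of the weighting filter, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes 8-bit μ-law quantization (as used in the original WaveNet) to illustrate the discrete-symbol representation, and swaps in the classic CELP-style perceptual weighting filter W(z) = A(z/γ1)/A(z/γ2) built from LPC coefficients, with hypothetical values γ1 = 0.94 and γ2 = 0.6. In this sketch the generator would be trained on the weighted signal W(z)x, and its output would be passed through W⁻¹(z), so any error made in the weighted domain is reshaped to follow a smoothed spectral envelope, where frequency masking can hide it. All function names are illustrative, and a small random error stands in for the CNN's prediction error.

```python
import math
import random


def mu_law_encode(x, mu=255):
    """Map a sample in [-1, 1] to one of mu + 1 discrete classes (256 for mu=255)."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(math.floor((y + 1.0) / 2.0 * mu + 0.5))


def mu_law_decode(q, mu=255):
    """Inverse mapping; decode(encode(x)) - x is the quantization error."""
    y = 2.0 * q / mu - 1.0
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def autocorr(x, order):
    """Biased autocorrelation r[0..order] (autocorrelation method of LPC)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin: returns [1, a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a


def expand(a, gamma):
    """Bandwidth expansion: coefficients of A(z / gamma)."""
    return [c * gamma ** k for k, c in enumerate(a)]


def pole_zero_filter(b, a, x):
    """y = [B(z) / A(z)] x with zero initial state; assumes a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n >= k)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n >= k)
        y.append(acc)
    return y


def perceptual_weight(x, a, g1=0.94, g2=0.6):
    """Hypothetical W(z) = A(z/g1) / A(z/g2), applied to the training target."""
    return pole_zero_filter(expand(a, g1), expand(a, g2), x)


def inverse_weight(y, a, g1=0.94, g2=0.6):
    """W^-1(z) = A(z/g2) / A(z/g1), applied to the generated output, so error
    added in the weighted domain follows a smoothed spectral envelope."""
    return pole_zero_filter(expand(a, g2), expand(a, g1), y)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for speech: white noise through a resonant AR(2) filter.
    x = pole_zero_filter([1.0], [1.0, -1.6, 0.8],
                         [random.gauss(0.0, 0.05) for _ in range(2000)])
    a = levinson(autocorr(x, 10), 10)
    xw = perceptual_weight(x, a)
    # Stand-in for the generator's error: small white noise in the weighted domain.
    err_w = [random.uniform(-1e-3, 1e-3) for _ in xw]
    x_hat = inverse_weight([v + e for v, e in zip(xw, err_w)], a)
    # By linearity, the output error equals err_w filtered by W^-1 (shaped noise).
    shaped = inverse_weight(err_w, a)
    print(max(abs(h - v - s) for h, v, s in zip(x_hat, x, shaped)))
```

Because both filters are linear and start from zero state, the reconstruction error is exactly the weighted-domain error filtered by W⁻¹(z); the printed value is only floating-point round-off. Since LPC from the autocorrelation method yields a minimum-phase A(z), and bandwidth expansion with γ < 1 moves its roots further inside the unit circle, both W(z) and W⁻¹(z) in this sketch are stable.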


Paper Details

Authors:
Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai
Submitted On:
15 April 2018 - 1:02am
Short Link:
Type:
Poster
Event:
Presenter's Name:
Kentaro Tachibana and Tomoki Toda
Paper Code:
SP-P14.4
Document Year:
2018
Cite

Document Files

Poster pdf



[1] Kentaro Tachibana, Tomoki Toda, Yoshinori Shiga, Hisashi Kawai, "An Investigation of Noise Shaping with Perceptual Weighting for WaveNet-based Speech Generation", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2880. Accessed: Aug. 10, 2020.