SPEECH AUGMENTATION USING WAVENET IN SPEECH RECOGNITION

Citation Author(s):
Sangki Kim, Yeha Lee
Submitted by:
Jisung Wang
Last updated:
10 May 2019 - 9:55am
Document Type:
Poster
Event:
Presenters:
Jisung Wang
Paper Code:
SLP-P17.9
Data augmentation is crucial to improving the performance of deep neural networks: it helps the model avoid overfitting and improves its generalization. In automatic speech recognition, previous work proposed several approaches to augmenting data, such as speed perturbation or spectral transformation. Since data augmented in these ways has acoustic representations similar to the original data, it offers limited benefit for improving the generalization of the acoustic model. To avoid generating data with limited diversity, we propose a voice conversion approach using a generative model (WaveNet), which generates a new utterance by transforming an utterance to a given target voice. Our method synthesizes speech with diverse pitch patterns by minimizing the use of acoustic features. On the Wall Street Journal dataset, we verify that our method leads to better generalization than other data augmentation techniques such as speed perturbation and WORLD-based voice conversion. In addition, when combined with speed perturbation, the two methods complement each other and further improve the performance of the acoustic model.
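To illustrate why the baselines have limited diversity, here is a minimal sketch of the speed-perturbation technique the abstract compares against (not the paper's WaveNet method). The function and perturbation factors are illustrative assumptions; resampling a waveform changes its duration and pitch together, so the augmented copies remain acoustically close to the original — the limitation the proposed voice-conversion approach addresses.

```python
import numpy as np

def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Return the waveform played back `factor` times faster,
    using linear interpolation (factor > 1 shortens the signal)."""
    n_out = int(round(len(waveform) / factor))
    old_idx = np.arange(len(waveform))
    new_idx = np.linspace(0, len(waveform) - 1, n_out)
    return np.interp(new_idx, old_idx, waveform)

# 1 second of a 440 Hz tone at 16 kHz, as a stand-in for an utterance
signal = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)

# Commonly used perturbation factors: 0.9, 1.0, 1.1
augmented = [speed_perturb(signal, f) for f in (0.9, 1.0, 1.1)]
```

Each perturbed copy is a time-scaled version of the same speaker's signal, which is why such augmentation yields representations close to the original data.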
