Acoustic modeling of speech waveform based on multi-resolution, neural network signal processing

Abstract: 

Recently, several papers have demonstrated that neural networks (NN) are able to perform feature extraction as part of the acoustic model. Motivated by the Gammatone feature extraction pipeline, in this paper we extend the waveform-based NN model by a second level of time-convolutional element. The proposed extension generalizes the envelope extraction block, and allows the model to learn multi-resolutional representations. Automatic speech recognition (ASR) experiments show significant word error rate reduction over our previous best acoustic model trained in the signal domain directly. Although we use only 250 hours of speech, the data-driven NN-based speech signal processing performs nearly equally to traditional handcrafted feature extractors. In additional experiments, we also test segment-level feature normalization techniques on NN-derived features, which improve the results further. However, the porting of speech representations derived by a feed-forward NN to an LSTM back-end model indicates much less robustness of the NN front-end compared to the standard feature extractors. Analysis of the weights in the proposed new layer reveals that the NN prefers both multi-resolution and modulation spectrum representations.
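The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the general idea: a first time-convolution over the raw waveform, rectification, a second time-convolution with kernels of several lengths (one plausible reading of "multi-resolution" envelope extraction), and segment-level mean and variance normalization. The function names, the stride of 160 samples, and the kernel shapes are assumptions made for illustration. Fixed kernels stand in for weights that the paper's model learns, and this is not the authors' architecture.

```python
import numpy as np


def strided_filter(x, kernel, stride):
    """Valid cross-correlation of a 1-D signal with one kernel, then subsampling."""
    windows = np.lib.stride_tricks.sliding_window_view(x, len(kernel))[::stride]
    return windows @ kernel


def front_end(wave, first_layer, second_layer, stride=160):
    """Two-level time-convolutional feature extractor (illustrative only).

    first_layer:  list of 1-D kernels applied to the raw waveform.
    second_layer: list of 1-D kernels of different lengths, applied to every
                  rectified first-level channel (assumed multi-resolution step).
    """
    # Level 1: band-pass-like filtering of the waveform, then rectification.
    level1 = [np.abs(strided_filter(wave, k, 1)) for k in first_layer]
    # Level 2: each kernel length smooths the envelope at a different time scale.
    feats = [strided_filter(ch, k, stride) for ch in level1 for k in second_layer]
    # Longer kernels yield fewer frames; truncate to a common length.
    n_frames = min(len(f) for f in feats)
    return np.stack([f[:n_frames] for f in feats], axis=1)  # (frames, features)


def segment_normalize(feats, eps=1e-8):
    """Mean and variance normalization over one segment (assumed variant)."""
    mean = feats.mean(axis=0, keepdims=True)
    std = feats.std(axis=0, keepdims=True)
    return (feats - mean) / (std + eps)
```

With a one-second waveform at 16 kHz, four first-level kernels of 400 samples, and three second-level kernels of 200, 400, and 800 samples, `front_end` returns one row per frame and twelve feature columns, which `segment_normalize` then standardizes per column.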


Paper Details

Authors:
Zoltán Tüske, Ralf Schlüter, Hermann Ney
Submitted On:
2 May 2018 - 3:00pm
Short Link:
Type:
Presentation Slides
Event:
Presenter's Name:
Zoltán Tüske
Document Year:
2018
Cite

Document Files

slides-template.pdf


[1] Zoltán Tüske, Ralf Schlüter, Hermann Ney, "Acoustic modeling of speech waveform based on multi-resolution, neural network signal processing", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3199. Accessed: Nov. 16, 2018.