

On Training the Recurrent Neural Network Encoder-Decoder for Large Vocabulary End-to-end Speech Recognition

Abstract: 

Recently, there has been increasing interest in end-to-end speech recognition using neural networks, with no reliance on hidden Markov models (HMMs) for sequence modelling as in the standard hybrid framework. The recurrent neural network (RNN) encoder-decoder is such a model, performing sequence-to-sequence mapping without any predefined alignment. This model first transforms the input sequence into a fixed-length vector representation, from which the decoder recovers the output sequence. In this paper, we extend our previous work on this model for large-vocabulary end-to-end speech recognition. We first present a more effective stochastic gradient descent (SGD) learning rate schedule that can significantly improve recognition accuracy. We then extend the decoder with long memory by introducing another recurrent layer that performs implicit language modelling. Finally, we demonstrate that using multiple recurrent layers in the encoder can reduce the word error rate. Our experiments were carried out on the Switchboard corpus using a training set of around 300 hours of transcribed audio data, and we achieved significantly higher recognition accuracy, thereby reducing the gap to the hybrid baseline.
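
The model described in the abstract can be made concrete with a short sketch: an encoder maps the acoustic feature sequence to a fixed-length vector, from which a decoder generates the output sequence. The Python (PyTorch) code below is purely illustrative and is not the authors' implementation; the choice of GRU cells, the layer sizes, and the way the encoder state initialises the decoder are all assumptions.

import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    """Illustrative RNN encoder-decoder for speech recognition.

    A sketch of the architecture described in the abstract, not the
    authors' code; hyperparameters here are assumptions.
    """

    def __init__(self, feat_dim, vocab_size, hidden_dim=256, encoder_layers=3):
        super().__init__()
        # Multiple recurrent layers in the encoder; the paper reports
        # that a deeper encoder reduces the word error rate.
        self.encoder = nn.GRU(feat_dim, hidden_dim,
                              num_layers=encoder_layers, batch_first=True)
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        # A two-layer decoder: the extra recurrent layer gives the
        # decoder longer memory and performs implicit language modelling.
        self.decoder = nn.GRU(hidden_dim, hidden_dim, num_layers=2,
                              batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, features, targets):
        # Encode the whole input sequence; the final hidden state is the
        # fixed-length vector representation of the utterance.
        _, h = self.encoder(features)
        # Initialise both decoder layers from the top encoder layer
        # (one simple choice among several; an assumption here).
        h0 = h[-1].unsqueeze(0).repeat(2, 1, 1)
        # Teacher-forced decoding over the embedded target sequence.
        out, _ = self.decoder(self.embed(targets), h0)
        return self.output(out)  # per-step logits over the output vocabulary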
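
The abstract does not spell out the improved SGD learning rate schedule itself. As a hedged illustration of the general idea of performance-based scheduling, the sketch below halves the learning rate once the held-out loss stops improving (a newbob-style heuristic); the decay factor and improvement threshold are assumptions, not the schedule proposed in the paper.

def update_learning_rate(lr, prev_loss, curr_loss, decay=0.5, threshold=0.0):
    """Decay the learning rate when held-out loss stops improving.

    Illustrative only: decay factor and threshold are assumptions,
    not the schedule proposed in the paper.
    """
    if prev_loss - curr_loss <= threshold:
        return lr * decay
    return lr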

Paper Details

Authors:
Liang Lu, Xingxing Zhang, Steve Renals
Submitted On:
18 March 2016 - 12:52pm
Short Link:
http://sigport.org/769
Type:
Presentation Slides
Event:
ICASSP 2016
Presenter's Name:
Liang Lu
Document Year:
2016

Document Files

liang_icassp16_slides.pdf

Cite

[1] Liang Lu, Xingxing Zhang, Steve Renals, "On Training the Recurrent Neural Network Encoder-Decoder for Large Vocabulary End-to-end Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/769. Accessed: Aug. 10, 2020.