On Training the Recurrent Neural Network Encoder-Decoder for Large Vocabulary End-to-end Speech Recognition

Citation Author(s):
Liang Lu, Xingxing Zhang, Steve Renals
Submitted by:
Liang Lu
Last updated:
18 March 2016 - 12:52pm
Document Type:
Presentation Slides
Document Year:
2016
Presenters:
Liang Lu
Recently, there has been increasing interest in end-to-end speech
recognition using neural networks, with no reliance on hidden
Markov models (HMMs) for sequence modelling as in the standard
hybrid framework. The recurrent neural network (RNN) encoder-decoder
is such a model, performing sequence-to-sequence mapping
without any predefined alignment. This model first transforms the
input sequence into a fixed-length vector representation, from which
the decoder recovers the output sequence. In this paper, we extend
our previous work on this model for large vocabulary end-to-end
speech recognition. We first present a more effective stochastic gradient
descent (SGD) learning rate schedule that can significantly improve
the recognition accuracy. We then extend the decoder with
long memory by introducing another recurrent layer that performs
implicit language modelling. Finally, we demonstrate that using
multiple recurrent layers in the encoder can reduce the word error
rate. Our experiments were carried out on the Switchboard corpus
using a training set of around 300 hours of transcribed audio
data, and we achieved significantly higher recognition accuracy,
thereby reducing the gap to the hybrid baseline.
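The encode-then-decode idea described above can be sketched in a few lines. This is an illustrative scalar-state toy, not the paper's architecture: the weights, dimensions, and helper names below are made up, and a real system would use learned multi-dimensional recurrent layers with attention-free decoding as the paper describes.

```python
import math

def encode(xs, w_in=0.5, w_rec=0.3):
    """Run a simple tanh RNN over the input sequence; the final hidden
    state serves as the fixed-length representation of the sequence."""
    h = 0.0
    for x in xs:
        h = math.tanh(w_in * x + w_rec * h)
    return h

def decode(context, steps, w_ctx=0.8, w_rec=0.4, w_out=1.0):
    """Recover an output sequence from the context alone: each step
    conditions on the context vector and the previous decoder state."""
    s, outputs = 0.0, []
    for _ in range(steps):
        s = math.tanh(w_ctx * context + w_rec * s)
        outputs.append(w_out * s)
    return outputs

# The whole input is compressed into one number here (one vector in
# the real model), from which the decoder emits the output sequence.
c = encode([0.1, 0.4, -0.2])
ys = decode(c, steps=3)
```

Note that no frame-level alignment between input and output is ever computed: the decoder sees only the context, which is what distinguishes this family of models from HMM-based hybrid systems.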
