
A TIME-RESTRICTED SELF-ATTENTION LAYER FOR ASR

Citation Author(s):
Daniel Povey, Hossein Hadian, Pegah Ghahremani, Ke Li, Sanjeev Khudanpur
Submitted by:
Hossein Hadian
Last updated:
19 April 2018 - 1:23pm
Document Type:
Poster
Document Year:
2018
Event:
Paper Code:
#3560
 

Self-attention -- an attention mechanism where the input and output sequence lengths are the same -- has recently been successfully applied to machine translation, caption generation, and phoneme recognition. In this paper we apply a restricted self-attention mechanism (with multiple heads) to speech recognition. By "restricted" we mean that the mechanism at a particular frame only sees input from a limited number of frames to the left and right. Restricting the context makes it easier to encode the position of the input: we use a one-hot encoding of the frame offset. We try introducing attention layers into TDNN architectures, and replacing LSTM layers with attention layers in TDNN+LSTM architectures. We show experiments on a number of ASR setups and observe improvements compared to the TDNN and TDNN+LSTM baselines. Attention layers are also faster than LSTM layers at test time, since they lack recurrence.
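
To make the mechanism concrete, here is a minimal NumPy sketch of a single-head time-restricted self-attention layer; it is an illustration under assumptions, not the paper's implementation. In particular, the placement of the one-hot frame-offset encoding is assumed: here it is appended to each key, with a learned per-offset query component (pos_q) appended to the query so the offset actually influences the attention weights. All names and the window sizes (left=5, right=2) are illustrative.

import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def restricted_self_attention(x, Wq, Wk, Wv, pos_q, left=5, right=2):
    """Single-head time-restricted self-attention (illustrative sketch).

    x:     (T, D) input sequence of T frames.
    Wq:    (D, dq) query projection; Wk: (D, dq) key projection; Wv: (D, dv) value projection.
    pos_q: (left+right+1,) learned query component that scores the one-hot
           frame-offset encoding appended to each key (an assumed placement).
    Each output frame t attends only to frames in [t-left, t+right],
    so the output has the same length T as the input.
    """
    T = x.shape[0]
    num_offsets = left + right + 1
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    key_dim = k.shape[1] + num_offsets
    out = np.zeros((T, v.shape[1]))
    for t in range(T):
        lo, hi = max(0, t - left), min(T, t + right + 1)
        ctx = np.arange(lo, hi)
        # One-hot encoding of the frame offset, indexed from t-left to t+right.
        pos = np.zeros((len(ctx), num_offsets))
        pos[np.arange(len(ctx)), ctx - (t - left)] = 1.0
        keys = np.concatenate([k[ctx], pos], axis=1)   # keys carry the position
        query = np.concatenate([q[t], pos_q])          # matching query layout
        scores = keys @ query / np.sqrt(key_dim)       # scaled dot products
        alpha = softmax(scores)                        # weights over the window
        out[t] = alpha @ v[ctx]                        # weighted sum of values
    return out

# Toy usage: 20 frames of 16-dim features; a multi-head layer would run
# several such heads with separate projections and concatenate their outputs.
rng = np.random.default_rng(0)
T, D, dq, dv, left, right = 20, 16, 8, 8, 5, 2
x = rng.standard_normal((T, D))
Wq, Wk, Wv = (0.1 * rng.standard_normal((D, d)) for d in (dq, dq, dv))
pos_q = 0.1 * rng.standard_normal(left + right + 1)
y = restricted_self_attention(x, Wq, Wk, Wv, pos_q, left=left, right=right)
print(y.shape)   # (20, 8): input and output sequence lengths are the same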
