A TIME-RESTRICTED SELF-ATTENTION LAYER FOR ASR
- Submitted by: Hossein Hadian
- Last updated: 19 April 2018 - 1:23pm
- Document Type: Poster
- Document Year: 2018
- Paper Code: #3560
Self-attention -- an attention mechanism where the input and output
sequence lengths are the same -- has
recently been successfully applied to machine translation, caption generation, and phoneme recognition.
In this paper we apply a restricted self-attention mechanism (with
multiple heads) to speech recognition. By "restricted" we
mean that the mechanism at a particular frame only sees input from a
limited number of frames to
the left and right. Restricting the context makes it easier to
encode the position of the input -- we use a 1-hot
encoding of the frame offset. We try introducing
attention layers into TDNN architectures, and replacing LSTM layers
with attention layers in TDNN+LSTM
architectures. We show experiments on a number of ASR
setups. We observe improvements compared to the TDNN and TDNN+LSTM baselines. Attention layers are also faster than LSTM layers at test time, since they lack
recurrence.
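
To make the mechanism concrete, here is a minimal numpy sketch of a time-restricted multi-head self-attention layer: each output frame attends only to a fixed number of frames to its left and right, and a one-hot encoding of the frame offset is appended to the value vectors inside the window. The parameter names, the random projections standing in for learned weights, and the exact placement of the offset encoding are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def time_restricted_self_attention(x, num_heads=4, left=5, right=2,
                                   d_k=16, d_v=16, seed=0):
    """Sketch of multi-head self-attention restricted to `left` frames on the
    left and `right` frames on the right of each output frame.  A one-hot
    encoding of the frame offset is appended to each value vector before the
    weighted sum (one plausible realisation of the "1-hot encoding of the
    frame offset" described above).

    x: (T, d_in) array of input frames.
    Returns: (T, num_heads * (d_v + left + right + 1)) array.
    """
    rng = np.random.default_rng(seed)
    T, d_in = x.shape
    ctx = left + right + 1                      # window size
    outputs = []
    for _ in range(num_heads):
        # Random projections stand in for learned parameters.
        Wq = rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)
        Wk = rng.standard_normal((d_in, d_k)) / np.sqrt(d_in)
        Wv = rng.standard_normal((d_in, d_v)) / np.sqrt(d_in)
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        head_out = np.zeros((T, d_v + ctx))
        for t in range(T):
            # Attention is computed only inside the restricted window.
            lo, hi = max(0, t - left), min(T, t + right + 1)
            scores = K[lo:hi] @ Q[t] / np.sqrt(d_k)
            weights = np.exp(scores - scores.max())
            weights /= weights.sum()
            # One-hot frame-offset encoding, appended to each value vector.
            offsets = np.arange(lo, hi) - t + left      # in 0 .. ctx-1
            onehot = np.eye(ctx)[offsets]
            values = np.concatenate([V[lo:hi], onehot], axis=1)
            head_out[t] = weights @ values
        outputs.append(head_out)
    return np.concatenate(outputs, axis=1)

# Example: 100 frames of 40-dimensional features.
feats = np.random.default_rng(1).standard_normal((100, 40))
out = time_restricted_self_attention(feats)
print(out.shape)    # (100, 96): 4 heads x (16 value dims + 8 offset dims)
```

Because the window is fixed, each frame's output depends only on a bounded local context and there is no recurrence, which is what allows this layer to be evaluated in parallel across frames at test time.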