Speaker-aware Training of Attention-based End-to-End Speech Recognition using Neural Speaker Embeddings

Citation Author(s):
Aku Rouhe, Tuomas Kaseva, Mikko Kurimo
Submitted by:
Aku Rouhe
Last updated:
13 May 2020 - 4:49pm
Document Type:
Presentation Slides
Presenters Name:
Aku Rouhe

In speaker-aware training, a speaker embedding is appended to the DNN input features. This allows the DNN to learn representations that are robust to speaker variability.
We apply speaker-aware training to attention-based end-to-end speech recognition. We show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data.
We apply state-of-the-art embedding approaches, both i-vectors and neural embeddings, such as x-vectors. We experiment with embeddings trained in two conditions: on the fixed ASR data, and on a large untranscribed dataset. We run our experiments on the TED-LIUM and Wall Street Journal datasets. No embedding consistently outperforms all others, but in many settings neural embeddings outperform i-vectors.
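
As a rough illustration of the input augmentation described above, the following minimal sketch appends one speaker embedding to every frame of an utterance's acoustic features. The function name `append_speaker_embedding`, the use of a single embedding per utterance, and the feature and embedding dimensions are assumptions made for this example; the abstract does not specify them, and this is not the authors' implementation.

```python
import numpy as np


def append_speaker_embedding(features, embedding):
    """Append a fixed speaker embedding to every frame of a feature matrix.

    features:  array of shape (T, D), T frames of D-dimensional features
    embedding: array of shape (E,), one embedding for the whole utterance
               (assumption: the same vector is reused for all frames)
    returns:   array of shape (T, D + E)
    """
    features = np.asarray(features)
    embedding = np.asarray(embedding)
    if features.ndim != 2 or embedding.ndim != 1:
        raise ValueError("expected features of shape (T, D) and embedding of shape (E,)")
    # Repeat the embedding once per frame, giving shape (T, E).
    tiled = np.tile(embedding, (features.shape[0], 1))
    # Concatenate along the feature axis, giving shape (T, D + E).
    return np.concatenate([features, tiled], axis=1)


# Hypothetical sizes: 5 frames of 40-dimensional features, 100-dimensional embedding.
frames = np.zeros((5, 40))
speaker = np.ones(100)
augmented = append_speaker_embedding(frames, speaker)
print(augmented.shape)
```

The augmented matrix would then take the place of the original features as network input. Because the embedding extractor only needs speaker labels, it could in principle be trained on data without transcripts, which is the use the abstract proposes.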
