Speaker-aware Training of Attention-based End-to-End Speech Recognition using Neural Speaker Embeddings

Abstract: 

In speaker-aware training, a speaker embedding is appended to the DNN input features. This allows the DNN to learn representations that are robust to speaker variability.
We apply speaker-aware training to attention-based end-to-end speech recognition and show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data.
We apply state-of-the-art embedding approaches, both i-vectors and neural embeddings such as x-vectors. We experiment with embeddings trained in two conditions: on the fixed ASR data, and on a large untranscribed dataset. We run our experiments on the TED-LIUM and Wall Street Journal datasets. No embedding consistently outperforms all others, but in many settings neural embeddings outperform i-vectors.
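
As an illustration of the core idea in the abstract (appending a speaker embedding to the input features), the following is a minimal sketch, not the authors' implementation. It assumes one fixed-length embedding per utterance that is repeated across all frames. The function name `append_speaker_embedding` and the dimensions (40-dimensional acoustic features, 100-dimensional embedding) are hypothetical and chosen only for the example.

```python
import numpy as np


def append_speaker_embedding(features: np.ndarray, embedding: np.ndarray) -> np.ndarray:
    """Concatenate one utterance-level speaker embedding to every feature frame.

    features:  (num_frames, feat_dim) acoustic features for one utterance
    embedding: (emb_dim,) speaker embedding, e.g. an i-vector or x-vector
    returns:   (num_frames, feat_dim + emb_dim) speaker-aware input features
    """
    if features.ndim != 2 or embedding.ndim != 1:
        raise ValueError("expected 2-D features and a 1-D embedding")
    num_frames = features.shape[0]
    # Repeat the same embedding for every frame (assumption: one embedding per utterance).
    tiled = np.broadcast_to(embedding, (num_frames, embedding.shape[0]))
    return np.concatenate([features, tiled], axis=1)


# Hypothetical sizes: 200 frames, 40-dim features, 100-dim embedding.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 40))
embedding = rng.standard_normal(100)
augmented = append_speaker_embedding(features, embedding)
print(augmented.shape)  # (200, 140)
```

The augmented matrix would then replace the plain acoustic features as the input to the recognizer. Whether the embedding comes from an i-vector extractor or a neural model only changes how `embedding` is computed, not this concatenation step.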

Paper Details

Authors:
Aku Rouhe, Tuomas Kaseva, Mikko Kurimo
Submitted On:
13 May 2020 - 4:49pm
Short Link:
Type:
Presentation Slides
Event:
Presenter's Name:
Aku Rouhe
Paper Code:
3853
Document Year:
2020

Document Files

icassp2020-slides.pdf


[1] Aku Rouhe, Tuomas Kaseva, Mikko Kurimo, "Speaker-aware Training of Attention-based End-to-End Speech Recognition using Neural Speaker Embeddings", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5139. Accessed: Jun. 06, 2020.