MULTILINGUAL SPEECH RECOGNITION WITH A SINGLE END-TO-END MODEL

Abstract: 

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the subword unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we improve performance by a further 7% relative and eliminate confusion between different languages.
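The abstract names two concrete mechanisms: a shared output vocabulary formed as the union of per-language grapheme sets, and a language identifier supplied as an extra input feature. The Python/NumPy sketch below illustrates both under stated assumptions; it is not the authors' implementation. It assumes each Unicode code point counts as one grapheme, it assumes the language identifier is a one-hot vector concatenated to every acoustic frame (one simple way to add it as an input feature), and all function names (`build_union_vocab`, `encode`, `append_language_id`, `relative_improvement`) are hypothetical. The last helper spells out what "X% relative" means for an error rate.

```python
from typing import Dict, Iterable, List, Sequence

import numpy as np


def build_union_vocab(transcripts_by_lang: Dict[str, Iterable[str]]) -> List[str]:
    """Union of per-language grapheme sets (assumption: one code point = one grapheme).

    Sorting gives a stable, language-independent index for the shared output layer.
    """
    graphemes = set()
    for transcripts in transcripts_by_lang.values():
        for text in transcripts:
            graphemes.update(text)  # iterating a str yields Unicode code points
    return sorted(graphemes)


def encode(text: str, vocab: Sequence[str]) -> List[int]:
    """Map each grapheme of a transcript to its ID in the shared vocabulary."""
    index = {g: i for i, g in enumerate(vocab)}
    return [index[g] for g in text]


def append_language_id(features: np.ndarray, lang: str,
                       languages: Sequence[str]) -> np.ndarray:
    """Concatenate a one-hot language vector to every frame of a (T, D) feature matrix.

    Hypothetical conditioning scheme: the output has shape (T, D + len(languages)).
    """
    one_hot = np.zeros(len(languages), dtype=features.dtype)
    one_hot[list(languages).index(lang)] = 1
    tiled = np.tile(one_hot, (features.shape[0], 1))  # repeat for each of the T frames
    return np.concatenate([features, tiled], axis=1)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative error-rate reduction; 0.21 corresponds to '21% relative'."""
    return (baseline_wer - new_wer) / baseline_wer
```

Because the scripts barely overlap, the union vocabulary is close to a concatenation of the per-language sets, so a model without the language feature must infer the script from the audio alone. Appending the identifier to every frame is a design choice made here for simplicity; other conditioning points (for example only the decoder) are equally plausible sketches.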

Paper Details

Authors: Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao
Submitted On: 19 April 2018 - 4:43pm
Type: Presentation Slides
Presenter's Name: Shubham Toshniwal
Paper Code: 2040
Document Year: 2018

Cite

[1] Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao, "MULTILINGUAL SPEECH RECOGNITION WITH A SINGLE END-TO-END MODEL", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3024. Accessed: May 21, 2018.