
MULTILINGUAL SPEECH RECOGNITION WITH A SINGLE END-TO-END MODEL

Citation Author(s):
Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao
Submitted by:
Shubham Toshniwal
Last updated:
19 April 2018 - 4:43pm
Document Type:
Presentation Slides
Document Year:
2018
Event:
Presenters:
Shubham Toshniwal
Paper Code:
2040
 

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the subword unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts. Specifically, we take a union of language-specific grapheme sets and train a grapheme-based sequence-to-sequence model jointly on data from all languages. We find that this model, which is not explicitly given any information about language identity, improves recognition performance by 21% relative compared to analogous sequence-to-sequence models trained on each language individually. By modifying the model to accept a language identifier as an additional input feature, we improve performance by a further 7% relative and eliminate confusion between different languages.
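
The two data-preparation ideas in the abstract, a shared output vocabulary built from the union of per-language grapheme sets and a language identifier supplied as an extra input feature, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `build_union_vocab` and `append_language_id` are hypothetical, Unicode code points stand in for graphemes, and the language identifier is encoded as a one-hot vector concatenated to each acoustic frame, which is only one plausible way to provide it.

```python
from typing import Dict, List, Sequence


def build_union_vocab(corpora: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every symbol seen in any language to one shared integer id.

    Assumption: each Unicode code point is treated as one grapheme.
    Iterating over a Python str yields code points, not grapheme clusters.
    """
    symbols = set()
    for transcripts in corpora.values():
        for text in transcripts:
            symbols.update(text)
    # Sort so that the id assignment is deterministic across runs.
    return {symbol: index for index, symbol in enumerate(sorted(symbols))}


def append_language_id(
    frames: Sequence[Sequence[float]], lang: str, languages: List[str]
) -> List[List[float]]:
    """Concatenate a one-hot language vector to every acoustic frame.

    Assumption: one-hot concatenation is an illustrative encoding only.
    """
    one_hot = [0.0] * len(languages)
    one_hot[languages.index(lang)] = 1.0
    return [list(frame) + one_hot for frame in frames]


# Toy transcripts in two scripts (hypothetical data, not from the paper).
corpora = {"hi": ["नमस्ते"], "ta": ["வணக்கம்"]}
vocab = build_union_vocab(corpora)

# Two toy 2-dimensional acoustic frames, tagged as Tamil.
frames = [[0.1, 0.2], [0.3, 0.4]]
augmented = append_language_id(frames, "ta", ["hi", "ta"])
```

Because the two scripts in this toy example share no symbols, the union vocabulary is simply as large as the two per-language sets combined, which mirrors the abstract's remark that the scripts have very little overlap. Each augmented frame grows by one dimension per supported language.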
