The paper analyzes automatic speech recognition (ASR) systems based
on a multilingual BLSTM, trained in a multi-task fashion with a
separate classification layer for each language. The focus is on
low-resource languages, where only a limited amount of transcribed
speech is available. In such a scenario, we found it essential to
train the ASR systems in a multilingual fashion, and we report
superior results obtained with a pre-trained multilingual BLSTM on
this task.
The high resource languages are also
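
The multi-task arrangement described above, shared hidden layers with one classification layer per language, can be illustrated with a minimal sketch. This is not the authors' implementation: the shared BLSTM stack is replaced here by a single tanh projection so the example runs without any deep learning library, and the class name `MultilingualModel`, the language keys `lang_a` and `lang_b`, and all dimensions are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class MultilingualModel:
    """Shared encoder plus one output layer per language (hypothetical sketch)."""

    def __init__(self, feat_dim, hidden_dim, targets_per_language, seed=0):
        rng = random.Random(seed)
        # Shared parameters, used for every language.
        # ASSUMPTION: a single projection stands in for the shared BLSTM layers.
        self.shared = make_matrix(hidden_dim, feat_dim, rng)
        # Language-specific classification layers; each language may have
        # a different number of output targets.
        self.heads = {
            lang: make_matrix(n_targets, hidden_dim, rng)
            for lang, n_targets in targets_per_language.items()
        }

    def encode(self, frame):
        """Language-independent hidden representation of one feature frame."""
        return [math.tanh(h) for h in matvec(self.shared, frame)]

    def posteriors(self, frame, lang):
        """Route the shared representation through the head of `lang`."""
        return softmax(matvec(self.heads[lang], self.encode(frame)))


model = MultilingualModel(
    feat_dim=4,
    hidden_dim=8,
    targets_per_language={"lang_a": 5, "lang_b": 3},
)
frame = [0.2, -0.1, 0.4, 0.0]
post_a = model.posteriors(frame, "lang_a")
post_b = model.posteriors(frame, "lang_b")
```

In multi-task training of this kind, each utterance would update the shared parameters and only the head of its own language, which is how data from all languages can contribute to the shared layers.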
An increasing amount of linguistic information, such as part of speech, syntactic structure, and discourse context, has been employed to improve the performance of machine translation. However, conventional approaches typically ignore key information beyond the text, such as prosody. In this paper, we exploit three prosodic features: pronunciation (phonetic alphabet and tone), prosodic boundaries, and emphasis.