Automatic language identification (LID) among multiple dialects of the same language family is a challenging problem on short speech recordings, and it becomes harder still when noise sources are present in the audio snippets. In these scenarios, cues to the identity of the language or dialect may be reliably present in only parts of the temporal sequence of the speech.

Code switching refers to the phenomenon of changing languages within a sentence or discourse, and it poses a challenge for conventional automatic speech recognition (ASR) systems deployed to handle a single target language. The code-switching problem is complicated by the lack of multilingual training data needed to build new, ad hoc multilingual acoustic and language models. In this work, we present a prototype research code-switching speech recognition system that leverages existing monolingual acoustic and language models, i.e., no ad hoc training is needed.

In many language identification scenarios, the user specifies a small set of languages which he or she can speak, rather than choosing from the full set of supported languages; in fact, a typical language identification system launched in North America has about 95% of users who speak no more than two languages. We model this prior knowledge in the way we train our neural networks by replacing the commonly used softmax loss function with a novel loss function named tuplemax loss.
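The excerpt does not spell out the loss itself. As a hedged illustration, the sketch below implements a pairwise tuplemax-style objective in PyTorch, assuming each tuple is a pair consisting of the target language and one competing language, and that the loss averages the two-way softmax negative log-likelihood over all such pairs; the function and variable names are ours, not the paper's.

```python
import torch

def tuplemax_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pairwise tuplemax-style loss (illustrative sketch, not the paper's code).

    For every pair (target language, competing language) we compute a two-way
    softmax restricted to that pair and average the negative log-likelihoods,
    instead of normalizing over all languages as the ordinary softmax does.

    logits: (batch, num_languages) unnormalized scores
    target: (batch,) index of the true language
    """
    batch, num_langs = logits.shape
    target_logit = logits.gather(1, target.unsqueeze(1))                  # (batch, 1)
    # log p(target | {target, k}) for every competing language k
    pair_log_prob = target_logit - torch.logaddexp(target_logit, logits)  # (batch, num_langs)
    # drop the degenerate pair (target, target)
    mask = torch.ones_like(pair_log_prob)
    mask.scatter_(1, target.unsqueeze(1), 0.0)
    loss = -(pair_log_prob * mask).sum(dim=1) / (num_langs - 1)
    return loss.mean()

# Example: 4 supported languages, batch of 2 utterances
logits = torch.randn(2, 4)
target = torch.tensor([1, 3])
print(tuplemax_loss(logits, target))
```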

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the subword unit, lexicon and word inventories are typically language specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their scripts.

Techniques for multi-lingual and cross-lingual speech recognition can help in low-resource scenarios, to bootstrap systems and enable analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context-independent models trained using the Connectionist Temporal Classification (CTC) loss.
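As a rough illustration of the multi-lingual CTC setup described here, the sketch below trains a single recurrent model over a label inventory pooled across languages, using PyTorch's nn.CTCLoss. The feature dimension, label-set size and model shape are assumptions made for the example, not values from the paper.

```python
import torch
import torch.nn as nn

# Assumed shared label inventory: the union of the per-language symbol sets,
# with index 0 reserved for the CTC blank (sizes are illustrative).
NUM_LABELS = 120          # hypothetical pooled symbol set
FEAT_DIM = 40             # e.g. log-mel filterbank features

class MultilingualCTCModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.LSTM(FEAT_DIM, 256, num_layers=3,
                               bidirectional=True, batch_first=True)
        self.output = nn.Linear(2 * 256, NUM_LABELS)

    def forward(self, feats):
        hidden, _ = self.encoder(feats)              # (batch, time, 512)
        return self.output(hidden).log_softmax(-1)   # (batch, time, labels)

model = MultilingualCTCModel()
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# One toy batch pooled from different languages: same loss, shared label set.
feats = torch.randn(4, 200, FEAT_DIM)
feat_lens = torch.full((4,), 200, dtype=torch.long)
labels = torch.randint(1, NUM_LABELS, (4, 30))
label_lens = torch.full((4,), 30, dtype=torch.long)

log_probs = model(feats).transpose(0, 1)             # CTCLoss expects (time, batch, labels)
loss = ctc(log_probs, labels, feat_lens, label_lens)
loss.backward()
optim.step()
```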

A novel learnable dictionary encoding layer is proposed in this paper for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically: we imitate the mechanism of traditional GMM training and the supervector encoding procedure on top of a CNN. The proposed layer can accumulate high-order statistics from a variable-length input sequence and generate an utterance-level fixed-dimensional vector representation.
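The paper's exact formulation is not reproduced in this excerpt; the sketch below is a minimal PyTorch take on a learnable-dictionary-style encoding layer, assuming learnable component centers and scales, soft assignment of each frame to a component, and aggregation of weighted residuals into a fixed-dimensional utterance vector. All names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DictionaryEncoding(nn.Module):
    """Illustrative learnable-dictionary-style encoding layer (our sketch).

    Each frame is softly assigned to C learnable components; the weighted
    residuals to each component are averaged over time, giving a fixed
    (C * feat_dim)-dimensional utterance representation regardless of input
    length, loosely mirroring GMM supervector construction.
    """
    def __init__(self, feat_dim: int, num_components: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_components, feat_dim))
        self.log_scales = nn.Parameter(torch.zeros(num_components))

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feat_dim)
        residuals = frames.unsqueeze(2) - self.centers               # (B, T, C, D)
        dist = (residuals ** 2).sum(-1)                              # (B, T, C)
        weights = torch.softmax(-self.log_scales.exp() * dist, dim=-1)
        encoded = (weights.unsqueeze(-1) * residuals).mean(dim=1)    # (B, C, D)
        return encoded.flatten(1)                                    # (B, C * D)

# A 300-frame and a 120-frame utterance both map to a vector of the same size.
layer = DictionaryEncoding(feat_dim=64, num_components=8)
print(layer(torch.randn(1, 300, 64)).shape)   # torch.Size([1, 512])
print(layer(torch.randn(1, 120, 64)).shape)   # torch.Size([1, 512])
```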

A novel interpretable end-to-end learning scheme for language identification is proposed. It is in line with the classical GMM i-vector methods both theoretically and practically. In the end-to-end pipeline, a general encoding layer is employed on top of the front-end CNN, so that it can automatically encode the variable-length input sequence into an utterance-level vector. After comparing with the state-of-the-art GMM i-vector methods, we give insights into the CNN front-end and reveal its role and effect in the whole pipeline.

In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to an i-vector) that represents the whole utterance, but unlike an i-vector it is designed to contain mostly information relevant to the target task (LID). In order to obtain these embeddings, we train a deep neural network (DNN) with a sequence summarization layer to classify languages.
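The architecture details are not given in this excerpt. As a hedged sketch, the model below stacks frame-level layers, a mean-and-standard-deviation summarization over time, and utterance-level layers trained to classify languages, with the pre-classifier activation taken as the utterance embedding; layer sizes and names are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class LIDEmbeddingNet(nn.Module):
    """Sketch of a DNN with a sequence-summarization layer for LID embeddings."""
    def __init__(self, feat_dim=40, emb_dim=256, num_languages=10):
        super().__init__()
        self.frame_layers = nn.Sequential(
            nn.Linear(feat_dim, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        # utterance-level layers applied after pooling mean and std over time
        self.embedding_layer = nn.Linear(2 * 512, emb_dim)
        self.classifier = nn.Linear(emb_dim, num_languages)

    def forward(self, frames):                      # frames: (batch, time, feat_dim)
        h = self.frame_layers(frames)               # (batch, time, 512)
        stats = torch.cat([h.mean(dim=1), h.std(dim=1)], dim=-1)  # summarization
        embedding = torch.tanh(self.embedding_layer(stats))       # fixed-length vector
        return self.classifier(embedding), embedding

model = LIDEmbeddingNet()
logits, embedding = model(torch.randn(2, 300, 40))
# `logits` drive the language-classification loss during training;
# `embedding` is the fixed-length utterance representation used for LID.
print(embedding.shape)    # torch.Size([2, 256])
```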
