
DNN BASED EMBEDDINGS FOR LANGUAGE RECOGNITION

Citation Author(s):
Alicia Lozano-Diez, Oldrich Plchot, Pavel Matejka, Joaquin Gonzalez-Rodriguez
Submitted by:
Alicia Lozano-Diez
Last updated:
12 April 2018 - 11:29am
Document Type:
Poster
Document Year:
2018
Event:
Presenters:
Alicia Lozano-Diez
Paper Code:
SP-P4.3
 

In this work, we present a language identification (LID) system based on embeddings. Here, an embedding is a fixed-length vector (similar to an i-vector) that represents the whole utterance but, unlike an i-vector, is designed to contain mostly information relevant to the target task (LID). To obtain these embeddings, we train a deep neural network (DNN) with a sequence summarization layer to classify languages. In particular, we train a DNN based on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) layers, whose frame-by-frame outputs are summarized into mean and standard deviation statistics. After this pooling layer, we add two fully connected layers whose outputs correspond to the embeddings. Finally, we add a softmax output layer and train the whole network with a multi-class cross-entropy objective to discriminate between languages. We report results on NIST LRE 2015 and compare the performance of embeddings and the corresponding i-vectors, both modeled by a Gaussian linear classifier (GLC). Using only embeddings resulted in performance comparable to i-vectors, and with score-level fusion we achieved a 7.3% relative improvement over the baseline.
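
The sequence summarization step can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stats_pooling`, the toy frame values, and the choice of the population standard deviation (dividing by the number of frames) are assumptions made for illustration only. The sketch shows how frame-by-frame outputs of any length collapse into one fixed-length vector of per-dimension means followed by per-dimension standard deviations.

```python
import math


def stats_pooling(frames):
    """Pool a sequence of frame vectors into [means..., stds...].

    `frames` is a list of equal-length lists of floats, standing in for
    the frame-by-frame outputs of the recurrent layers (hypothetical input).
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-dimension population standard deviation (assumption: divide by n).
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


# Two toy "utterances" of different lengths, each with 2-dimensional frames.
short_utt = [[1.0, 2.0], [3.0, 2.0]]
long_utt = [[1.0, 2.0], [3.0, 2.0], [1.0, 2.0], [3.0, 2.0]]

pooled_short = stats_pooling(short_utt)
pooled_long = stats_pooling(long_utt)
print(len(pooled_short), len(pooled_long))  # both have length 2 * dim
```

Because the pooled vector always has twice the frame dimension regardless of utterance length, it can be fed to fully connected layers, which is what makes a fixed-length utterance embedding possible.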
