Multilingual Recognition and Identification (SPE-MULT)

MULTILINGUAL SPEECH RECOGNITION WITH A SINGLE END-TO-END MODEL


Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the subword unit, lexicon and word inventories are typically language-specific. In contrast, sequence-to-sequence models are well suited for multilingual ASR because they encapsulate an acoustic, pronunciation and language model jointly in a single network. In this work we present a single sequence-to-sequence ASR model trained on 9 different Indian languages, which have very little overlap in their
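A single sequence-to-sequence model has one output layer, so the per-language output symbols have to be merged into one shared inventory. The Python sketch below shows one plausible way to do this, assuming grapheme (character) targets and taking the union of the per-language character sets. The special tokens, function names and toy transcripts are hypothetical illustrations, not details taken from the paper.

```python
from typing import Dict, List

# Hypothetical special tokens; a real system defines its own.
SPECIAL_TOKENS = ["<pad>", "<sos>", "<eos>"]


def build_shared_vocab(transcripts: Dict[str, List[str]]) -> Dict[str, int]:
    """Map every grapheme seen in any language to one output index.

    `transcripts` maps a language code to its training transcripts. The
    inventory is the union of all per-language character sets, so a single
    softmax layer can cover every language.
    """
    graphemes = set()
    for sentences in transcripts.values():
        for sentence in sentences:
            graphemes.update(sentence)
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    for ch in sorted(graphemes):  # sorted for a deterministic id assignment
        vocab[ch] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Turn a transcript into decoder targets framed by <sos> and <eos>."""
    return [vocab["<sos>"]] + [vocab[ch] for ch in sentence] + [vocab["<eos>"]]


if __name__ == "__main__":
    # Toy Hindi (Devanagari) and Tamil transcripts: the two scripts share no
    # code points, so the shared inventory is simply their disjoint union.
    data = {"hi": ["नमस्ते"], "ta": ["வணக்கம்"]}
    vocab = build_shared_vocab(data)
    print(len(vocab))
    print(encode("வணக்கம்", vocab))
```

Because scripts with little overlap contribute mostly disjoint symbols, the output layer grows roughly with the sum of the per-language inventories, while the encoder and decoder parameters stay shared.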

Paper Details

Authors:
Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao
Submitted On:
19 April 2018 - 4:43pm

Document Files

Multilingual end-to-end model

[1] Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao, "MULTILINGUAL SPEECH RECOGNITION WITH A SINGLE END-TO-END MODEL", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3024. Accessed: Apr. 27, 2018.

SEQUENCE-BASED MULTI-LINGUAL LOW RESOURCE SPEECH RECOGNITION


Techniques for multi-lingual and cross-lingual speech recognition can help in low-resource scenarios by bootstrapping systems and enabling analysis of new languages and domains. End-to-end approaches, in particular sequence-based techniques, are attractive because of their simplicity and elegance. While it is possible to integrate traditional multi-lingual bottleneck feature extractors as front-ends, we show that end-to-end multi-lingual training of sequence models is effective on context-independent models trained using the Connectionist Temporal Classification (CTC) loss.
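The CTC loss mentioned above is the negative log of the total probability of all frame-level alignments that collapse to the target label sequence. The Python sketch below computes that probability with the standard forward (alpha) recursion over the blank-augmented label sequence and checks it against brute-force enumeration on a toy posterior matrix. It works in plain probability space for readability, whereas practical implementations work in log space; the function names and the choice of index 0 for the blank are illustrative assumptions, not details from the paper.

```python
import itertools
import math
from typing import List, Sequence

BLANK = 0  # assumption: index 0 of the output layer is the CTC blank


def ctc_forward(probs: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Return P(labels | x), summed over all CTC alignments via the alpha recursion.

    probs[t][k] is the network's softmax output for symbol k at frame t.
    """
    ext = [BLANK]
    for lab in labels:  # interleave blanks: _ a _ b _
        ext += [lab, BLANK]
    S, T = len(ext), len(probs)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, T):
        new = [0.0] * S
        for s in range(S):
            total = alpha[s]  # stay on the same extended position
            if s >= 1:
                total += alpha[s - 1]  # advance by one position
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]  # skip the blank between two distinct labels
            new[s] = total * probs[t][ext[s]]
        alpha = new
    # Valid alignments end on the last label or on the trailing blank.
    return alpha[S - 1] + (alpha[S - 2] if S > 1 else 0.0)


def collapse(path: Sequence[int]) -> List[int]:
    """CTC many-to-one map: merge repeated symbols, then drop blanks."""
    out, prev = [], None
    for k in path:
        if k != prev and k != BLANK:
            out.append(k)
        prev = k
    return out


def brute_force(probs: Sequence[Sequence[float]], labels: List[int]) -> float:
    """Sum path probabilities by enumeration (only feasible for tiny inputs)."""
    total = 0.0
    for path in itertools.product(range(len(probs[0])), repeat=len(probs)):
        if collapse(path) == labels:
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


if __name__ == "__main__":
    # 4 frames, 3 symbols (blank, "a", "b"); each row sums to 1.
    probs = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.3, 0.2, 0.5], [0.6, 0.1, 0.3]]
    p = ctc_forward(probs, [1, 2])
    print(p, brute_force(probs, [1, 2]))
    print("CTC loss:", -math.log(p))
```

The brute-force check is exponential in the number of frames and is included only to show that the linear-time recursion sums exactly the same set of alignments.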

Paper Details

Authors:
Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W Black
Submitted On:
18 April 2018 - 3:03pm

Document Files

Dalmia_ICASSP_2018.pdf

[1] Siddharth Dalmia, Ramon Sanabria, Florian Metze, Alan W Black, "SEQUENCE-BASED MULTI-LINGUAL LOW RESOURCE SPEECH RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2970. Accessed: Apr. 27, 2018.

Towards language-universal end-to-end speech recognition

Paper Details

Submitted On:
18 April 2018 - 5:21pm

Document Files

2018_icassp_presentation_4.pdf

[1] "Towards language-universal end-to-end speech recognition", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2969. Accessed: Apr. 27, 2018.

A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification


A novel learnable dictionary encoding (LDE) layer is proposed in this paper for end-to-end language identification. It is in line with the conventional GMM i-vector approach both theoretically and practically. We imitate the mechanism of traditional GMM training and the supervector encoding procedure on top of a CNN. The proposed layer can accumulate high-order statistics from a variable-length input sequence and generate a fixed-dimensional, utterance-level vector representation.
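To make the GMM analogy concrete, the Python sketch below implements a forward pass of a dictionary-encoding layer under the usual formulation, which we assume here: each frame x_t is softly assigned to C learnable components mu_c with weights w_tc = softmax_c(-s_c * ||x_t - mu_c||^2), and each component outputs the weight-normalized mean residual e_c = sum_t w_tc (x_t - mu_c) / sum_t w_tc. The weights play the role of GMM posteriors, s_c acts like a precision, and concatenating the e_c gives a supervector-like output of size C*D regardless of the number of frames. Parameter learning (backpropagation through mu_c and s_c) is omitted, and all names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def lde_forward(frames: Sequence[Vector],
                centers: Sequence[Vector],
                smoothing: Sequence[float]) -> Vector:
    """Encode T frames of dimension D into one C*D utterance-level vector.

    frames:    T x D frame-level features (e.g. CNN outputs over time)
    centers:   C x D dictionary components mu_c (learnable in training)
    smoothing: C smoothing factors s_c (learnable in training)
    """
    C, D = len(centers), len(centers[0])
    acc = [[0.0] * D for _ in range(C)]  # sum_t w_tc * r_tc
    norm = [0.0] * C                     # sum_t w_tc
    for x in frames:
        # Residuals r_tc = x_t - mu_c and the softmax logits -s_c * ||r_tc||^2.
        res = [[xd - md for xd, md in zip(x, mu)] for mu in centers]
        logits = [-s * sum(r * r for r in rc) for s, rc in zip(smoothing, res)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        expd = [math.exp(l - m) for l in logits]
        z = sum(expd)
        for c in range(C):
            w = expd[c] / z  # soft assignment of this frame to component c
            norm[c] += w
            for d in range(D):
                acc[c][d] += w * res[c][d]
    # e_c: assignment-weighted mean residual per component, concatenated.
    out: Vector = []
    for c in range(C):
        out.extend(a / norm[c] if norm[c] > 0 else 0.0 for a in acc[c])
    return out


if __name__ == "__main__":
    frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]  # T=3 frames, D=2
    centers = [[0.0, 0.0], [4.0, 4.0]]             # C=2 components
    vec = lde_forward(frames, centers, smoothing=[1.0, 1.0])
    print(len(vec))  # 4 = C*D, independent of T
```

With a single component every frame gets weight 1, so the output reduces to the mean of the frames minus mu_1, which is the same quantity as a normalized first-order statistic of a one-component GMM.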

Paper Details

Authors:
Weicheng Cai, Zexin Cai, Xiang Zhang, Xiaoqi Wang, Ming Li
Submitted On:
13 April 2018 - 9:37am

Document Files

poster_weichcai_icassp2018_lde.pdf

[1] Weicheng Cai, Zexin Cai, Xiang Zhang, Xiaoqi Wang, Ming Li, "A Novel Learnable Dictionary Encoding Layer for End-to-End Language Identification", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2701. Accessed: Apr. 27, 2018.

Insights into End-to-End Learning Scheme for Language Identification


A novel interpretable end-to-end learning scheme for language identification is proposed. It is in line with the classical GMM i-vector methods both theoretically and practically. In the end-to-end pipeline, a general encoding layer is employed on top of the front-end CNN so that it can automatically encode the variable-length input sequence into an utterance-level vector. After comparing the scheme with state-of-the-art GMM i-vector methods, we give insights into the CNN and reveal its role and effect in the whole pipeline.
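The classical GMM i-vector pipeline that this scheme is compared with begins by accumulating zeroth- and first-order Baum-Welch statistics of the frames against a universal background model (UBM); this is the variable-length-to-fixed-length step that an encoding layer imitates. The Python sketch below computes these statistics for a diagonal-covariance GMM. The toy UBM parameters and function names are illustrative assumptions, and the subsequent total-variability projection that yields the i-vector itself is omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float],
                      var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xd - m) ** 2 / v
                      for xd, m, v in zip(x, mean, var))


def baum_welch_stats(frames: Sequence[Vector], weights: Sequence[float],
                     means: Sequence[Vector],
                     variances: Sequence[Vector]) -> Tuple[Vector, List[Vector]]:
    """Zeroth-order N_c and centered first-order F_c statistics for one utterance.

    N_c = sum_t gamma_tc and F_c = sum_t gamma_tc * (x_t - mu_c),
    where gamma_tc is the UBM posterior of component c for frame t.
    """
    C, D = len(means), len(means[0])
    N = [0.0] * C
    F = [[0.0] * D for _ in range(C)]
    for x in frames:
        logp = [math.log(w) + log_gaussian_diag(x, mu, var)
                for w, mu, var in zip(weights, means, variances)]
        m = max(logp)  # normalize posteriors in a numerically stable way
        post = [math.exp(lp - m) for lp in logp]
        z = sum(post)
        for c in range(C):
            gamma = post[c] / z
            N[c] += gamma
            for d in range(D):
                F[c][d] += gamma * (x[d] - means[c][d])
    return N, F


if __name__ == "__main__":
    ubm_w = [0.5, 0.5]
    ubm_mu = [[0.0, 0.0], [3.0, 3.0]]
    ubm_var = [[1.0, 1.0], [1.0, 1.0]]
    frames = [[0.1, -0.2], [2.9, 3.1], [3.2, 2.8], [0.0, 0.3]]
    N, F = baum_welch_stats(frames, ubm_w, ubm_mu, ubm_var)
    print(N, sum(N))  # occupancies sum to the number of frames
    # Normalized first-order statistics stacked into a fixed-size supervector.
    supervector = [fd / N[c] for c in range(len(N)) for fd in F[c]]
    print(supervector)
```

The posteriors gamma_tc here are fixed by a pre-trained UBM, whereas an end-to-end encoding layer learns its analogous assignment weights jointly with the front-end CNN.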

Paper Details

Authors:
Weicheng Cai, Zexin Cai, Wenbo Liu, Xiaoqi Wang, Ming Li
Submitted On:
13 April 2018 - 9:32am

Document Files

poster_weichcai_icassp2018_e2e.pdf

[1] Weicheng Cai, Zexin Cai, Wenbo Liu, Xiaoqi Wang, Ming Li, "Insights into End-to-End Learning Scheme for Language Identification", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2699. Accessed: Apr. 27, 2018.

DNN BASED EMBEDDINGS FOR LANGUAGE RECOGNITION


In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to an i-vector) that represents the whole utterance, but unlike an i-vector it is designed to contain mostly information relevant to the target task (LID). To obtain these embeddings, we train a deep neural network (DNN) with a sequence summarization layer to classify languages.
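A sequence summarization layer maps a variable number of frame-level hidden activations to one fixed-length vector, so that the layers after it, from which an embedding can be read, operate at the utterance level. The abstract does not specify how the summary is computed; the Python sketch below assumes statistics pooling (per-dimension mean and standard deviation over time), which is one common choice. The function name and toy data are hypothetical.

```python
import math
from typing import List, Sequence


def stats_pooling(hidden: Sequence[Sequence[float]]) -> List[float]:
    """Summarize T x D frame-level activations as a 2*D utterance-level vector.

    Returns the per-dimension mean over time followed by the per-dimension
    standard deviation. Training implementations often add a small variance
    floor for gradient stability; it is left out of this forward-only sketch.
    """
    T = len(hidden)
    if T == 0:
        raise ValueError("need at least one frame")
    D = len(hidden[0])
    mean = [sum(h[d] for h in hidden) / T for d in range(D)]
    var = [sum((h[d] - mean[d]) ** 2 for h in hidden) / T for d in range(D)]
    std = [math.sqrt(v) for v in var]
    return mean + std


if __name__ == "__main__":
    short_utt = [[1.0, 2.0], [3.0, 2.0]]  # T=2 frames, D=2
    long_utt = [[0.5, 1.0]] * 10          # T=10 frames, D=2
    print(stats_pooling(short_utt))       # [2.0, 2.0, 1.0, 0.0]
    # Same output size for both utterances, regardless of their length.
    print(len(stats_pooling(long_utt)) == len(stats_pooling(short_utt)))
```

Because the output size depends only on D, utterances of any duration can be fed to the same utterance-level classifier layers during training.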

Paper Details

Authors:
Alicia Lozano-Diez, Oldrich Plchot, Pavel Matejka, Joaquin Gonzalez-Rodriguez
Submitted On:
12 April 2018 - 11:29am

Document Files

Poster Embeddings LID NIST LRE 2017 Lozano et al.

[1] Alicia Lozano-Diez, Oldrich Plchot, Pavel Matejka, Joaquin Gonzalez-Rodriguez, "DNN BASED EMBEDDINGS FOR LANGUAGE RECOGNITION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2399. Accessed: Apr. 27, 2018.