
Large Vocabulary Continuous Recognition/Search (SPE-LVCR)

Can DNNs Learn to Lipread Full Sentences?


Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence to Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification-Sequence-to-Sequence loss.
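The joint Connectionist Temporal Classification-Sequence-to-Sequence loss mentioned above is, in common practice, a weighted sum of the two objectives. A minimal sketch follows; the function name and the weight value are illustrative assumptions, not the paper's actual setting:

```python
def joint_ctc_seq2seq_loss(ctc_loss, seq2seq_loss, ctc_weight=0.2):
    """Linearly interpolate the two training objectives.

    The CTC term encourages monotonic input-output alignment, while the
    sequence-to-sequence term models dependencies between output labels.
    ctc_weight=0.2 is an illustrative default, not taken from the paper.
    """
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * seq2seq_loss
```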


Paper Details

Authors:
George Sterpu, Christian Saam, Naomi Harte
Submitted On:
8 October 2018 - 1:50am

Document Files

slides.pdf


[1] George Sterpu, Christian Saam, Naomi Harte, "Can DNNs Learn to Lipread Full Sentences?", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3608. Accessed: Nov. 12, 2018.

ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM


This paper presents methods to accelerate recurrent neural network based language models (RNNLMs) for online speech recognition systems. First, a lossy compression of the past hidden-layer outputs (history vector), combined with caching, is introduced to reduce the number of LM queries. Next, RNNLM computations are deployed in a CPU-GPU hybrid manner, computing each layer of the model on the more advantageous platform. The overhead added by data exchanges between CPU and GPU is compensated for through a frame-wise batching strategy.
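The caching idea above can be sketched in a few lines: keying the cache on a lossily-compressed (here, simply rounded) history vector lets near-identical histories share one cached score, so the expensive LM evaluation runs only once per cache key. The rounding scheme and function names here are illustrative assumptions, not the paper's actual compression method:

```python
def make_cached_lm(score_fn, ndigits=2):
    """Wrap an expensive LM scoring function with a cache keyed on a
    quantized history vector. Rounding to `ndigits` decimals is a toy
    stand-in for the paper's lossy compression of the history vector."""
    cache = {}
    stats = {"queries": 0, "evaluations": 0}

    def score(history_vec, word):
        stats["queries"] += 1
        key = (tuple(round(h, ndigits) for h in history_vec), word)
        if key not in cache:
            stats["evaluations"] += 1  # only cache misses hit the real LM
            cache[key] = score_fn(history_vec, word)
        return cache[key]

    return score, stats
```

Two queries whose history vectors differ only below the quantization threshold collapse to a single LM evaluation.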

Paper Details

Authors:
Chiyoun Park, Namhoon Kim, Jaewon Lee
Submitted On:
26 April 2018 - 1:10am

Document Files

Icassp2018_KML_20180402_poster.pdf


[1] Chiyoun Park, Namhoon Kim, Jaewon Lee, "ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3183. Accessed: Nov. 12, 2018.

OUT-OF-VOCABULARY WORD RECOVERY USING FST-BASED SUBWORD UNIT CLUSTERING IN A HYBRID ASR SYSTEM - poster for ICASSP 2018


The paper presents a new approach to extracting useful information from out-of-vocabulary (OOV) speech regions in ASR system output. The system makes use of a hybrid decoding network with both words and sub-word units. In the decoded lattices, candidates for OOV regions are identified

Paper Details

Authors:
Ekaterina Egorova
Submitted On:
24 April 2018 - 10:23am

Document Files

Egorova_poster (1).pdf


[1] Ekaterina Egorova, "OUT-OF-VOCABULARY WORD RECOVERY USING FST-BASED SUBWORD UNIT CLUSTERING IN A HYBRID ASR SYSTEM - poster for ICASSP 2018", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3168. Accessed: Nov. 12, 2018.

A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition


Lattice rescoring is a common approach to exploiting recurrent neural language models in ASR: a word lattice is generated by first-pass decoding and then rescored with the neural model, usually adopting an n-gram approximation method to limit the search space. In this work, we describe a pruned lattice-rescoring algorithm for ASR that improves on the n-gram approximation method. The pruned algorithm further limits the search space and uses heuristic search to pick better histories when expanding the lattice.
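The n-gram approximation mentioned above treats two RNNLM states as interchangeable when their word histories agree on the last n-1 words, so only one representative per equivalence class needs to be expanded. A minimal sketch, with n=4 and the function names as illustrative assumptions:

```python
def ngram_state_key(history, n=4):
    """Equivalence key for the n-gram approximation: only the last n-1
    words of the history distinguish RNNLM states during rescoring."""
    return tuple(history[-(n - 1):])

def merge_histories(histories, n=4):
    """Group full word histories into equivalence classes by n-gram key;
    each class shares one RNNLM state during lattice expansion."""
    classes = {}
    for h in histories:
        classes.setdefault(ngram_state_key(h, n), []).append(h)
    return classes
```

Histories that differ only in their distant past land in the same class, which is exactly where the approximation saves work (and where a heuristic choice of the representative history matters).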

Paper Details

Authors:
Hainan Xu, Tongfei Chen, Dongji Gao, Yiming Wang, Ke Li, Nagendra Goel, Yishay Carmiel, Daniel Povey, Sanjeev Khudanpur
Submitted On:
19 April 2018 - 11:47pm

Document Files

lattice-rescoring-poster.pdf


[1] Hainan Xu, Tongfei Chen, Dongji Gao, Yiming Wang, Ke Li, Nagendra Goel, Yishay Carmiel, Daniel Povey, Sanjeev Khudanpur, "A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3063. Accessed: Nov. 12, 2018.

SIMULTANEOUS SPEECH RECOGNITION AND ACOUSTIC EVENT DETECTION USING AN LSTM-CTC ACOUSTIC MODEL AND A WFST DECODER

Paper Details

Authors:
HIROSHI FUJIMURA, MANABU NAGAO, TAKASHI MASUKO
Submitted On:
16 April 2018 - 2:30am

Document Files

ICASSP2018_simultaneous__poster_final.pdf


[1] HIROSHI FUJIMURA, MANABU NAGAO, TAKASHI MASUKO, "SIMULTANEOUS SPEECH RECOGNITION AND ACOUSTIC EVENT DETECTION USING AN LSTM-CTC ACOUSTIC MODEL AND A WFST DECODER", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2908. Accessed: Nov. 12, 2018.

A Study of All-Convolutional Encoders for Connectionist Temporal Classification (Poster)


Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs). We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC. CNNs lack an explicit representation of the entire sequence, but have the advantage that they are much faster to train. We present an exploration of CNNs as encoders for CTC models, in the context of character-based (lexicon-free) automatic speech recognition.
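Whatever encoder produces the per-frame label posteriors, CTC decodes them the same way: take the best label per frame, merge consecutive repeats, then remove blanks. A minimal sketch of that collapse rule (the blank symbol choice is an assumption):

```python
BLANK = "_"  # illustrative blank symbol

def ctc_greedy_collapse(frame_labels, blank=BLANK):
    """Collapse a per-frame best-path label string as CTC prescribes:
    merge consecutive repeated labels, then drop blank symbols."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)
```

Note that a blank between two identical labels keeps both copies, which is how CTC represents doubled characters in lexicon-free decoding.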

Paper Details

Authors:
Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu
Submitted On:
14 April 2018 - 6:13am

Document Files

study-convolutional-encoders.pdf


[1] Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu, "A Study of All-Convolutional Encoders for Connectionist Temporal Classification (Poster)", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2816. Accessed: Nov. 12, 2018.

MODELING NON-LINGUISTIC CONTEXTUAL SIGNALS IN LSTM LANGUAGE MODELS VIA DOMAIN ADAPTATION


When it comes to speech recognition for voice search, it would be advantageous to take into account application information associated with speech queries. In practice, however, the vast majority of queries lack such annotations, posing a challenge for training domain-specific language models (LMs). To obtain robust domain LMs, an LM pre-trained on general data is typically adapted to specific domains. We propose four adaptation schemes to improve the domain performance of long short-term memory (LSTM) LMs.


Paper Details

Authors:
Min Ma, Shankar Kumar, Fadi Biadsy, Michael Nirschl, Tomas Vykruta, Pedro J. Moreno
Submitted On:
13 April 2018 - 1:07pm

Document Files

domain.pdf


[1] Min Ma, Shankar Kumar, Fadi Biadsy, Michael Nirschl, Tomas Vykruta, Pedro J. Moreno, "MODELING NON-LINGUISTIC CONTEXTUAL SIGNALS IN LSTM LANGUAGE MODELS VIA DOMAIN ADAPTATION", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2731. Accessed: Nov. 12, 2018.

On the use of grapheme models for searching in large spoken archives


This paper explores the possibility of using grapheme-based word and sub-word models in the task of spoken term detection (STD). Grapheme models eliminate the need for expert-prepared pronunciation lexicons (which are often far from complete) and/or trainable grapheme-to-phoneme (G2P) algorithms, which are frequently rather inaccurate, especially for rare words (words coming from a different language). Moreover, the G2P conversion of search terms, which must be performed on-line, can substantially increase the response time of the STD system.
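The appeal of grapheme units is that a search term's units come straight from its written form, with no lexicon lookup or G2P step on the query path. A minimal sketch of such a tokenizer (the function name and the letters-only filtering are illustrative assumptions, not the paper's exact unit inventory):

```python
def grapheme_units(term):
    """Turn a written search term directly into sub-word search units.

    Because no pronunciation lexicon or G2P conversion is involved,
    out-of-lexicon query terms can be handled on-line with no extra
    latency from pronunciation generation.
    """
    return [ch for ch in term.lower() if ch.isalpha()]
```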

poster.pdf

PDF icon poster.pdf (137 downloads)

Paper Details

Authors:
Jan Švec, Josef V. Psutka, Jan Trmal, Luboš Šmídl, Pavel Ircing, Jan Sedmidubsky
Submitted On:
13 April 2018 - 2:55am

Document Files

poster.pdf


[1] Jan Švec, Josef V. Psutka, Jan Trmal, Luboš Šmídl, Pavel Ircing, Jan Sedmidubsky, "On the use of grapheme models for searching in large spoken archives", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2622. Accessed: Nov. 12, 2018.

Applying Connectionist Temporal Classification Objective Function to Chinese Mandarin Speech Recognition


This paper establishes CTC-based systems for the Chinese Mandarin ASR task, exploring three different levels of output units: characters, context-independent phonemes, and context-dependent phonemes. To make training stable, we propose the Newbob-Trn strategy; furthermore, a blank-label prior cost is proposed to improve performance. We also build a CTC-trained UniLSTM-RC model, which meets the real-time requirement of an online system while bringing a performance gain on the Chinese Mandarin ASR task.

Paper Details

Authors:
Pengrui Wang, Jie Li, Bo Xu
Submitted On:
17 October 2016 - 11:07am

Document Files

Applying Connectionist Temporal Classification Objective Function to Chinese Mandarin Speech Recognition.pptx


[1] Pengrui Wang, Jie Li, Bo Xu, "Applying Connectionist Temporal Classification Objective Function to Chinese Mandarin Speech Recognition", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1231. Accessed: Nov. 12, 2018.

End-to-end Keywords Spotting Based on Connectionist Temporal Classification for Mandarin


Traditional hybrid DNN-HMM based ASR systems for keyword spotting, which model HMM states, are not flexible to optimize for a specific language. In this paper, we construct an end-to-end acoustic model based ASR system for keyword spotting in Mandarin. The model is built with an LSTM-RNN and trained with the connectionist temporal classification objective. The input of the network is a feature sequence, and the outputs are the probabilities of the initials and finals of Mandarin syllables.
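Using initials and finals as output units relies on each Mandarin syllable splitting into an (initial, final) pair. A toy splitter for toneless pinyin follows; the initial table and the greedy longest-prefix rule are illustrative simplifications (edge cases such as "ü" spellings are ignored), not the paper's actual unit definition:

```python
# Two-letter initials must be tried before their one-letter prefixes.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_syllable(pinyin):
    """Split a toneless pinyin syllable into (initial, final) by greedy
    longest-prefix matching against the initial table. Zero-initial
    syllables return an empty initial."""
    for ini in INITIALS:
        if pinyin.startswith(ini):
            return ini, pinyin[len(ini):]
    return "", pinyin
```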

Paper Details

Authors:
Ye Bai, Jiangyan Yi, Hao Ni, Zhengqi Wen, Bin Liu, Ya Li, Jianhua Tao
Submitted On:
14 October 2016 - 4:44am

Document Files

Ye Bai Poster.pdf


[1] Ye Bai, Jiangyan Yi, Hao Ni, Zhengqi Wen, Bin Liu, Ya Li, Jianhua Tao, "End-to-end Keywords Spotting Based on Connectionist Temporal Classification for Mandarin", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1184. Accessed: Nov. 12, 2018.
