
Large Vocabulary Continuous Recognition/Search (SPE-LVCR)

Windowed Attention Mechanisms for Speech Recognition

Paper Details

Authors:
Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals
Submitted On:
9 May 2019 - 10:49am

Document Files

Windowed Attention Mechanisms for Speech Recognitiong_poster.pdf


[1] Shucong Zhang, Erfan Loweimi, Peter Bell, Steve Renals, "Windowed Attention Mechanisms for Speech Recognition", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4209. Accessed: Sep. 21, 2019.

Knowledge Distillation Using Output Errors for Self-Attention ASR Models


Most neural network models for automatic speech recognition (ASR) are unsuitable for mobile devices because of their large size, so the models must be compressed to fit limited hardware resources. In this study, we investigate sequence-level knowledge distillation of self-attention ASR models for model compression.
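The core idea of sequence-level knowledge distillation can be sketched as training the student to match the teacher's distribution over whole output sequences, approximated by the teacher's n-best list. This is a minimal illustration under that assumption, not the authors' exact formulation; `student_logprob` is a hypothetical callback standing in for the student model:

```python
import math

def sequence_kd_loss(teacher_nbest, student_logprob):
    """Sequence-level knowledge distillation loss: the expected negative
    log-likelihood of the student over the teacher's n-best hypotheses,
    weighted by the teacher's (renormalised) sequence probabilities.

    teacher_nbest  : list of (hypothesis, teacher_prob) pairs
    student_logprob: function mapping a hypothesis to the student's
                     log-probability for that whole sequence
    """
    # Renormalise teacher probabilities over the n-best list.
    z = sum(p for _, p in teacher_nbest)
    loss = 0.0
    for hyp, p in teacher_nbest:
        loss -= (p / z) * student_logprob(hyp)  # expected NLL under teacher
    return loss
```

A student that concentrates probability on the teacher's preferred hypothesis gets a lower loss than one that does not, which is exactly the pressure distillation applies.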

Paper Details

Authors:
Hwidong Na, Hoshik Lee, Jihyun Lee, Tae Gyoon Kang, Min-Joong Lee, Young Sang Choi
Submitted On:
8 May 2019 - 10:02pm

Document Files

icassp-2019-poster_v1.1.pptx


[1] Hwidong Na, Hoshik Lee, Jihyun Lee, Tae Gyoon Kang, Min-Joong Lee, Young Sang Choi, "Knowledge Distillation Using Output Errors for Self-Attention ASR Models", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4140. Accessed: Sep. 21, 2019.

A NEURAL NETWORK BASED RANKING FRAMEWORK TO IMPROVE ASR WITH NLU RELATED KNOWLEDGE DEPLOYED


This work proposes a new neural network framework to simultaneously rank multiple hypotheses generated by one or more automatic speech recognition (ASR) engines for a speech utterance. Features fed into the framework include not only those calculated from ASR information but also natural language understanding (NLU) features, such as trigger features that capture long-distance constraints between word/slot pairs and BLSTM features that represent intent-sensitive sentence embeddings.
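The ranking step can be sketched as a small feed-forward network that maps each hypothesis's concatenated ASR + NLU feature vector to a scalar score, then sorts hypotheses by score. This is an illustrative sketch, not the paper's architecture; the layer sizes and parameter names are assumptions:

```python
import numpy as np

def score_hypotheses(features, w1, b1, w2, b2):
    """Score a batch of ASR hypotheses with a one-hidden-layer ranker.
    features: (n_hyps, n_feats) rows mixing ASR scores with NLU-derived
    features (e.g. trigger / sentence-embedding features).
    Returns one ranking score per hypothesis."""
    h = np.tanh(features @ w1 + b1)   # shared hidden layer
    return (h @ w2 + b2).ravel()      # scalar score per hypothesis

def rank_hypotheses(features, params):
    """Return hypothesis indices ordered best-first by ranking score."""
    return np.argsort(-score_hypotheses(features, *params))
```

Because the same network scores every hypothesis, adding a new ASR engine only means appending its hypotheses (with their features) to the batch.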

Paper Details

Authors:
Zhengyu Zhou, Xuchen Song, Rami Botros, Lin Zhao
Submitted On:
7 May 2019 - 7:57pm

Document Files

Poster_ICASSP2019.pdf


[1] Zhengyu Zhou, Xuchen Song, Rami Botros, Lin Zhao, "A NEURAL NETWORK BASED RANKING FRAMEWORK TO IMPROVE ASR WITH NLU RELATED KNOWLEDGE DEPLOYED", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3969. Accessed: Sep. 21, 2019.

Word Characters and Phone Pronunciation Embedding for ASR Confidence Classifier


Confidences are integral to ASR systems and are applied to data selection, adaptation, hypothesis ranking, arbitration, etc. A hybrid ASR system is inherently a match between pronunciations and AM+LM evidence, but current confidence features lack pronunciation information. We develop pronunciation embeddings to represent and factorize the acoustic score in relevant bases, and demonstrate an 8-10% relative reduction in false alarms (FA) on large-scale tasks. We generalize to standard NLP embeddings such as GloVe, and show a 16% relative reduction in FA in combination with GloVe.
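The combination step can be sketched as a logistic confidence classifier over the classic ASR features concatenated with the embedding features. This is a minimal sketch of the idea only; the specific feature groups and their dimensions are illustrative, not the paper's exact set:

```python
import numpy as np

def word_confidence(asr_feats, word_embed, pron_embed, weights, bias):
    """Confidence score in [0, 1] for one recognized word: a logistic
    classifier over classic ASR confidence features concatenated with
    word-character and pronunciation embeddings (hypothetical feature
    layout for illustration)."""
    x = np.concatenate([asr_feats, word_embed, pron_embed])
    return 1.0 / (1.0 + np.exp(-(x @ weights + bias)))
```

Lowering the operating threshold on this score trades missed detections against the false alarms the paper measures.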

Paper Details

Authors:
Kshitiz Kumar, Tasos Anastasakos, Yifan Gong
Submitted On:
7 May 2019 - 3:33pm

Document Files

WordEmbed_v5.pdf


[1] Kshitiz Kumar, Tasos Anastasakos, Yifan Gong, "Word Characters and Phone Pronunciation Embedding for ASR Confidence Classifier", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/3953. Accessed: Sep. 21, 2019.

Can DNNs Learn to Lipread Full Sentences?


Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence to Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification-Sequence-to-Sequence loss.
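The joint Connectionist Temporal Classification / sequence-to-sequence objective mentioned above is, in hybrid CTC/attention training, typically a convex combination of the two losses. A minimal sketch; the weight `lam` and its default value are assumptions, not the paper's setting:

```python
def joint_ctc_seq2seq_loss(ctc_loss, seq2seq_loss, lam=0.2):
    """Joint multi-task objective: interpolate the CTC loss (which
    enforces monotonic alignment) with the attention-based
    sequence-to-sequence cross-entropy loss."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * ctc_loss + (1.0 - lam) * seq2seq_loss
```

The CTC branch regularises the attention branch toward monotonic alignments, which is why the combination tends to train more stably than attention alone.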

Paper Details

Authors:
George Sterpu, Christian Saam, Naomi Harte
Submitted On:
8 October 2018 - 1:50am

Document Files

slides.pdf


[1] George Sterpu, Christian Saam, Naomi Harte, "Can DNNs Learn to Lipread Full Sentences?", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3608. Accessed: Sep. 21, 2019.

ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM


This paper presents methods to accelerate recurrent neural network based language models (RNNLMs) for online speech recognition systems. First, a lossy compression of the past hidden layer outputs (the history vector) with caching is introduced to reduce the number of LM queries. Next, RNNLM computations are deployed in a CPU-GPU hybrid manner, computing each layer of the model on the more advantageous platform. The overhead added by data exchanges between CPU and GPU is compensated through a frame-wise batching strategy.
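The first idea, caching LM queries keyed by a lossily compressed history vector, can be sketched as rounding the hidden state before hashing it, so that near-identical histories deliberately collide and repeated queries are answered from the cache. A minimal sketch under that assumption; `step_fn`, the rounding precision, and the cache layout are all illustrative:

```python
import numpy as np

class CachedRNNLM:
    """Cache RNNLM queries keyed by a lossily compressed history vector.
    Rounding the hidden state before hashing makes near-identical
    histories collide on purpose, so repeated queries hit the cache
    instead of re-running the network."""

    def __init__(self, step_fn, decimals=1):
        self.step_fn = step_fn      # (hidden, word) -> (new_hidden, logprob)
        self.decimals = decimals    # coarser rounding = lossier compression
        self.cache = {}
        self.misses = 0

    def query(self, hidden, word):
        key = (tuple(np.round(hidden, self.decimals)), word)
        if key not in self.cache:
            self.misses += 1        # only a miss actually runs the RNNLM
            self.cache[key] = self.step_fn(hidden, word)
        return self.cache[key]
```

The `decimals` knob trades cache hit rate against the accuracy of the returned LM scores, which is the lossy-compression trade-off the paper exploits.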

Paper Details

Authors:
Chiyoun Park, Namhoon Kim, Jaewon Lee
Submitted On:
26 April 2018 - 1:10am

Document Files

Icassp2018_KML_20180402_poster.pdf


[1] Chiyoun Park, Namhoon Kim, Jaewon Lee, "ACCELERATING RECURRENT NEURAL NETWORK LANGUAGE MODEL BASED ONLINE SPEECH RECOGNITION SYSTEM", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3183. Accessed: Sep. 21, 2019.

OUT-OF-VOCABULARY WORD RECOVERY USING FST-BASED SUBWORD UNIT CLUSTERING IN A HYBRID ASR SYSTEM - poster for ICASSP 2018


The paper presents a new approach to extracting useful information from out-of-vocabulary (OOV) speech regions in ASR system output. The system makes use of a hybrid decoding network with both words and sub-word units. In the decoded lattices, candidates for OOV regions are identified.
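In a hybrid word+subword decoder, a natural way to identify candidate OOV regions is to find maximal runs of sub-word units in the decoded token sequence, since in-vocabulary speech decodes to whole words. This is a sketch of that identification step only, not the paper's FST-based clustering; the token marking convention is an assumption:

```python
def oov_regions(tokens, is_subword):
    """Return (start, end) index pairs for maximal runs of sub-word
    units in a decoded token sequence; in a hybrid word+subword
    decoder these runs mark candidate OOV regions."""
    regions, start = [], None
    for i, t in enumerate(tokens):
        if is_subword(t):
            if start is None:
                start = i           # a new sub-word run begins
        elif start is not None:
            regions.append((start, i))
            start = None
    if start is not None:           # run extends to the end of the utterance
        regions.append((start, len(tokens)))
    return regions
```

Each region can then be handed to a recovery step (in the paper, FST-based clustering of the sub-word sequences) to hypothesise the missing word.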

Paper Details

Authors:
Ekaterina Egorova
Submitted On:
24 April 2018 - 10:23am

Document Files

Egorova_poster (1).pdf


[1] Ekaterina Egorova, "OUT-OF-VOCABULARY WORD RECOVERY USING FST-BASED SUBWORD UNIT CLUSTERING IN A HYBRID ASR SYSTEM - poster for ICASSP 2018", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3168. Accessed: Sep. 21, 2019.

A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition


Lattice rescoring is a common way to take advantage of recurrent neural language models in ASR: a word lattice is generated in first-pass decoding and then rescored with a neural model, with an n-gram approximation method usually adopted to limit the search space. In this work, we describe a pruned lattice-rescoring algorithm for ASR that improves on the n-gram approximation method. The pruned algorithm further limits the search space and uses heuristic search to pick better histories when expanding the lattice.
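The n-gram approximation at the heart of such rescoring merges RNNLM states whose histories share the last n-1 words, and a pruning heuristic then keeps only the best full history per merged state. A minimal sketch of those two steps under that reading; it is not the paper's exact algorithm, and the data layout is illustrative:

```python
def ngram_state(history, n=4):
    """n-gram approximation: two RNNLM states are merged if their
    histories share the last n-1 words, which caps lattice expansion."""
    return tuple(history[-(n - 1):])

def merge_histories(expansions, n=4):
    """For each merged n-gram state, keep only the best-scoring full
    history when expanding the lattice (a greedy pruning heuristic).
    expansions: list of (history_word_list, score) pairs."""
    best = {}
    for hist, score in expansions:
        key = ngram_state(hist, n)
        if key not in best or score > best[key][1]:
            best[key] = (hist, score)
    return best
```

Raising `n` makes the approximation closer to exact RNNLM rescoring at the cost of a larger expanded lattice, which is the trade-off the pruned algorithm tries to improve.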

Paper Details

Authors:
Hainan Xu, Tongfei Chen, Dongji Gao, Yiming Wang, Ke Li, Nagendra Goel, Yishay Carmiel, Daniel Povey, Sanjeev Khudanpur
Submitted On:
19 April 2018 - 11:47pm

Document Files

lattice-rescoring-poster.pdf


[1] Hainan Xu, Tongfei Chen, Dongji Gao, Yiming Wang, Ke Li, Nagendra Goel, Yishay Carmiel, Daniel Povey, Sanjeev Khudanpur, "A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/3063. Accessed: Sep. 21, 2019.

SIMULTANEOUS SPEECH RECOGNITION AND ACOUSTIC EVENT DETECTION USING AN LSTM-CTC ACOUSTIC MODEL AND A WFST DECODER

Paper Details

Authors:
HIROSHI FUJIMURA, MANABU NAGAO, TAKASHI MASUKO
Submitted On:
16 April 2018 - 2:30am

Document Files

ICASSP2018_simultaneous__poster_final.pdf


[1] HIROSHI FUJIMURA, MANABU NAGAO, TAKASHI MASUKO, "SIMULTANEOUS SPEECH RECOGNITION AND ACOUSTIC EVENT DETECTION USING AN LSTM-CTC ACOUSTIC MODEL AND A WFST DECODER", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2908. Accessed: Sep. 21, 2019.

A Study of All-Convolutional Encoders for Connectionist Temporal Classification (Poster)


Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs). We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC. CNNs lack an explicit representation of the entire sequence, but have the advantage that they are much faster to train. We present an exploration of CNNs as encoders for CTC models, in the context of character-based (lexicon-free) automatic speech recognition.
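The contrast the abstract draws can be made concrete with a toy all-convolutional encoder: a 1-D convolution over the time axis maps acoustic frames to per-frame label logits with no recurrent state, so every output frame depends only on a fixed local window and training parallelises across time. This is an illustrative single-layer sketch, not the paper's architecture (a real encoder would stack several such layers):

```python
import numpy as np

def conv1d_ctc_encoder(frames, kernels, bias):
    """All-convolutional CTC 'encoder' sketch: one 1-D convolution over
    time maps acoustic frames straight to per-frame label logits.
    frames : (T, d) features
    kernels: (k, d, n_labels) convolution weights
    bias   : (n_labels,)
    Returns (T, n_labels) logits, one row per input frame."""
    k = kernels.shape[0]
    pad = k // 2
    x = np.pad(frames, ((pad, pad), (0, 0)))      # 'same' padding in time
    T, n_labels = frames.shape[0], kernels.shape[2]
    logits = np.empty((T, n_labels))
    for t in range(T):
        # Each output frame sees only a fixed local window of k frames.
        logits[t] = np.einsum('kd,kdl->l', x[t:t + k], kernels) + bias
    return logits
```

The per-frame logits would then be fed to a standard CTC loss; the limited receptive field (here k frames) is exactly the "no explicit representation of the entire sequence" the abstract refers to.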

Paper Details

Authors:
Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu
Submitted On:
14 April 2018 - 6:13am

Document Files

study-convolutional-encoders.pdf


[1] Kalpesh Krishna, Liang Lu, Kevin Gimpel, Karen Livescu, "A Study of All-Convolutional Encoders for Connectionist Temporal Classification (Poster)", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2816. Accessed: Sep. 21, 2019.
