
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

ADVANCING CONNECTIONIST TEMPORAL CLASSIFICATION WITH ATTENTION MODELING


In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time-convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network representing an implicit language model. Finally, we introduce vector-based attention weights that are applied to context vectors across both time and their individual components.
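
As a rough illustration of the idea, the sketch below gates time-convolution context vectors with learned, vector-valued attention weights before the CTC output layer. It is a minimal PyTorch sketch under assumed shapes and names (AttnCTCHead and ctx_width are hypothetical), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttnCTCHead(nn.Module):
    """Hypothetical head: attention over time-convolution context vectors."""
    def __init__(self, hidden_dim, num_labels, ctx_width=5):
        super().__init__()
        # time convolution over encoder states builds local context vectors
        self.time_conv = nn.Conv1d(hidden_dim, hidden_dim,
                                   kernel_size=ctx_width,
                                   padding=ctx_width // 2)
        # vector-valued weights: one gate per context-vector component
        self.attn = nn.Linear(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_labels)

    def forward(self, enc):                        # enc: (B, T, H)
        ctx = self.time_conv(enc.transpose(1, 2)).transpose(1, 2)
        alpha = torch.sigmoid(self.attn(ctx))      # (B, T, H) component-wise
        return F.log_softmax(self.out(alpha * ctx), dim=-1)  # CTC log-probs
```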

Paper Details

Authors:
Amit Das, Jinyu Li, Rui Zhao, Yifan Gong
Submitted On:
13 April 2018 - 7:29pm

Document Files

ctc_attention_slides.pdf

Cite:
Amit Das, Jinyu Li, Rui Zhao, Yifan Gong, "Advancing Connectionist Temporal Classification with Attention Modeling," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2773

REINFORCEMENT LEARNING OF SPEECH RECOGNITION SYSTEM BASED ON POLICY GRADIENT AND HYPOTHESIS SELECTION

Paper Details

Submitted On:
13 April 2018 - 6:18am

Document Files

ICASSP2018_poster_tkato.pdf

Cite:
"Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2680

SEMI-SUPERVISED TRAINING OF ACOUSTIC MODELS USING LATTICE-FREE MMI


The lattice-free MMI objective (LF-MMI) has been used in supervised training of state-of-the-art neural network acoustic models for automatic speech recognition (ASR). With large amounts of unsupervised data available, extending this approach to the semi-supervised scenario is of significance. Finite-state transducer (FST) based supervision used with LF-MMI provides a natural way to incorporate uncertainties when dealing with unsupervised data. In this paper, we describe various extensions to standard LF-MMI training to allow the use …
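
For intuition, here is a toy sketch of the MMI objective that LF-MMI optimizes: the log-probability of the numerator (supervision) graph minus that of the denominator graph, each obtained with a forward pass over an FST-like transition matrix. In the semi-supervised case the numerator graph would be derived from a seed model's decoding lattices, which is how uncertainty is retained. All shapes and names are illustrative assumptions, not Kaldi code.

```python
import numpy as np

def forward_logprob(log_obs, trans):
    """log_obs: (T, S) per-frame state log-likelihoods;
    trans: (S, S) log transition weights of the graph."""
    alpha = log_obs[0].copy()
    for t in range(1, len(log_obs)):
        # log-sum-exp over predecessor states
        alpha = log_obs[t] + np.logaddexp.reduce(alpha[:, None] + trans,
                                                 axis=0)
    return np.logaddexp.reduce(alpha)

def mmi_objective(log_obs, num_trans, den_trans):
    # numerator: supervision graph (for unsupervised data, built from
    # a seed model's lattices, so competing hypotheses keep some mass)
    # denominator: graph over all competing word sequences
    return (forward_logprob(log_obs, num_trans)
            - forward_logprob(log_obs, den_trans))
```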

Paper Details

Authors:
Vimal Manohar, Hossein Hadian, Daniel Povey, Sanjeev Khudanpur
Submitted On:
18 April 2018 - 12:19pm

Document Files

talk.pdf

Cite:
Vimal Manohar, Hossein Hadian, Daniel Povey, Sanjeev Khudanpur, "Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2645

Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training

Paper Details

Submitted On:
12 April 2018 - 9:11pm

Document Files

liubin-ICASSP2018.pptx

Cite:
"Boosting Noise Robustness of Acoustic Model via Deep Adversarial Training," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2536

Advancing Acoustic-to-Word CTC Model


The acoustic-to-word model based on the connectionist temporal classification (CTC) criterion has been shown to be a natural end-to-end (E2E) model that directly targets words as output units. However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all remaining words to a single OOV output node. Hence, such a word-based CTC model can only recognize the frequent words modeled by the network's output nodes.
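
A small, assumed sketch of where the OOV issue comes from: only the most frequent words get output nodes, and every other word collapses onto one shared OOV node (build_word_targets and the toy data are hypothetical, not the paper's code).

```python
from collections import Counter

def build_word_targets(transcripts, vocab_size):
    """Map transcripts to CTC targets over the top-K word vocabulary."""
    counts = Counter(w for line in transcripts for w in line.split())
    vocab = {w: i for i, (w, _) in enumerate(counts.most_common(vocab_size))}
    oov_id = vocab_size                 # one shared node for all rare words
    blank_id = vocab_size + 1           # CTC blank
    targets = [[vocab.get(w, oov_id) for w in line.split()]
               for line in transcripts]
    return targets, oov_id, blank_id

targets, oov_id, _ = build_word_targets(["the cat sat", "the dog"],
                                        vocab_size=3)
# any word outside the top-3 list maps to oov_id and can never be
# recognized -- exactly the limitation the paper sets out to address
```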

Paper Details

Submitted On:
12 April 2018 - 3:12pm

Document Files

AdvanceCTC_poster.pdf

Cite:
"Advancing Acoustic-to-Word CTC Model," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2474

DEVELOPING FAR-FIELD SPEAKER SYSTEM VIA TEACHER-STUDENT LEARNING


In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components of a far-field speaker system. Specifically, we use teacher-student (T/S) learning to adapt a well-trained close-talk production AM to the far field using parallel close-talk and simulated far-field data. We also use T/S learning to compress a large KWS model into a small one that fits the device's computational budget. Because it requires no transcriptions, T/S learning makes good use of untranscribed data to boost model performance in both the AM adaptation and the KWS model compression.
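
The core of T/S learning is simple enough to sketch: the student is trained to match the teacher's output posteriors, so no transcriptions enter the loss. Below is a minimal PyTorch sketch under assumed shapes; it is not the production recipe.

```python
import torch
import torch.nn.functional as F

def ts_loss(student_logits, teacher_logits):
    """Student mimics the teacher's senone posteriors; the far-field
    student sees far-field audio while the close-talk teacher sees the
    parallel close-talk channel. No transcriptions are needed."""
    teacher_post = F.softmax(teacher_logits, dim=-1).detach()  # soft targets
    student_logp = F.log_softmax(student_logits, dim=-1)
    return F.kl_div(student_logp, teacher_post, reduction='batchmean')

# usage with assumed shapes: (batch, frames, senones)
t = torch.randn(4, 100, 9000)          # teacher AM output, close-talk audio
s = torch.randn(4, 100, 9000, requires_grad=True)  # student, far-field audio
loss = ts_loss(s, t)
```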

Paper Details

Submitted On:
12 April 2018 - 3:03pm

Document Files

speaker_poster.pdf

Cite:
"Developing Far-Field Speaker System via Teacher-Student Learning," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2473

Domain Adversarial Training for Accented Speech Recognition


In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. To reduce the mismatch between labeled source-domain data (a "standard" accent) and unlabeled target-domain data (heavy accents), we augment the learning objective of a Kaldi TDNN network with a DAT objective that encourages the model to learn accent-invariant features.
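
DAT is usually implemented with a gradient reversal layer (GRL): identity in the forward pass, negated gradient in the backward pass, so the shared encoder is pushed away from accent-discriminative features. Below is the generic GRL trick in PyTorch, a sketch rather than the paper's Kaldi TDNN integration.

```python
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam=1.0):
        ctx.lam = lam
        return x.view_as(x)          # identity on the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        # scale and negate: the accent classifier's gradient pushes the
        # shared encoder AWAY from accent-discriminative features
        return -ctx.lam * grad_output, None

features = torch.randn(8, 256, requires_grad=True)
reversed_feats = GradReverse.apply(features, 0.5)  # feed to accent classifier
```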

Paper Details

Authors:
Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie
Submitted On:
17 April 2018 - 4:42pm

Document Files

icassp_slides_snsun_v6.pdf

Cite:
Ching-Feng Yeh, Mei-Yuh Hwang, Mari Ostendorf, Lei Xie, "Domain Adversarial Training for Accented Speech Recognition," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2467

On Modular Training of Neural Acoustics-to-Word Model for LVCSR


End-to-end (E2E) automatic speech recognition (ASR) systems directly map acoustics to words using a unified model. Previous work mostly focuses on E2E training of a single model that integrates the acoustic and language models into a whole. Although E2E training benefits from sequence modeling and simplified decoding pipelines, a large amount of transcribed acoustic data is usually required, and traditional acoustic and language modelling techniques cannot be utilized. In this paper, a novel modular training framework of E2E ASR …

Paper Details

Submitted On:
12 April 2018 - 12:34pm

Document Files

e2e icassp2018 oral slides_zhc00.pdf

Cite:
"On Modular Training of Neural Acoustics-to-Word Model for LVCSR," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2434

High Order Recurrent Neural Networks for Acoustic Modelling


Vanishing long-term gradients are a major issue in training standard recurrent neural networks (RNNs), which can be alleviated by long short-term memory (LSTM) models with memory cells. However, the extra parameters associated with the memory cells mean an LSTM layer has four times as many parameters as an RNN with the same hidden vector size. This paper addresses the vanishing gradient problem using a high order RNN (HORNN) which has additional connections from multiple previous time steps.
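
The mechanism is easy to sketch: the hidden state at time t receives an extra connection from h[t-n], adding one weight matrix instead of the four-fold LSTM overhead. A minimal PyTorch sketch with an assumed order follows; the paper's exact HORNN variants may differ.

```python
import torch
import torch.nn as nn

class HORNNLayer(nn.Module):
    """Toy high order RNN: h[t] depends on h[t-1] and h[t-order]."""
    def __init__(self, input_dim, hidden_dim, order=4):
        super().__init__()
        self.order = order
        self.w_in = nn.Linear(input_dim, hidden_dim)
        self.w_rec1 = nn.Linear(hidden_dim, hidden_dim, bias=False)  # h[t-1]
        self.w_recn = nn.Linear(hidden_dim, hidden_dim, bias=False)  # h[t-order]

    def forward(self, x):                        # x: (B, T, input_dim)
        B, T, _ = x.shape
        H = self.w_in.out_features
        hs = [x.new_zeros(B, H) for _ in range(self.order)]  # zero history
        outs = []
        for t in range(T):
            h = torch.tanh(self.w_in(x[:, t])
                           + self.w_rec1(hs[-1])            # previous step
                           + self.w_recn(hs[-self.order]))  # n steps back
            hs.append(h)
            outs.append(h)
        return torch.stack(outs, dim=1)          # (B, T, H)
```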

Paper Details

Authors:
Chao Zhang, Phil Woodland
Submitted On:
12 April 2018 - 12:16pm

Document Files

cz277-ICASSP18-Poster-v3.pdf

Cite:
Chao Zhang, Phil Woodland, "High Order Recurrent Neural Networks for Acoustic Modelling," IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2429

Joint CTC-Attention based End-to-End Speech Recognition using Multi-Task Learning


Recently, there has been increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. One approach is the attention-based encoder-decoder framework, which learns a mapping between variable-length input and output sequences in one step using a purely data-driven method.
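
The multi-task objective combines two views of the same encoder. Below is a hedged PyTorch sketch of L = lam * L_CTC + (1 - lam) * L_att; shapes, padding conventions, and lam are assumptions (real label ids are assumed >= 1 so that 0 can serve as the CTC blank).

```python
import torch
import torch.nn as nn

ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
ce_loss = nn.CrossEntropyLoss(ignore_index=-1)   # -1 pads decoder targets

def joint_loss(ctc_log_probs, input_lens, dec_logits, targets, target_lens,
               lam=0.5):
    # ctc_log_probs: (T, B, V) log-softmax outputs over encoder frames
    # dec_logits:    (B, U, V) attention decoder outputs
    # targets:       (B, U) label ids >= 1, with -1 where padded
    l_ctc = ctc_loss(ctc_log_probs, targets.clamp(min=0),
                     input_lens, target_lens)    # padding past target_lens
                                                 # is ignored by CTCLoss
    l_att = ce_loss(dec_logits.reshape(-1, dec_logits.size(-1)),
                    targets.reshape(-1))
    return lam * l_ctc + (1 - lam) * l_att
```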

Paper Details

Authors:
Suyoun Kim, Takaaki Hori, Shinji Watanabe
Submitted On:
7 March 2017 - 4:58pm

Document Files

joint ctc attention

Cite:
Suyoun Kim, Takaaki Hori, Shinji Watanabe, "Joint CTC-Attention based End-to-End Speech Recognition using Multi-Task Learning," IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1695
