
Joint CTC-Attention based End-to-End Speech Recognition using Multi-Task Learning

Abstract: 

Recently, there has been increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments. One approach is the attention-based encoder-decoder framework, which learns a mapping between variable-length input and output sequences in one step using a purely data-driven method. The attention model has often been shown to outperform another end-to-end approach, Connectionist Temporal Classification (CTC), mainly because it explicitly uses the history of the target character without any conditional independence assumptions. However, we observed that the attention model performs poorly in noisy conditions and is hard to train in the initial stage with long input sequences, because it is too flexible to predict proper alignments in such cases, lacking the left-to-right constraints used in CTC. This paper presents a novel method for end-to-end speech recognition that improves robustness and achieves fast convergence by using a joint CTC-attention model within a multi-task learning framework, thereby mitigating the alignment issue. Experiments on the WSJ and CHiME-4 tasks demonstrate its advantages over both the CTC and attention-based encoder-decoder baselines, showing 5.4-14.6% relative improvements in Character Error Rate (CER).
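The multi-task framework described above combines the CTC and attention objectives on a shared encoder by linearly interpolating the two losses. The sketch below illustrates only that interpolation; the function name, the placeholder loss values, and the weight of 0.2 are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of a joint CTC-attention multi-task objective:
#   L_MTL = w * L_CTC + (1 - w) * L_attention
# The encoder is shared between the two branches; only the loss
# combination is shown here. All numeric values are illustrative.

def joint_ctc_attention_loss(ctc_loss: float, attention_loss: float,
                             ctc_weight: float = 0.2) -> float:
    """Linearly interpolate the CTC and attention losses."""
    assert 0.0 <= ctc_weight <= 1.0, "interpolation weight must lie in [0, 1]"
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss

# Example with hypothetical per-batch loss values:
loss = joint_ctc_attention_loss(ctc_loss=3.0, attention_loss=2.0, ctc_weight=0.2)
print(loss)  # 0.2 * 3.0 + 0.8 * 2.0 = 2.2
```

During training, the CTC branch's monotonic left-to-right alignment constrains the shared encoder, which is what the abstract credits for the faster convergence and improved robustness of the attention branch.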


Paper Details

Authors:
Suyoun Kim, Takaaki Hori, Shinji Watanabe
Submitted On:
7 March 2017 - 4:58pm
Short Link:
Type:
Presentation Slides
Event:
Presenter's Name:
Suyoun Kim
Paper Code:
ICASSP1701
Document Year:
2017
Cite

Document Files

joint ctc attention



[1] Suyoun Kim, Takaaki Hori, Shinji Watanabe, "Joint CTC-Attention based End-to-End Speech Recognition using Multi-Task Learning", IEEE SigPort, 2017. [Online]. Available: http://sigport.org/1695. Accessed: Dec. 17, 2017.