end-to-end speech recognition

CASS-NAT: CTC Alignment-Based Single Step Non-Autoregressive Transformer for Speech Recognition

We propose a CTC alignment-based single-step non-autoregressive transformer (CASS-NAT) for speech recognition. Specifically, the CTC alignment contains (a) the number of tokens for the decoder input and (b) the time span of the acoustics for each token. This information is used to extract an acoustic representation for each token in parallel, referred to as a token-level acoustic embedding, which substitutes for the word embedding in the autoregressive transformer (AT) to achieve parallel generation in the decoder.
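As a rough illustration of the two pieces of information read off a CTC alignment, the sketch below collapses a frame-level alignment into per-token frame spans. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `token_spans`, the blank index `0`, and the toy alignment are all hypothetical, and blank frames are simply left out of every span, which may differ from how CASS-NAT assigns frames to tokens.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)


def token_spans(alignment, blank=BLANK):
    """Collapse a frame-level CTC alignment into (token, start, end) spans.

    Consecutive identical labels form one token; blank frames separate
    tokens and are not assigned to any span in this simplified version.
    Frame indices are 0-based and `end` is inclusive.
    """
    spans = []
    frame = 0
    for label, run in groupby(alignment):
        length = len(list(run))
        if label != blank:
            spans.append((label, frame, frame + length - 1))
        frame += length
    return spans


# Toy alignment over 11 acoustic frames (made-up label ids).
alignment = [0, 3, 3, 0, 0, 5, 5, 5, 0, 5, 2]
spans = token_spans(alignment)

num_tokens = len(spans)  # (a) number of tokens for the decoder input
print(num_tokens)
print(spans)             # (b) time span of acoustics for each token
```

Each span could then be used to select the encoder frames that belong to one token, so that all token-level acoustic embeddings are computed in parallel rather than one after another.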

cassnat_poster.pdf

Categories: General Topics in Speech Recognition (SPE-GASR)

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

rnnt_icassp2020_.pptx

Categories: General Topics in Speech Recognition (SPE-GASR)
