CASS-NAT: CTC Alignment-Based Single Step Non-Autoregressive Transformer for Speech Recognition
We propose a CTC alignment-based single step non-autoregressive transformer (CASS-NAT) for speech recognition. Specifically, the CTC alignment contains two pieces of information: (a) the number of tokens for the decoder input, and (b) the time span of acoustics for each token. This information is used to extract an acoustic representation for each token in parallel, referred to as the token-level acoustic embedding, which substitutes for the word embedding in the autoregressive transformer (AT) to achieve parallel generation in the decoder.
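As a rough illustration of points (a) and (b), the sketch below reads token count and per-token time spans off a frame-level CTC alignment and then pools encoder frames within each span. It is a minimal sketch under stated assumptions, not the authors' implementation: the blank id `0`, the helper names `token_spans` and `token_embeddings`, the choice to treat each non-blank run as the token's span, and the use of mean pooling as the extraction step are all assumptions made here for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed blank label id; the abstract does not specify one


def token_spans(alignment, blank=BLANK):
    """Collapse a frame-level CTC alignment into (token, start, end) spans.

    Each run of identical non-blank labels counts as one token, which is
    the usual CTC collapsing rule. `end` is exclusive.
    """
    spans = []
    pos = 0
    for label, run in groupby(alignment):
        length = len(list(run))
        if label != blank:
            spans.append((label, pos, pos + length))
        pos += length
    return spans


def token_embeddings(frames, spans):
    """Mean-pool encoder frames inside each span (an assumed stand-in
    for the extraction step described in the abstract)."""
    pooled = []
    for _, start, end in spans:
        segment = frames[start:end]
        dim = len(segment[0])
        pooled.append(
            [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]
        )
    return pooled


# Toy alignment over 9 frames: token 3, then token 5 twice (split by a blank).
alignment = [0, 3, 3, 0, 0, 5, 0, 5, 5]
# Toy 2-dimensional "encoder outputs", one vector per frame.
frames = [[float(t), 1.0] for t in range(len(alignment))]

spans = token_spans(alignment)
embeddings = token_embeddings(frames, spans)
num_tokens = len(spans)  # (a) number of decoder inputs
```

Because every span is known before decoding starts, all token-level embeddings can be computed independently of one another, which is what allows the decoder to generate every token in a single step.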