New algorithms and approaches for speech recognition

Unimodal Aggregation for CTC-based Speech Recognition

This paper addresses non-autoregressive automatic speech recognition (ASR). It proposes unimodal aggregation (UMA), which segments and integrates the feature frames that belong to the same text token and thereby learns better feature representations for text tokens. Both the frame-wise features and the weights are derived from an encoder. The feature frames with unimodal weights are then integrated and further processed by a decoder. The model is trained with the connectionist temporal classification (CTC) loss.
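The aggregation step can be illustrated with a minimal sketch. The abstract does not spell out how segments are found or how frames are combined, so the following rests on assumptions: a segment boundary is placed at each local minimum ("valley") of the weight sequence, the valley frame opens the next segment, and the frames of a segment are combined by a weighted average. The function name `unimodal_aggregate` and the toy inputs are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def unimodal_aggregate(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[List[float]]:
    """Merge frames into one vector per segment (illustrative sketch only).

    Assumptions, not confirmed by the abstract:
      * weights are positive scalars, one per frame;
      * a frame whose weight is a local minimum starts a new segment;
      * each segment is reduced by a weight-normalised average.
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    num_frames = len(weights)
    if num_frames == 0:
        return []

    # Frame 0 always opens the first segment; every interior valley opens another.
    starts = [0] + [
        t
        for t in range(1, num_frames - 1)
        if weights[t] <= weights[t - 1] and weights[t] < weights[t + 1]
    ]
    bounds = starts + [num_frames]

    aggregated = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        total = sum(weights[start:end])
        dim = len(features[start])
        # Weighted average of the frames in [start, end), computed per dimension.
        aggregated.append(
            [
                sum(weights[t] * features[t][d] for t in range(start, end)) / total
                for d in range(dim)
            ]
        )
    return aggregated


# Toy input: five one-dimensional frames whose weights rise, fall, rise, and fall.
toy_features = [[1.0], [2.0], [3.0], [4.0], [5.0]]
toy_weights = [0.1, 0.8, 0.2, 0.9, 0.3]
segments = unimodal_aggregate(toy_features, toy_weights)
print(len(segments))  # number of aggregated, token-level vectors
```

In this toy case the weight sequence has a single interior valley, so the five frames collapse into two vectors. In the setting described above, such shortened sequences would be passed to the decoder and trained with the CTC loss.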

fangying_UMA_poster4.0.pdf

UMA Poster for ICASSP 2024 (307)

Categories: General Topics in Speech Recognition (SPE-GASR)
