Self-Supervised Learning; Speech Representation; Pre-training

BRAVEn: Improving Self-supervised Pre-training for Visual and Auditory Speech Recognition

Self-supervision has recently shown great promise for learning visual and auditory speech representations from unlabelled data. In this work, we propose BRAVEn, an extension to the recent RAVEn method, which learns speech representations entirely from raw audio-visual data. Our modifications to RAVEn enable BRAVEn to achieve state-of-the-art results among self-supervised methods in various settings. Moreover, we observe favourable scaling behaviour by increasing the amount of unlabelled data well beyond other self-supervised works.

icassp slides.pptx

Categories: General Topics in Speech Recognition (SPE-GASR)

wav2vec-Switch: Contrastive Learning from Original-Noisy Speech Pairs for Robust Speech Recognition

The goal of self-supervised learning (SSL) for automatic speech recognition (ASR) is to learn good speech representations from a large amount of unlabeled speech for the downstream ASR task. However, most SSL frameworks do not consider noise robustness, which is crucial for real-world applications. In this paper, we propose wav2vec-Switch, a method to encode noise robustness into contextualized representations of speech via contrastive learning. Specifically, we feed original-noisy speech pairs simultaneously into the wav2vec 2.0 network.
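
As a rough illustration of contrastive learning over an original-noisy pair, the sketch below scores each frame of one view against the targets of the same view and of the other view. It is a simplified sketch under stated assumptions, not the authors' implementation: the names `info_nce` and `pair_loss`, the temperature value, the use of all other frames as distractors, and the way the four terms are summed are all hypothetical, and masking and quantization as used in wav2vec 2.0 are omitted.

```python
import numpy as np

def info_nce(context, targets, temperature=0.1):
    """Mean contrastive loss over frames.

    Frame t of `context` should match frame t of `targets`; all other
    frames of `targets` serve as distractors (a simplification).
    Both arrays have shape (frames, dim).
    """
    c = context / np.linalg.norm(context, axis=1, keepdims=True)
    q = targets / np.linalg.norm(targets, axis=1, keepdims=True)
    logits = c @ q.T / temperature                # (frames, frames) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def pair_loss(ctx_clean, ctx_noisy, tgt_clean, tgt_noisy):
    """Hypothetical combination of within-view and cross-view terms."""
    within = info_nce(ctx_clean, tgt_clean) + info_nce(ctx_noisy, tgt_noisy)
    cross = info_nce(ctx_clean, tgt_noisy) + info_nce(ctx_noisy, tgt_clean)
    return within + cross

# Toy data standing in for network outputs (assumption: 8 frames, 16 dims).
rng = np.random.default_rng(0)
tgt_clean = rng.normal(size=(8, 16))
tgt_noisy = tgt_clean + 0.05 * rng.normal(size=(8, 16))
ctx_clean = tgt_clean + 0.05 * rng.normal(size=(8, 16))
ctx_noisy = tgt_clean + 0.05 * rng.normal(size=(8, 16))

aligned = info_nce(ctx_clean, tgt_noisy)           # frames line up across views
misaligned = info_nce(ctx_clean, tgt_noisy[::-1])  # frames deliberately mismatched
total = pair_loss(ctx_clean, ctx_noisy, tgt_clean, tgt_noisy)
```

In this toy setup the cross-view terms are small only when the representation of a frame is similar for the original and the noisy input, which is the intuition behind training on such pairs.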

ICASSP2022_poster.pdf

Categories: General Topics in Speech Recognition (SPE-GASR)
Robust Speech Recognition (SPE-ROBU)
