
- Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of a Multilingual ASR Model
Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it entails several rounds of pruning and re-training for each language. In this work, we propose an adaptive masking approach for pruning a multilingual ASR model efficiently in two scenarios, yielding either sparse monolingual models or a sparse multilingual model (named Dynamic ASR Pathways).
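A minimal sketch of the general idea behind adaptive magnitude-based masking during pruning; the sparsity level, re-masking schedule, and model API below are illustrative assumptions, not the exact Dynamic ASR Pathways recipe:

```python
# Illustrative sketch of adaptive magnitude-based masking for pruning.
# The exact masking rule of Dynamic ASR Pathways is not reproduced here;
# the sparsity level and the step-based re-masking schedule are assumptions.
import torch

def compute_masks(model, sparsity=0.7):
    """Keep only the largest-magnitude weights in each weight matrix."""
    masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:                      # skip biases / norm parameters
            continue
        k = int(param.numel() * sparsity)        # number of weights to prune
        threshold = param.abs().flatten().kthvalue(k).values
        masks[name] = (param.abs() > threshold).float()
    return masks

def apply_masks(model, masks):
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])          # zero out pruned weights

def train_step(model, batch, loss_fn, optimizer, masks):
    # During (monolingual or multilingual) fine-tuning, the masks can be
    # re-estimated periodically so that the surviving pathway adapts to
    # the training data instead of staying fixed after a single pruning pass.
    optimizer.zero_grad()
    loss = loss_fn(model(batch["inputs"]), batch["targets"])
    loss.backward()
    optimizer.step()
    apply_masks(model, masks)
    return loss.item()
```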

- Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages
We introduce a new zero-resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system that performs language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech encoders, including Wav2vec 2.0, HuBERT, and XLSR. We examine the impact of pre-training languages and model size on benchmark performance.
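A sketch of how such a zero-resource pairwise evaluation can work, assuming the benchmark scores utterance pairs with a unit language model; the function names (encode_to_units, unit_lm_logprob) are placeholders, not the benchmark's actual API:

```python
# Illustrative zero-resource evaluation: discretize each utterance in a pair,
# score the unit sequences with a language model trained on discrete units,
# and check that the plausible code-switched utterance receives the higher
# score. encode_to_units / unit_lm_logprob are hypothetical placeholders.
def accuracy_on_pairs(pairs, encode_to_units, unit_lm_logprob):
    correct = 0
    for good_wav, bad_wav in pairs:
        good_units = encode_to_units(good_wav)   # e.g. k-means ids of encoder features
        bad_units = encode_to_units(bad_wav)
        if unit_lm_logprob(good_units) > unit_lm_logprob(bad_units):
            correct += 1
    return correct / len(pairs)
```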

- ICASSP 2022 - Improved Language Identification Through Cross-Lingual Self-Supervised Learning
Language identification greatly impacts the success of downstream tasks such as automatic speech recognition. Recently, self-supervised speech representations learned by wav2vec 2.0 have been shown to be very effective for a range of speech tasks. We extend previous self-supervised work on language identification by experimenting with pre-trained models trained on real-world unconstrained speech in multiple languages rather than on English alone.
Slide_ICASSP_LIDW2V.pdf
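A minimal sketch of a common way to build a language-identification classifier on top of cross-lingual wav2vec 2.0 representations (mean-pooled hidden states followed by a linear head); the checkpoint name, number of languages, and classifier are illustrative assumptions, not the exact model from the paper:

```python
# Minimal sketch: mean-pooled wav2vec 2.0 features + a linear classifier for
# language identification. Checkpoint name and head are illustrative choices.
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

encoder_name = "facebook/wav2vec2-large-xlsr-53"   # cross-lingual pre-trained encoder
extractor = Wav2Vec2FeatureExtractor.from_pretrained(encoder_name)
encoder = Wav2Vec2Model.from_pretrained(encoder_name)
classifier = torch.nn.Linear(encoder.config.hidden_size, 25)  # e.g. 25 languages

def predict_language(waveform, sampling_rate=16000):
    inputs = extractor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(inputs.input_values).last_hidden_state  # (1, T, H)
    pooled = hidden.mean(dim=1)                                  # utterance-level embedding
    return classifier(pooled).argmax(dim=-1)
```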


- Multilingual Second-Pass Rescoring for Automatic Speech Recognition Systems

- Exploring Effective Data Utilization for Low-Resource Speech Recognition
Automatic speech recognition (ASR) suffers severe performance degradation on low-resource languages with limited training data. In this work, we propose a series of training strategies for more effective data utilization in low-resource speech recognition. Multilingual pretraining is particularly helpful in such scenarios, and we exploit relationships among different languages to improve it.
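The abstract does not detail the specific strategies, so the sketch below only illustrates one generic way to exploit language relationships during multilingual pretraining, namely sampling source-language data in proportion to an assumed similarity to the target language; this is not the paper's method:

```python
# Generic illustration (not the paper's method): weight each source language's
# pretraining data by an assumed similarity score to the low-resource target.
import random

def sample_pretraining_batch(corpora, similarity, batch_size=32):
    """corpora: dict lang -> list of utterances; similarity: dict lang -> float weight."""
    langs = list(corpora)
    weights = [similarity[lang] for lang in langs]
    batch = []
    for _ in range(batch_size):
        lang = random.choices(langs, weights=weights, k=1)[0]
        batch.append(random.choice(corpora[lang]))
    return batch
```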

- Language Adaptive Cross-lingual Speech Representation Learning with Sparse Sharing Sub-networks

- Joint Unsupervised and Supervised Training for Multilingual ASR
Self-supervised training has shown promising gains in pretraining models and facilitating downstream finetuning for speech recognition tasks such as multilingual ASR. Most existing methods adopt a two-stage scheme in which the self-supervised loss is optimized during pretraining and standard supervised finetuning follows in the second stage. In this paper, we propose an end-to-end (E2E) Joint Unsupervised and Supervised Training (JUST) method that combines the supervised RNN-T loss with the self-supervised contrastive and masked language modeling (MLM) losses.
poster.pdf
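A sketch of the kind of multi-task objective described above, combining the three named losses in a single end-to-end step; the loss weights, model API, and individual loss functions are hypothetical placeholders, not the exact JUST configuration:

```python
# Combine the supervised RNN-T loss with self-supervised contrastive and MLM
# losses in one training objective. rnnt_loss, contrastive_loss, mlm_loss and
# the model's output structure are hypothetical placeholders.
def just_loss(batch, model, w_rnnt=1.0, w_contrastive=0.1, w_mlm=0.1):
    enc_out, quantized, mlm_logits = model(batch["audio"])     # assumed model API
    loss_rnnt = rnnt_loss(enc_out, batch["transcript"])        # supervised
    loss_contrastive = contrastive_loss(enc_out, quantized)    # self-supervised
    loss_mlm = mlm_loss(mlm_logits, quantized)                 # self-supervised
    return w_rnnt * loss_rnnt + w_contrastive * loss_contrastive + w_mlm * loss_mlm
```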


- Reducing Spelling Inconsistencies in Code-Switching ASR using Contextualized CTC Loss

- Phoneme Level Language Models for Sequence Based Low Resource ASR
Building multilingual and crosslingual models helps bring different languages together in a language-universal space. It allows models to share parameters and transfer knowledge across languages, enabling faster and better adaptation to a new language. These approaches are particularly useful for low-resource languages. In this paper, we propose a phoneme-level language model that can be used multilingually and for crosslingual adaptation to a target language.
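A minimal sketch of a phoneme-level language model over a shared, language-universal phoneme inventory; the LSTM architecture and layer sizes are illustrative assumptions, not the exact model from the paper:

```python
# Minimal phoneme-level LSTM language model over a shared phoneme inventory.
# Because the inventory is shared across languages, the same model can be
# trained multilingually and later adapted to a target language.
import torch
import torch.nn as nn

class PhonemeLM(nn.Module):
    def __init__(self, n_phonemes, emb_dim=64, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(n_phonemes, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_phonemes)

    def forward(self, phoneme_ids):            # (B, T) ids into the shared inventory
        h, _ = self.lstm(self.embed(phoneme_ids))
        return self.out(h)                     # next-phoneme logits
```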

- Learning from the best: A teacher-student multilingual framework for low-resource languages
The traditional method of pretraining neural acoustic models for low-resource languages initializes the acoustic model parameters using a large, annotated multilingual corpus, which can be a drain on time and resources. To reuse TDNN-LSTMs already pre-trained multilingually, we apply Teacher-Student (TS) learning as a pretraining method that transfers knowledge from a multilingual TDNN-LSTM teacher to a TDNN student. Using language-specific data during teacher-student training reduces the pretraining time by an order of magnitude.
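A sketch of the teacher-student transfer described above: the student is trained to match the teacher's output distribution on language-specific data. The KL-divergence form and the temperature are illustrative assumptions about the distillation objective:

```python
# Teacher-student (knowledge distillation) loss: the student TDNN matches the
# posterior distribution of the multilingual TDNN-LSTM teacher. Temperature
# and loss form are assumptions, not the paper's exact configuration.
import torch
import torch.nn.functional as F

def ts_loss(student_logits, teacher_logits, temperature=1.0):
    """KL divergence between teacher and student output posteriors."""
    t_post = F.softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logp, t_post, reduction="batchmean") * temperature ** 2
```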