Neural network pruning offers an effective method for compressing a multilingual automatic speech recognition (ASR) model with minimal performance loss. However, it requires several rounds of pruning and re-training for each language. In this work, we propose an adaptive masking approach for pruning a multilingual ASR model efficiently in two scenarios, yielding either sparse monolingual models or a sparse multilingual model (named Dynamic ASR Pathways).
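As a rough illustration of the adaptive-masking idea, the sketch below re-estimates a magnitude-based pruning mask before each round of fine-tuning instead of keeping it fixed; the layer selection, sparsity schedule, and `fine_tune` step are assumptions for illustration, not the authors' exact procedure.

```python
# Minimal sketch (assumed details): magnitude pruning with a mask that is
# re-estimated ("adapted") after each round of fine-tuning rather than fixed.
import torch
import torch.nn as nn

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a 0/1 mask keeping the largest-magnitude (1 - sparsity) weights."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

def prune_round(model: nn.Module, sparsity: float) -> dict:
    """Re-estimate masks for all Linear layers and apply them in place."""
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            mask = magnitude_mask(module.weight.data, sparsity)
            module.weight.data.mul_(mask)   # zero out pruned weights
            masks[name] = mask
    return masks

# Usage: alternate mask re-estimation with language-specific fine-tuning,
# increasing sparsity each round (schedule values and fine_tune are hypothetical).
# for sparsity in (0.5, 0.7, 0.9):
#     masks = prune_round(asr_model, sparsity)
#     fine_tune(asr_model, language_data, masks)
```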

We introduce a new zero-resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We present a baseline system that performs language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments cover a variety of well-known speech encoders, including wav2vec 2.0, HuBERT, and XLSR, and we examine the impact of pre-training languages and model size on benchmark performance.
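The "language modeling on discrete units" baseline can be sketched as follows: frame-level encoder features are discretized with a pre-fitted k-means model and scored by an autoregressive unit language model. The encoder choice, cluster count, and scoring rule here are illustrative assumptions, not the benchmark's exact recipe.

```python
# Minimal sketch of a unit-LM scoring pipeline over discretized encoder features.
import torch
from sklearn.cluster import KMeans

def discretize(features: torch.Tensor, kmeans: KMeans) -> list[int]:
    """Map (T, D) encoder features to a sequence of cluster-ID 'units'."""
    units = kmeans.predict(features.numpy())
    # Collapse repeated consecutive units, as is common for unit LMs.
    return [int(u) for i, u in enumerate(units) if i == 0 or u != units[i - 1]]

def sequence_logprob(unit_lm: torch.nn.Module, units: list[int]) -> float:
    """Score a unit sequence with an (assumed) autoregressive unit LM that
    returns next-unit logits of shape (batch, length, vocab)."""
    ids = torch.tensor(units).unsqueeze(0)          # (1, T)
    logits = unit_lm(ids[:, :-1])                   # predict each next unit
    logp = torch.log_softmax(logits, dim=-1)
    tgt = ids[:, 1:]
    return logp.gather(-1, tgt.unsqueeze(-1)).sum().item()

# A zero-resource comparison can then check whether the unit LM assigns a
# higher score to the attested code-switched utterance than to a control.
```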

Language identification greatly impacts the success of downstream tasks such as automatic speech recognition. Recently, self-supervised speech representations learned by wav2vec 2.0 have been shown to be very effective for a range of speech tasks. We extend previous self-supervised work on language identification by experimenting with pre-trained models that were trained on real-world, unconstrained speech in multiple languages rather than on English alone.
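A common way to use such pre-trained models for language identification is to place a small classification head on top of frozen wav2vec 2.0 features; the checkpoint name, mean pooling, and head size below are illustrative choices, not necessarily the paper's setup.

```python
# Minimal sketch: a language-ID head on frozen wav2vec 2.0 features.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class LanguageIDHead(nn.Module):
    def __init__(self, encoder_name: str, num_languages: int):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained(encoder_name)
        self.encoder.requires_grad_(False)   # freeze the pretrained encoder
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_languages)

    def forward(self, input_values: torch.Tensor) -> torch.Tensor:
        # input_values: (batch, samples) raw 16 kHz waveform
        hidden = self.encoder(input_values).last_hidden_state  # (B, T, D)
        pooled = hidden.mean(dim=1)                            # mean-pool over time
        return self.classifier(pooled)                         # (B, num_languages)

# Example (checkpoint and language count are assumptions):
# model = LanguageIDHead("facebook/wav2vec2-xls-r-300m", num_languages=25)
# logits = model(torch.randn(1, 16000))   # one second of dummy audio
```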

Automatic speech recognition (ASR) suffers severe performance degradation on low-resource languages with limited training data. In this work, we propose a series of training strategies for more effective data utilization in low-resource speech recognition. In such scenarios, multilingual pretraining is particularly helpful, and we exploit relationships among different languages to improve pretraining.

Self-supervised training has shown promising gains in pretraining models and facilitating downstream finetuning for speech recognition, such as multilingual ASR. Most existing methods adopt a two-stage scheme in which a self-supervised loss is optimized in the first pretraining stage and standard supervised finetuning follows in the second stage. In this paper, we propose an end-to-end (E2E) Joint Unsupervised and Supervised Training (JUST) method that combines the supervised RNN-T loss with the self-supervised contrastive and masked language modeling (MLM) losses.
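The joint objective can be summarized as a weighted sum of the three terms optimized in a single backward pass; the weight values and the way each individual loss is computed are placeholders for illustration, not the paper's exact configuration.

```python
# Minimal sketch of a joint supervised + self-supervised objective.
import torch

def just_total_loss(l_rnnt: torch.Tensor,
                    l_contrastive: torch.Tensor,
                    l_mlm: torch.Tensor,
                    w_contrastive: float = 0.1,
                    w_mlm: float = 0.1) -> torch.Tensor:
    """Combine the supervised RNN-T term with the two self-supervised terms."""
    return l_rnnt + w_contrastive * l_contrastive + w_mlm * l_mlm

# In the training loop (how each term is computed is model-specific):
# total = just_total_loss(l_rnnt, l_contrastive, l_mlm)
# total.backward()   # one optimization step updates all parameters end-to-end
```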

Building multilingual and crosslingual models helps bring different languages together in a language-universal space. It allows models to share parameters and transfer knowledge across languages, enabling faster and better adaptation to a new language. These approaches are particularly useful for low-resource languages. In this paper, we propose a phoneme-level language model that can be used multilingually and for crosslingual adaptation to a target language.
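A minimal sketch of a phoneme-level language model over a shared multilingual phoneme inventory is shown below; the vocabulary size, embedding width, and LSTM architecture are illustrative assumptions rather than the paper's model.

```python
# Minimal sketch: a recurrent LM over a shared multilingual phoneme inventory.
import torch
import torch.nn as nn

class PhonemeLM(nn.Module):
    def __init__(self, num_phonemes: int, embed_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        self.embed = nn.Embedding(num_phonemes, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_phonemes)

    def forward(self, phoneme_ids: torch.Tensor) -> torch.Tensor:
        # phoneme_ids: (batch, length) indices into the shared phoneme inventory
        h, _ = self.rnn(self.embed(phoneme_ids))
        return self.out(h)   # next-phoneme logits

# Crosslingual adaptation could then be a brief fine-tuning pass on
# target-language phoneme sequences while reusing the shared inventory.
# lm = PhonemeLM(num_phonemes=100)
# logits = lm(torch.randint(0, 100, (2, 20)))
```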

The traditional method of pretraining neural acoustic models for low-resource languages initializes the acoustic model parameters on a large, annotated multilingual corpus, which can be a drain on time and resources. To reuse TDNN-LSTMs already pre-trained with multilingual training, we apply Teacher-Student (TS) learning as a pretraining method to transfer knowledge from a multilingual TDNN-LSTM to a TDNN. Using language-specific data during teacher-student training reduces the pretraining time by an order of magnitude.
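The teacher-student transfer step can be sketched as training the student to match the teacher's frame-level posteriors with a KL-divergence loss; the temperature and exact formulation below are generic distillation choices, not necessarily the paper's recipe.

```python
# Minimal sketch: frame-level posterior matching between teacher and student.
import torch
import torch.nn.functional as F

def ts_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: float = 1.0) -> torch.Tensor:
    """KL divergence between softened teacher and student frame posteriors."""
    t_post = F.softmax(teacher_logits / temperature, dim=-1)
    s_logp = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(s_logp, t_post, reduction="batchmean") * temperature ** 2

# student_logits = student(feats)          # TDNN forward pass
# with torch.no_grad():
#     teacher_logits = teacher(feats)      # frozen multilingual TDNN-LSTM
# loss = ts_loss(student_logits, teacher_logits)
```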
