Resource constrained speech recognition (SPE-RCSR)

On-Device Constrained Self-Supervised Learning for Keyword Spotting via Quantization Aware Pre-Training and Fine-tuning

Large self-supervised models have excelled in various speech processing tasks, but their deployment on resource-limited devices is often impractical due to their substantial memory footprint. Previous studies have demonstrated the effectiveness of self-supervised pre-training for keyword spotting, even with constrained model capacity.

final_v5.pdf (237)

Categories: Resource constrained speech recognition (SPE-RCSR)

32 Views

Are Soft prompts good zero-shot learners for speech recognition?

Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple, parameter-efficient alternative that uses minimal soft prompt guidance, enhancing portability while maintaining competitive performance. However, how and why it works is still poorly understood. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR).
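
As a rough illustration of the mechanics behind soft prompt tuning, the sketch below prepends a small set of learnable vectors to a sequence of input feature frames and reports how small the trainable share is when the backbone stays frozen. This is a minimal sketch under stated assumptions, not the method of the paper: the helper names (`init_soft_prompt`, `prepend_prompt`, `trainable_fraction`), the input-side placement of the prompts, and all sizes are hypothetical.

```python
import random


def init_soft_prompt(num_prompts, dim, seed=0):
    """Create num_prompts small random vectors of size dim (hypothetical init)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(num_prompts)]


def prepend_prompt(prompt, frames):
    """Return a new sequence: prompt vectors followed by the feature frames."""
    if frames and len(frames[0]) != len(prompt[0]):
        raise ValueError("prompt and frame dimensions must match")
    return [list(v) for v in prompt] + [list(f) for f in frames]


def trainable_fraction(num_prompts, dim, frozen_params):
    """Share of parameters that are trained when only the prompt is updated."""
    prompt_params = num_prompts * dim
    return prompt_params / (prompt_params + frozen_params)


# Usage: 4 prompt vectors in front of 10 frames of dimension 8.
prompt = init_soft_prompt(num_prompts=4, dim=8)
frames = [[0.0] * 8 for _ in range(10)]
extended = prepend_prompt(prompt, frames)
print(len(extended), trainable_fraction(4, 8, frozen_params=1000))
```

In an actual system the frozen backbone would consume the extended sequence, and gradients would flow only into the prompt vectors.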

ICASSP2024_dianwen_oral_prompts.pptx (163)

Categories: Resource constrained speech recognition (SPE-RCSR); Robust Speech Recognition (SPE-ROBU)

23 Views

Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged Speech Recognition

Automatic speech recognition (ASR) for children is difficult, in part because of data scarcity, especially for kindergarten-aged children. When data are scarce, the model may overfit to the training data, so a good starting point for training is essential. Recently, meta-learning was proposed to learn a model initialization (MI) for ASR tasks across different languages. This method leads to good performance when the model is adapted to an unseen language. However, MI is vulnerable to overfitting on the training tasks (learner overfitting).

Towards Better Meta-Initialization with Task Augmentation for Kindergarten-aged speech Recognition - Poster.pdf

Poster (200)

Categories: Resource constrained speech recognition (SPE-RCSR)

5 Views

IMPROVED META LEARNING FOR LOW RESOURCE SPEECH RECOGNITION

We propose a new meta-learning-based framework for low-resource speech recognition that improves on the previous model-agnostic meta-learning (MAML) approach. MAML is a simple yet powerful meta-learning approach. However, it has some core deficiencies, such as training instability and slow convergence. To address these issues, we adopt a multi-step loss (MSL). MSL computes the loss at every step of the MAML inner loop and then combines these losses using a weighted importance vector.
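
The multi-step loss described above can be sketched on a toy problem. The example below is only an illustration under stated assumptions, not the authors' implementation: it uses a one-parameter linear model `y = w * x` with a mean-squared-error loss, plain gradient descent in the inner loop, and hypothetical names (`mse`, `mse_grad`, `multi_step_loss`) and importance weights. It computes the query loss after every inner step and combines the per-step losses with the weight vector; the outer (meta) update, which would differentiate this combined loss with respect to the initialization, is left out.

```python
def mse(w, xs, ys):
    """Mean squared error of the toy model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def multi_step_loss(w0, support, query, inner_lr, weights):
    """Run len(weights) inner steps on the support set and return the
    weighted sum of query losses measured after each step, plus the
    individual per-step losses."""
    xs_s, ys_s = support
    xs_q, ys_q = query
    w = w0
    step_losses = []
    for _ in weights:
        # Inner-loop adaptation on the support set.
        w = w - inner_lr * mse_grad(w, xs_s, ys_s)
        # Query loss after this step (plain MAML would use only the last one).
        step_losses.append(mse(w, xs_q, ys_q))
    total = sum(v * loss for v, loss in zip(weights, step_losses))
    return total, step_losses


# Usage: three inner steps, with more importance on later steps.
support = ([1.0, 2.0], [2.0, 4.0])   # generated by w = 2
query = ([3.0], [6.0])
total, per_step = multi_step_loss(0.0, support, query, inner_lr=0.1,
                                  weights=[0.2, 0.3, 0.5])
print(total, per_step)
```

Setting the weight vector to all zeros except a one in the last position recovers the usual MAML objective, which evaluates the query loss only after the final inner step.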

ICASPP-2022-v3.pdf

Presentation (224)

Categories: Resource constrained speech recognition (SPE-RCSR)

23 Views

Punctuation Prediction for Streaming On-Device Speech Recognition

Punctuation prediction is essential for automatic speech recognition (ASR). Although many methods have been proposed for punctuation prediction, on-device scenarios with end-to-end ASR are rarely discussed. Punctuation prediction is often treated as a post-processing step on ASR outputs, but the mismatch between the natural-language text used as training input and the ASR hypotheses seen at test time is ignored. Moreover, language models built with deep neural networks are too large for edge devices.

punc_slides.pdf (385)

Categories: Resource constrained speech recognition (SPE-RCSR)

14 Views

Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

Self- and semi-supervised learning methods have been actively investigated to reduce the need for labeled training data or to enhance model performance. However, these approaches mostly focus on in-domain performance on public datasets. In this study, we combine self- and semi-supervised learning methods to address the unseen-domain adaptation problem in a large-scale production setting for an online ASR model.

[Presentation] ICASSP 2022 Domain Adaptation.pdf

ICASSP 2022 presentation (265)

Poster ICASSP22.pdf (244)

Categories: Resource constrained speech recognition (SPE-RCSR); General Topics in Speech Recognition (SPE-GASR)

30 Views

BI-APC: BIDIRECTIONAL AUTOREGRESSIVE PREDICTIVE CODING FOR UNSUPERVISED PRE-TRAINING AND ITS APPLICATION TO CHILDREN’S ASR

We present a bidirectional unsupervised model pre-training (UPT) method and apply it to children’s automatic speech recognition (ASR). An obstacle to improving child ASR is the scarcity of child speech databases. A common approach to alleviating this problem is model pre-training using adult speech data. Pre-training can be supervised (SPT) or unsupervised, depending on the availability of annotations. Typically, SPT performs better. In this paper, we focus on UPT to address situations in which the pre-training data are unlabeled.

Bi-APC poster for icassp2021.pdf

poster (295)

Categories: Resource constrained speech recognition (SPE-RCSR)

16 Views

Libri-Light: A Benchmark for ASR with Limited or No Supervision- ICASSP 2020 Slides

Libri-Light - A Benchmark for ASR with Limited or No Supervision -- ICASSP 2020.pdf (1895)

Categories: Resource constrained speech recognition (SPE-RCSR)

162 Views

SPEECH RECOGNITION MODEL COMPRESSION

Deep neural network-based speech recognition systems are widely used in speech processing applications. To achieve better robustness and accuracy, these networks are built with millions of parameters, making them storage- and compute-intensive. In this paper, we propose Bin & Quant (B&Q), a compression technique with which we reduced the size of the Deep Speech 2 speech recognition model by a factor of 7 with negligible loss in accuracy.

ICASSP 2020 slides.pptx (407)

Categories: Resource constrained speech recognition (SPE-RCSR)

51 Views

CROSS LINGUAL TRANSFER LEARNING FOR ZERO-RESOURCE DOMAIN ADAPTATION

We propose a method for zero-resource domain adaptation of DNN acoustic models, for use in low-resource situations where the only in-language training data available may be poorly matched to the intended target domain. Our method uses a multi-lingual model in which several DNN layers are shared between languages. This architecture enables domain adaptation transforms learned for one well-resourced language to be applied to an entirely different low-resource language.

ICASSP20_slides.pdf (417)

Categories: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO); Resource constrained speech recognition (SPE-RCSR)

23 Views
