We propose a method for zero-resource domain adaptation of DNN acoustic models, for use in low-resource situations where the only in-language training data available may be poorly matched to the intended target domain. Our method uses a multilingual model in which several DNN layers are shared between languages. This architecture enables domain adaptation transforms learned for one well-resourced language to be applied to an entirely different low-resource language.
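A minimal PyTorch sketch of the shared-layer idea (the module names, layer sizes, and the linear form of the adaptation transform are illustrative assumptions, not the paper's exact configuration): the hidden stack is shared across languages, each language has its own output layer, and a feature-space adaptation transform trained on the well-resourced language can be reused for the low-resource one.

```python
# Sketch only: multilingual DNN with shared hidden layers and per-language
# output layers. The "adapt" transform is assumed to be a feature-space
# linear layer trained on the well-resourced language's target-domain data,
# then applied unchanged when decoding the low-resource language.
import torch
import torch.nn as nn

FEAT_DIM = 40  # e.g., 40-dim filterbank features (assumed)

class MultilingualDNN(nn.Module):
    def __init__(self, n_targets_per_lang):
        super().__init__()
        self.adapt = nn.Linear(FEAT_DIM, FEAT_DIM)   # domain adaptation transform
        self.shared = nn.Sequential(                  # layers shared across languages
            nn.Linear(FEAT_DIM, 512), nn.ReLU(),
            nn.Linear(512, 512), nn.ReLU(),
        )
        self.outputs = nn.ModuleDict({                # one output layer per language
            lang: nn.Linear(512, n) for lang, n in n_targets_per_lang.items()
        })

    def forward(self, x, lang):
        return self.outputs[lang](self.shared(self.adapt(x)))

model = MultilingualDNN({"english": 2000, "lowres": 500})
posteriors = model(torch.randn(8, FEAT_DIM), lang="lowres")
```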

In this work, we present speech recognition systems for four Ethiopian languages: Amharic, Tigrigna, Oromo, and Wolaytta. We used comparable training corpora of about 20 to 29 hours of speech and about 1 hour of evaluation speech for each language. For Amharic and Tigrigna, lexical and language models of different vocabulary sizes have been developed. For Oromo and Wolaytta, the training lexicons were used for decoding.

We present Mockingjay, a new speech representation learning approach in which bidirectional Transformer encoders are pre-trained on a large amount of unlabeled speech. Previous speech representation methods learn by conditioning on past frames and predicting information about future frames. Mockingjay, in contrast, is designed to predict the current frame by jointly conditioning on both past and future contexts.
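A minimal sketch of this masked-frame pretraining idea in PyTorch (the masking ratio, model sizes, and loss details below are illustrative assumptions rather than the exact Mockingjay recipe): random frames are zeroed out, a bidirectional Transformer encoder sees both past and future context, and the model is trained to reconstruct the masked frames.

```python
# Sketch only: mask random acoustic frames and reconstruct them from
# bidirectional context. Dimensions and hyperparameters are assumed.
import torch
import torch.nn as nn

FEAT_DIM, D_MODEL = 80, 256  # assumed feature/model dimensions

class MaskedFramePretrainer(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj_in = nn.Linear(FEAT_DIM, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)  # bidirectional
        self.proj_out = nn.Linear(D_MODEL, FEAT_DIM)  # frame reconstruction head

    def forward(self, frames, mask_ratio=0.15):
        # Zero out a random subset of frames; the encoder must fill them in
        # using both past and future frames.
        mask = torch.rand(frames.shape[:2], device=frames.device) < mask_ratio
        corrupted = frames.masked_fill(mask.unsqueeze(-1), 0.0)
        recon = self.proj_out(self.encoder(self.proj_in(corrupted)))
        # Reconstruction loss computed only on the masked positions.
        return (recon - frames).abs()[mask].mean()

loss = MaskedFramePretrainer()(torch.randn(4, 100, FEAT_DIM))
loss.backward()
```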

We present an effective method for small-footprint keyword spotting (KWS) and a voice-command-based user interface for mobile games. For the KWS task, our goal is to design and implement a computationally very light deep neural network model on mobile devices while at the same time improving accuracy in various noisy environments. We propose a simple yet effective convolutional neural network (CNN), deployed with Google's TensorFlow Lite for Android and Apple's Core ML for iOS.
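A minimal sketch of such a small-footprint KWS model and its TensorFlow Lite export (the input shape, layer sizes, and number of keyword classes are assumed for illustration and are not the authors' architecture):

```python
# Sketch only: a tiny CNN over log-mel spectrogram patches, converted to
# a .tflite model for on-device (Android) deployment. Shapes are assumed.
import tensorflow as tf

model = tf.keras.Sequential([
    # ~1 s of audio as 49 frames of 40-dim log-mel features (assumed)
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu",
                           input_shape=(49, 40, 1)),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Conv2D(32, (3, 3), activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),   # keeps the parameter count small
    tf.keras.layers.Dense(10, activation="softmax"),  # assumed 10 keywords
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Convert to TensorFlow Lite for mobile deployment.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # e.g., weight quantization
open("kws.tflite", "wb").write(converter.convert())
```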

In speaker-aware training, a speaker embedding is appended to the DNN input features. This allows the DNN to learn representations that are robust to speaker variability.
We apply speaker-aware training to attention-based end-to-end speech recognition. We show that it can improve over a purely end-to-end baseline. We also propose speaker-aware training as a viable method to leverage untranscribed, speaker-annotated data.
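A minimal sketch of the input construction this describes (dimensions are assumed; how the embedding itself is obtained, e.g., an i-vector or d-vector extractor, is outside this snippet): a per-utterance speaker embedding is tiled along the time axis and concatenated to every acoustic frame.

```python
# Sketch only: append a fixed speaker embedding to each input frame.
import torch

def speaker_aware_features(frames, spk_embedding):
    """frames: (batch, time, feat_dim); spk_embedding: (batch, emb_dim)."""
    batch, time, _ = frames.shape
    # Repeat the per-utterance speaker embedding at every time step
    # and concatenate it to the acoustic features.
    tiled = spk_embedding.unsqueeze(1).expand(batch, time, -1)
    return torch.cat([frames, tiled], dim=-1)

x = speaker_aware_features(torch.randn(8, 200, 80), torch.randn(8, 100))
assert x.shape == (8, 200, 180)
```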

In this paper, we present a Small Energy Masking (SEM) algorithm, which masks inputs whose values fall below a certain threshold. More specifically, a time-frequency bin is masked if the filterbank energy in that bin is less than a certain energy threshold. The ratio of this energy threshold to the peak filterbank energy of each utterance, in decibels, is drawn from a uniform distribution. The unmasked feature elements are scaled so that the total sum of the feature values remains the same through this masking procedure.
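A minimal NumPy sketch of the procedure as described (the dB range of the uniform distribution is an assumed placeholder, and the function operates on one utterance's filterbank energies):

```python
# Sketch only: mask low-energy time-frequency bins relative to the
# utterance's peak energy, then rescale so the total sum is preserved.
import numpy as np

def small_energy_mask(fbank, db_range=(-80.0, 0.0), rng=np.random):
    """fbank: (time, freq) array of non-negative filterbank energies."""
    # Draw the threshold-to-peak ratio in dB from a uniform distribution.
    ratio_db = rng.uniform(*db_range)
    threshold = fbank.max() * 10.0 ** (ratio_db / 10.0)
    mask = fbank >= threshold          # keep bins at or above the threshold
    masked = fbank * mask
    # Rescale the surviving bins so the total feature sum is unchanged.
    kept = masked.sum()
    return masked * (fbank.sum() / kept) if kept > 0 else masked

out = small_energy_mask(np.abs(np.random.randn(100, 40)))
```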
