
Previous research on applying deliberation networks to automatic speech recognition has achieved excellent results. The attention-decoder-based deliberation model typically works as a rescorer to improve first-pass recognition results, and requires the full first-pass hypothesis before second-pass deliberation can begin. In this work, we propose a transducer-based streaming deliberation model. The joint network of a transducer decoder normally receives inputs from the encoder and the prediction network; we propose attention over the first-pass text hypothesis as a third input to the joint network.
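As a rough illustration of the three-input joint network described above, the following sketch (assumed shapes and parameter names, not the authors' implementation) concatenates the encoder output, prediction-network output, and a per-label attention context over the first-pass hypothesis before the usual tanh projection:

```python
import numpy as np

def deliberation_joint(enc, pred, ctx, W, b, W_out, b_out):
    """Sketch of a transducer joint network with a third input: an attention
    context over the first-pass text hypothesis (all names are illustrative).
    enc:  (T, enc_dim)  encoder outputs per frame
    pred: (U, pred_dim) prediction-network outputs per label
    ctx:  (U, ctx_dim)  attention context over the first-pass hypothesis
    Returns logits of shape (T, U, vocab_size)."""
    T, U = enc.shape[0], pred.shape[0]
    # Broadcast the three inputs onto the (T, U) time-label grid.
    e = np.broadcast_to(enc[:, None, :], (T, U, enc.shape[1]))
    p = np.broadcast_to(pred[None, :, :], (T, U, pred.shape[1]))
    c = np.broadcast_to(ctx[None, :, :], (T, U, ctx.shape[1]))
    # Concatenate encoder, prediction, and attention-context features.
    x = np.concatenate([e, p, c], axis=-1)
    h = np.tanh(x @ W + b)        # joint hidden layer, (T, U, joint_dim)
    return h @ W_out + b_out      # output logits, (T, U, vocab_size)
```

The only change relative to a standard transducer joint network is the extra `ctx` operand in the concatenation.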


Most automatic speech recognition (ASR) neural network models are unsuitable for mobile devices because of their large model sizes. The model size must therefore be reduced to fit the limited hardware resources. In this study, we investigate sequence-level knowledge distillation techniques for compressing self-attention ASR models.
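The core idea of sequence-level (as opposed to frame-level) distillation can be sketched as follows: the student is trained to match the teacher's distribution over whole hypotheses from an n-best list, rather than per-frame posteriors. This is a generic formulation, not the specific variant studied in the paper:

```python
import numpy as np

def seq_kd_loss(teacher_logp, student_logp):
    """Sketch of a sequence-level knowledge distillation loss.
    teacher_logp, student_logp: log P(y|x) under teacher and student for the
    same n-best list of hypotheses y (arrays of equal length).
    The teacher scores are renormalized over the n-best list to form a
    distribution; the loss is the cross-entropy of the student under it."""
    t = np.exp(teacher_logp - np.logaddexp.reduce(teacher_logp))
    return -np.sum(t * student_logp)
```

Minimizing this loss pushes the student's sequence-level scores toward the teacher's, which is what allows a much smaller student model to retain accuracy.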


This work proposes a new neural network framework that simultaneously ranks multiple hypotheses generated by one or more automatic speech recognition (ASR) engines for a speech utterance. Features fed into the framework include not only those calculated from the ASR output, but also natural language understanding (NLU) related features, such as trigger features capturing long-distance constraints between word/slot pairs and BLSTM features representing intent-sensitive sentence embeddings.
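A minimal sketch of such a ranker (assumed single-hidden-layer architecture; the paper's actual network and feature set may differ) scores each hypothesis from its concatenated ASR and NLU feature vector and sorts by score:

```python
import numpy as np

def rank_hypotheses(features, W1, b1, w2, b2):
    """Sketch of a hypothesis-ranking network. Each row of `features` is a
    per-hypothesis feature vector combining ASR-derived features (scores,
    etc.) with NLU-derived features (e.g. trigger features, BLSTM sentence
    embeddings). Returns hypothesis indices sorted best-first."""
    h = np.maximum(0.0, features @ W1 + b1)  # ReLU hidden layer
    scores = h @ w2 + b2                     # one scalar score per hypothesis
    return np.argsort(-scores)               # best-scoring hypothesis first
```

In a real system the weights would be trained with a ranking or classification loss over hypotheses whose reference-level correctness is known.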


Confidences are integral to ASR systems and are applied to data selection, adaptation, hypothesis ranking, arbitration, etc. A hybrid ASR system is inherently a match between pronunciations and AM+LM evidence, but current confidence features lack pronunciation information. We develop pronunciation embeddings to represent and factorize the acoustic score in relevant bases, and demonstrate an 8-10% relative reduction in false alarms (FA) on large-scale tasks. We generalize to standard NLP embeddings such as GloVe, and show a 16% relative FA reduction in combination with GloVe.
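A simple way to picture how embedding features augment a confidence model (an illustrative setup, not the paper's exact classifier) is a logistic regression whose input concatenates standard confidence features with pronunciation and word embeddings:

```python
import numpy as np

def word_confidence(conf_feats, pron_emb, word_emb, w, b):
    """Sketch of an embedding-augmented confidence classifier.
    conf_feats: standard confidence features (e.g. normalized scores)
    pron_emb:   pronunciation embedding for the hypothesized word
    word_emb:   NLP word embedding (e.g. GloVe) for the hypothesized word
    Returns P(word is correct) from a logistic regression over the
    concatenated feature vector (weights `w`, bias `b` are assumed trained)."""
    x = np.concatenate([conf_feats, pron_emb, word_emb])
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))
```

Lowering false alarms then corresponds to the classifier assigning lower confidence to incorrectly recognized words once pronunciation evidence is available to it.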


Finding visual features and suitable models for lipreading tasks that are more complex than a well-constrained vocabulary has proven challenging. This paper explores state-of-the-art Deep Neural Network architectures for lipreading based on a Sequence-to-Sequence Recurrent Neural Network. We report results for both hand-crafted and 2D/3D Convolutional Neural Network visual front-ends, online monotonic attention, and a joint Connectionist Temporal Classification/Sequence-to-Sequence loss.


This paper presents methods to accelerate recurrent neural network based language models (RNNLMs) for online speech recognition systems.
First, lossy compression of the past hidden-layer outputs (the history vector), combined with caching, is introduced to reduce the number of LM queries.
Next, RNNLM computations are deployed in a CPU-GPU hybrid manner, computing each layer of the model on the more advantageous platform.
The overhead added by data exchanges between the CPU and GPU is compensated for through a frame-wise batching strategy.
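The caching idea in the first step can be sketched as follows (our reading of the abstract; the quantization scheme and step size are assumptions): quantizing the history vector before using it as a cache key makes near-identical histories collide, so one LM query serves many lookups:

```python
import numpy as np

class RNNLMCache:
    """Sketch of RNNLM query caching with lossy history compression.
    `lm_step(word, hidden) -> (logp, new_hidden)` is the expensive RNNLM call;
    the cache key quantizes the hidden state so that sufficiently similar
    histories share one cached result. `quant` (step size) is an assumption."""

    def __init__(self, lm_step, quant=0.5):
        self.lm_step = lm_step
        self.quant = quant
        self.cache = {}
        self.hits = 0

    def _key(self, word, hidden):
        # Lossy compression: round each hidden dimension to a coarse grid.
        q = np.round(hidden / self.quant).astype(int)
        return (word, q.tobytes())

    def score(self, word, hidden):
        k = self._key(word, hidden)
        if k in self.cache:
            self.hits += 1          # cache hit: no LM query needed
            return self.cache[k]
        out = self.lm_step(word, hidden)
        self.cache[k] = out
        return out
```

Coarser quantization trades LM accuracy for a higher hit rate, i.e. fewer RNNLM evaluations during decoding.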


The paper presents a new approach to extracting useful information from out-of-vocabulary (OOV) speech regions in ASR system output. The system makes use of a hybrid decoding network with both words and sub-word units. In the decoded lattices, candidates for OOV regions are identified.
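The identification step can be pictured as follows (a purely illustrative sketch: the sub-word marker convention and the grouping rule are our assumptions, not the paper's method). In a hybrid word/sub-word decoding output, runs of consecutive sub-word units signal regions the word vocabulary could not cover:

```python
def find_oov_regions(tokens, subword_prefix="+"):
    """Sketch: group consecutive sub-word units (marked here with a
    hypothetical prefix) in a decoded token sequence into candidate
    OOV regions, returned as half-open (start, end) index pairs."""
    regions, start = [], None
    for i, tok in enumerate(tokens):
        if tok.startswith(subword_prefix):
            if start is None:
                start = i          # open a new candidate region
        elif start is not None:
            regions.append((start, i))   # close the region at a word token
            start = None
    if start is not None:
        regions.append((start, len(tokens)))
    return regions
```

In the lattice setting described by the paper, the same grouping would be applied to lattice paths rather than to a single one-best token sequence.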


Lattice rescoring is a common approach to exploiting recurrent neural language models in ASR: a word lattice is generated by first-pass decoding and then rescored with a neural model, usually with an n-gram approximation method to limit the search space. In this work, we describe a pruned lattice-rescoring algorithm for ASR that improves on the n-gram approximation method. The pruned algorithm further limits the search space and uses heuristic search to pick better histories when expanding the lattice.
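The n-gram approximation and the pruning it enables can be sketched as follows (our illustration under simplifying assumptions, not the authors' exact algorithm): hypotheses reaching a lattice node are merged whenever their last n-1 words agree, keeping only the better-scoring history, and at most `beam` merged states survive per node:

```python
def rescore_lattice(arcs, lm_score, n=3, beam=5):
    """Sketch of pruned lattice rescoring with an n-gram approximation.
    arcs: dict node -> list of (word, next_node, acoustic_score); node ids are
          assumed topologically ordered, with node 0 as the start state.
    lm_score(history, word): log LM probability (here it stands in for the
          neural LM; a real rescorer would also carry the LM's hidden state).
    Returns dict node -> {suffix: (total_score, history)}."""
    states = {0: {(): (0.0, [])}}
    for node in sorted(arcs):
        for suffix, (score, hist) in states.get(node, {}).items():
            for word, nxt, ac in arcs[node]:
                new_hist = hist + [word]
                new_suffix = tuple(new_hist[-(n - 1):])
                new_score = score + ac + lm_score(tuple(hist), word)
                slot = states.setdefault(nxt, {})
                # n-gram approximation: merge hypotheses sharing the last
                # n-1 words, keeping the better-scoring full history.
                if new_suffix not in slot or new_score > slot[new_suffix][0]:
                    slot[new_suffix] = (new_score, new_hist)
        # Pruning: keep only the top `beam` merged states per node.
        for nxt in list(states):
            if nxt > node and len(states[nxt]) > beam:
                top = sorted(states[nxt].items(),
                             key=lambda kv: -kv[1][0])[:beam]
                states[nxt] = dict(top)
    return states
```

Keeping the best-scoring history per merged state is the "pick better histories" heuristic; tightening `beam` and lowering `n` both shrink the search space at some accuracy cost.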