
Connectionist temporal classification (CTC) is a popular sequence prediction approach for automatic speech recognition that is typically used with models based on recurrent neural networks (RNNs). We explore whether deep convolutional neural networks (CNNs) can be used effectively instead of RNNs as the "encoder" in CTC. CNNs lack an explicit representation of the entire sequence, but have the advantage that they are much faster to train. We present an exploration of CNNs as encoders for CTC models, in the context of character-based (lexicon-free) automatic speech recognition.
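CTC itself is agnostic to the encoder architecture: given per-frame label posteriors (whether from a CNN or an RNN), the CTC probability of a label sequence is computed with a forward recursion over a blank-extended label string. A minimal sketch in plain Python, using small illustrative (assumed) posterior values rather than real encoder outputs:

```python
# Toy per-frame label posteriors, e.g. from a CNN (or RNN) encoder.
# Columns: [blank, 'a', 'b']; rows: time frames. Values are assumed.
probs = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.5, 0.1, 0.4],
]

def ctc_forward(probs, labels, blank=0):
    """Total CTC probability of `labels` via the standard forward recursion."""
    # Extend the label sequence with blanks: blank, l1, blank, l2, blank, ...
    ext = [blank]
    for l in labels:
        ext.extend([l, blank])
    T, S = len(probs), len(ext)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]      # start in the initial blank
    alpha[0][1] = probs[0][ext[1]]     # or start on the first label
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                          # stay in place
            if s > 0:
                a += alpha[t - 1][s - 1]                 # advance one position
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]                 # skip over a blank
            alpha[t][s] = a * probs[t][ext[s]]
    # A valid alignment ends on the last label or the final blank.
    return alpha[T - 1][S - 1] + alpha[T - 1][S - 2]

p = ctc_forward(probs, [1, 2])  # probability of the label sequence "ab"
```

In training, the negative log of this probability is the loss minimized with respect to the encoder parameters; the recursion itself is the same regardless of whether the posteriors come from a convolutional or recurrent network.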


When it comes to speech recognition for voice search, it would be
advantageous to take into account application information associated
with speech queries. However, in practice, the vast majority
of queries typically lack such annotations, posing a challenge to
train domain-specific language models (LMs). To obtain robust domain
LMs, an LM that has been pre-trained on general data is typically
adapted to the specific domain. We propose four adaptation
schemes to improve the domain performance of long short-term memory (LSTM) LMs.


This paper explores the possibility of using grapheme-based word and sub-word models in the task of spoken term detection (STD). The use of grapheme models eliminates the need for expert-prepared pronunciation lexicons (which are often far from complete) and/or trainable grapheme-to-phoneme (G2P) algorithms, which are frequently rather inaccurate, especially for rare words (e.g., words coming from a different language). Moreover, the G2P conversion of the search terms, which needs to be performed on-line, can substantially increase the response time of the STD system.


This paper establishes CTC-based systems for the Chinese Mandarin ASR task, exploring three different levels of output units: characters, context-independent phonemes, and context-dependent phonemes. To make training stable, we propose a Newbob-Trn strategy; furthermore, a blank-label prior cost is proposed to improve performance. Finally, we establish a CTC-trained UniLSTM-RC model, which meets the real-time requirement of an online system while also bringing performance gains on the Chinese Mandarin ASR task.


Traditional hybrid DNN-HMM based ASR systems for keyword spotting, which model HMM states, are not flexible to optimize for a specific language. In this paper, we construct an end-to-end acoustic-model-based ASR system for keyword spotting in Mandarin. The model is built from an LSTM-RNN and trained with the connectionist temporal classification (CTC) objective. The input to the network is a feature sequence, and the output is the probabilities of the initials and finals of Mandarin syllables.


Improved speech recognition performance can often be obtained by combining multiple systems
together. Joint decoding, where scores from multiple systems are combined during decoding rather
than combining hypotheses, is one efficient approach for system combination. In standard joint
decoding the frame log-likelihoods from each system are used as the scores. These scores are then
weighted and summed to yield the final score for a frame. The system combination weights for this
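The weighted-sum scoring described above can be sketched in a few lines; the log-likelihood values and per-system weights below are assumed purely for illustration:

```python
# Hypothetical per-frame log-likelihoods from two ASR systems (illustrative values).
scores_sys1 = [-1.2, -0.8, -2.0]
scores_sys2 = [-1.0, -1.5, -1.8]

# Assumed per-system combination weights.
w1, w2 = 0.6, 0.4

# Each frame's scores are weighted and summed into one combined score,
# which the decoder then uses in place of the individual system scores.
combined = [w1 * s1 + w2 * s2 for s1, s2 in zip(scores_sys1, scores_sys2)]
```

Because the combination happens at the frame-score level, only one decoding pass is needed, rather than decoding each system separately and merging hypotheses afterwards.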


In the past five years, significant advances in Large Vocabulary Speech Recognition (LVSR), Deep Learning (DL) and Spoken Language Understanding (SLU), along with the explosive growth of wireless network bandwidth, have given rise to three compelling conversational AI agents available on Android, iOS and Microsoft smartphones. Conversational AI agents such as Google Now, Apple Siri and Microsoft Cortana are now the most preferred way to perform mobile web search and to command and control the various smartphone apps.

