
Time delay neural networks (TDNNs) are an effective acoustic model for large vocabulary speech recognition. The strength of the model can be attributed to its ability to effectively model long temporal contexts. However, current TDNN models are relatively shallow, which limits their modelling capability. This paper proposes a method of increasing network depth by deepening the kernel used in the TDNN temporal convolutions. The best performing kernel consists of three fully connected layers with a residual (ResNet) connection from the output of the first to the output of the third.
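As a concrete illustration, the following is a minimal PyTorch sketch of such a deepened kernel: three fully connected layers with a ResNet-style skip connection from the output of the first layer to the output of the third. The layer sizes and activation choices are our own illustrative assumptions, not the paper's exact configuration.

    # Minimal sketch of a deepened TDNN kernel (assumed sizes/activations).
    import torch
    import torch.nn as nn

    class DeepTDNNKernel(nn.Module):
        def __init__(self, in_dim, hidden_dim):
            super().__init__()
            self.fc1 = nn.Linear(in_dim, hidden_dim)
            self.fc2 = nn.Linear(hidden_dim, hidden_dim)
            self.fc3 = nn.Linear(hidden_dim, hidden_dim)
            self.relu = nn.ReLU()

        def forward(self, x):
            # x: spliced temporal context for one convolution step,
            # shape (batch, in_dim).
            h1 = self.relu(self.fc1(x))
            h2 = self.relu(self.fc2(h1))
            h3 = self.fc3(h2)
            # Residual connection: the output of the first layer is added
            # to the output of the third, as described above.
            return self.relu(h3 + h1)

The same module would be applied at every temporal offset of a TDNN layer, in place of the single affine transform a standard TDNN kernel uses.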


Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR), these models still suffer from several problems, mainly due to the mismatch between training and inference conditions. In the sequence-to-sequence architecture, the model is trained to predict the grapheme at the current time-step given the input speech signal and the ground-truth grapheme history of the previous time-steps. At inference, however, no ground-truth history is available, and it remains unclear how well the model approximates real-world speech under this condition.
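To make the mismatch concrete, the sketch below contrasts teacher-forced training with free-running greedy decoding. The decoder interface, decoder(prev_token, state, enc_out) -> (logits, state), is a hypothetical signature assumed for illustration.

    # Sketch of the train/inference mismatch (exposure bias) in a
    # sequence-to-sequence decoder; the decoder signature is assumed.
    import torch
    import torch.nn.functional as F

    def train_step(decoder, enc_out, targets, state):
        # Training: each step is conditioned on the ground-truth grapheme
        # history (teacher forcing), so the model never sees its own errors.
        loss = 0.0
        for t in range(1, len(targets)):
            logits, state = decoder(targets[t - 1], state, enc_out)
            loss = loss + F.cross_entropy(logits, targets[t].unsqueeze(0))
        return loss

    def greedy_decode(decoder, enc_out, sos, eos, state, max_len=200):
        # Inference: each step is conditioned on the model's own previous
        # prediction, so early mistakes can propagate.
        tok, hyp = sos, []
        for _ in range(max_len):
            logits, state = decoder(tok, state, enc_out)
            tok = logits.argmax(dim=-1)
            if tok.item() == eos:
                break
            hyp.append(tok.item())
        return hyp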


In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network representing an implicit language model. Finally, we introduce vector-based attention weights that are applied to context vectors across both time and their individual components.
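As a rough sketch of the first ingredient, the module below derives an attention context vector from time-convolution features over the encoder output. The window width, shapes, and scoring function are our assumptions, not the paper's exact design.

    # Sketch: attention context vector from time-convolution features
    # (window width and scorer are illustrative assumptions).
    import torch
    import torch.nn as nn

    class ConvAttentionContext(nn.Module):
        def __init__(self, feat_dim, conv_channels, width=5):
            super().__init__()
            # Time convolution over a window of frames.
            self.conv = nn.Conv1d(feat_dim, conv_channels,
                                  kernel_size=width, padding=width // 2)
            self.score = nn.Linear(conv_channels, 1)

        def forward(self, h):
            # h: encoder features, shape (batch, time, feat_dim).
            f = self.conv(h.transpose(1, 2)).transpose(1, 2)          # (B, T, C)
            alpha = torch.softmax(self.score(f).squeeze(-1), dim=1)   # (B, T)
            # Context vector: attention-weighted sum of conv features.
            return torch.bmm(alpha.unsqueeze(1), f).squeeze(1)        # (B, C)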


The lattice-free MMI objective (LF-MMI) has been used in supervised training of state-of-the-art neural network acoustic models for automatic speech recognition (ASR). With large amounts of unsupervised data available, extending this approach to the semi-supervised scenario is of significance. Finite-state transducer (FST) based supervision used with LF-MMI provides a natural way to incorporate uncertainties when dealing with unsupervised data. In this paper, we describe various extensions to standard LF-MMI training to allow the use
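For orientation, the MMI objective that LF-MMI maximizes can be written, in our own notation rather than the paper's, as

    \mathcal{F}_{\mathrm{MMI}} = \sum_{u=1}^{U} \log \frac{p(\mathbf{X}_u \mid \mathbb{G}_u^{\mathrm{num}})}{p(\mathbf{X}_u \mid \mathbb{G}^{\mathrm{den}})}

where \mathbb{G}_u^{\mathrm{num}} is the numerator FST encoding the supervision for utterance u and \mathbb{G}^{\mathrm{den}} is the shared phone-level denominator graph. The FST-based numerator is what makes the semi-supervised extension natural: for unsupervised data, \mathbb{G}_u^{\mathrm{num}} can encode a whole lattice of alternative hypotheses rather than a single transcript.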


The acoustic-to-word model based on the connectionist temporal classification (CTC) criterion has been shown to be a natural end-to-end (E2E) model that directly targets words as output units. However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all remaining words to a single OOV output node. Hence, such a word-based CTC model can only recognize the frequent words modeled by the network output nodes.
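A minimal sketch of the vocabulary construction this implies: words below a frequency threshold collapse to a single OOV node and can never be recognized verbatim. The threshold and token names are illustrative assumptions.

    # Sketch: word-based CTC output vocabulary with a single OOV node.
    from collections import Counter

    def build_output_vocab(transcripts, min_count=10):
        counts = Counter(w for line in transcripts for w in line.split())
        vocab = {"<blank>": 0, "<oov>": 1}   # CTC blank plus the OOV node
        for word, c in counts.items():
            if c >= min_count:
                vocab[word] = len(vocab)
        return vocab

    def to_targets(transcript, vocab):
        # Any word without its own output node maps to <oov>, so the
        # model can never emit it during recognition.
        return [vocab.get(w, vocab["<oov>"]) for w in transcript.split()]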


In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components of a far-field speaker system. Specifically, we use teacher-student (T/S) learning to adapt a well-trained close-talk production AM to far-field conditions using parallel close-talk and simulated far-field data. We also use T/S learning to compress a large KWS model into a small one that fits the device's computational budget. Since it requires no transcriptions, T/S learning makes good use of untranscribed data to boost model performance in both the AM adaptation and the KWS model compression.
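A minimal sketch of the T/S criterion, assuming frame-level posteriors: the student is trained to match the teacher's soft outputs, which is why no transcriptions are needed.

    # Sketch: teacher-student loss on parallel data (e.g. close-talk
    # input to the teacher, simulated far-field input to the student).
    import torch
    import torch.nn.functional as F

    def ts_loss(student_logits, teacher_logits):
        # KL divergence between teacher and student posteriors; the
        # teacher's posteriors stand in for transcribed labels.
        teacher_post = F.softmax(teacher_logits, dim=-1)
        student_logp = F.log_softmax(student_logits, dim=-1)
        return F.kl_div(student_logp, teacher_post, reduction="batchmean")

The same loss serves both uses described above: AM adaptation (teacher sees close-talk audio, student sees far-field audio) and KWS compression (teacher is the large model, student the small one).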


In this paper, we propose a domain adversarial training (DAT) algorithm to alleviate the accented speech recognition problem. In order to reduce the mismatch between labeled source domain data (“standard” accent) and unlabeled target domain data (with heavy accents), we augment the learning objective for a Kaldi TDNN network with a DAT objective to encourage the model to learn accent-invariant features.
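One common way to realize a DAT objective is a gradient reversal layer between the shared feature extractor and an accent classifier; whether the paper's Kaldi recipe uses exactly this construction is our assumption, but it conveys the idea.

    # Sketch: gradient reversal layer for domain adversarial training.
    import torch

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, lam):
            ctx.lam = lam
            return x.view_as(x)   # identity on the forward pass

        @staticmethod
        def backward(ctx, grad_output):
            # Negated, scaled gradient on the backward pass: the feature
            # extractor is pushed to confuse the accent classifier,
            # i.e. to learn accent-invariant features.
            return -ctx.lam * grad_output, None

    def grad_reverse(x, lam=1.0):
        return GradReverse.apply(x, lam)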


End-to-end (E2E) automatic speech recognition (ASR) systems directly map acoustics to words using a unified model. Previous works mostly focus on E2E training of a single model that integrates the acoustic and language models into a whole. Although E2E training benefits from sequence modeling and simplified decoding pipelines, a large amount of transcribed acoustic data is usually required, and traditional acoustic and language modelling techniques cannot be utilized. In this paper, a novel modular training framework of E2E ASR

