- Sequence-to-Sequence ASR Optimization via Reinforcement Learning
Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR), these models still suffer from several problems, mainly due to the mismatch between training and inference conditions. In the sequence-to-sequence architecture, the model is trained to predict the grapheme of the current time-step given the input speech signal and the ground-truth grapheme history of the previous time-steps. During inference, however, the ground-truth history is unavailable, so the model must condition on its own previous predictions, and an early error can propagate through the rest of the transcript.
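This train/inference mismatch is often called exposure bias. A minimal sketch, using a hypothetical toy next-grapheme predictor (`predict_next` is a stand-in for a trained decoder, not anything from the paper), contrasts the training-time (teacher-forced) and inference-time (free-running) conditions:

```python
# Hypothetical toy "model": the next grapheme id is a fixed function of
# the previous one (a stand-in for a trained seq2seq decoder).
TRANSITIONS = {0: 1, 1: 2, 2: 3, 3: 3}

def predict_next(prev_token):
    return TRANSITIONS[prev_token]

def teacher_forced(ground_truth):
    # Training-time condition: the history fed to the model is always
    # the ground-truth grapheme sequence.
    return [predict_next(t) for t in ground_truth[:-1]]

def free_running(start_token, steps):
    # Inference-time condition: the history is the model's own output,
    # so an early mistake changes every later prediction.
    out, prev = [], start_token
    for _ in range(steps):
        prev = predict_next(prev)
        out.append(prev)
    return out
```

Reinforcement-learning objectives address this by training directly on the free-running condition.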

- End-to-End Multimodal Speech Recognition
Transcription or sub-titling of open-domain videos is still a challenging task for Automatic Speech Recognition (ASR) due to the data's difficult acoustics, variable signal processing, and essentially unrestricted domain. In previous work, we have shown that the visual channel – specifically object and scene features – can help to adapt the acoustic model (AM) and language model (LM) of a recognizer, and we are now extending this work to end-to-end approaches.
- An Investigation into Instantaneous Frequency Estimation Methods for Improved Speech Recognition Features
There have been several studies in the recent past pointing to the importance of the analytic phase of the speech signal in human perception, especially in noisy conditions. However, phase information is still not used in state-of-the-art speech recognition systems. In this paper, we illustrate the importance of the analytic phase of the speech signal for automatic speech recognition. As the computation of the analytic phase suffers from an inevitable phase-wrapping problem, we extract features from its time derivative, referred to as the instantaneous frequency.
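A minimal numpy/scipy sketch of the general idea (not the paper's exact feature pipeline): form the analytic signal with the Hilbert transform, unwrap the phase to sidestep the wrapping problem, and take the time derivative to obtain instantaneous frequency:

```python
import numpy as np
from scipy.signal import hilbert

def instantaneous_frequency(x, fs):
    # Analytic signal x + j*H{x}; its unwrapped phase avoids wrapping,
    # and the phase's time derivative gives instantaneous frequency in Hz.
    analytic = hilbert(x)
    phase = np.unwrap(np.angle(analytic))
    return np.diff(phase) * fs / (2.0 * np.pi)

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440.0 * t)  # a pure 440 Hz tone
f_inst = instantaneous_frequency(x, fs)
```

For a pure tone the estimate sits at the tone's frequency, up to edge effects at the ends of the FFT-based Hilbert transform.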

- Comparison of DCT and Autoencoder-based Features for DNN-HMM Multimodal Silent Speech Recognition
poster-llc.pdf


- Filterbank Learning Using Convolutional Restricted Boltzmann Machine for Speech Recognition
This paper presents the Convolutional Restricted Boltzmann Machine (ConvRBM) as a model for the speech signal. We have developed ConvRBM with sampling from noisy rectified linear units (NReLUs). ConvRBM is trained in an unsupervised way to model speech signals of arbitrary length, and the weights of the model can represent an auditory-like filterbank.
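The NReLU sampling step (Nair and Hinton's noisy rectified linear units) can be sketched as follows; this is an illustrative stand-in for one piece of the hidden-unit sampling in ConvRBM training, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nrelu_sample(pre_activation):
    # Noisy rectified linear unit: add zero-mean Gaussian noise whose
    # variance is sigmoid(x), then rectify (Nair & Hinton, 2010).
    noise = rng.normal(0.0, np.sqrt(sigmoid(pre_activation)))
    return np.maximum(0.0, pre_activation + noise)
```

Strongly negative pre-activations sample to zero almost surely, while strongly positive ones behave like a linear unit with roughly unit noise variance.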
poster.pdf

- Selection and Combination of Hypotheses for Dialectal Speech Recognition
- Divergence estimation based on deep neural networks and its use for language identification
In this paper, we propose a method to estimate the statistical divergence between probability distributions using a DNN-based discriminative approach, and apply it to language identification tasks. Since statistical divergence is generally defined as a functional of two probability density functions, these density functions are usually represented in a parametric form. If a mismatch exists between the assumed distribution and the true one, the resulting divergence estimate becomes erroneous.
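One common discriminative route, shown here as an illustrative sketch rather than the paper's exact method, is the density-ratio trick: train a classifier to separate samples of p from samples of q; with equal sample counts its logit estimates log p(x)/q(x), so averaging the logit over p-samples estimates KL(p||q) without any parametric density assumption:

```python
import numpy as np

rng = np.random.default_rng(1)

# p = N(1, 1), q = N(0, 1); the true KL(p||q) is 0.5.
xp = rng.normal(1.0, 1.0, 20000)
xq = rng.normal(0.0, 1.0, 20000)

# A one-layer logistic regression (the smallest possible "DNN") is
# trained by gradient descent to tell p-samples (label 1) from
# q-samples (label 0).
x = np.concatenate([xp, xq])
y = np.concatenate([np.ones_like(xp), np.zeros_like(xq)])
w, b = 0.0, 0.0
for _ in range(2000):
    p_hat = 1.0 / (1.0 + np.exp(-(w * x + b)))
    grad = p_hat - y  # gradient of the cross-entropy loss w.r.t. logits
    w -= 0.1 * np.mean(grad * x)
    b -= 0.1 * np.mean(grad)

# With equal sample counts, the classifier logit estimates log p(x)/q(x),
# so its average over p-samples estimates KL(p||q).
kl_est = np.mean(w * xp + b)
```

For two equal-variance Gaussians the optimal logit is linear in x, so even this tiny model recovers the divergence closely.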
- Accelerating Multi-User Large Vocabulary Continuous Speech Recognition on Heterogeneous CPU-GPU Platforms
In our previous work, we developed a GPU-accelerated speech recognition engine optimized for faster-than-real-time speech recognition on a heterogeneous CPU-GPU architecture. In this work, we focus on developing a scalable server-client architecture specifically optimized to decode multiple users' streams simultaneously in real time.