Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition

Citation Author(s):
Max W. Y. Lam, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng
Submitted by:
Max W. Y. Lam
Last updated:
7 May 2019 - 11:49pm


Recurrent neural network language models (RNNLMs) have shown superior performance across a range of speech recognition tasks. At the heart of all RNNLMs, the activation functions play a vital role in controlling information flow and tracking the longer history contexts that are useful for predicting the following words. Long short-term memory (LSTM) units are well known for this ability and are thus widely used in current RNNLMs. However, the deterministic parameter estimates in LSTM RNNLMs are prone to over-fitting and poor generalization when given limited training data. Furthermore, the precise forms of the activations in LSTM units have largely been set empirically, at a global level, for all cells. To address these issues, this paper introduces Gaussian process (GP) LSTM RNNLMs. In addition to modeling parameter uncertainty under a Bayesian framework, the approach allows the optimal forms of the gates to be learned automatically for individual LSTM cells. Experiments were conducted on three tasks: the Penn Treebank (PTB) corpus, Switchboard conversational telephone speech (SWBD), and the AMI meeting room data. The proposed GP-LSTM RNNLMs consistently outperform the baseline LSTM RNNLMs in terms of both perplexity and word error rate.
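
As background for the remark that LSTM activations are fixed globally, the following is a minimal sketch of one step of a standard LSTM cell with scalar input and state. It is not the GP-LSTM proposed in the paper, whose formulation the abstract does not give. The function name `lstm_cell_step`, the weight layout `w`, and the `gate_act` and `cand_act` parameters are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, the conventional LSTM gate activation."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_step(x, h_prev, c_prev, w, gate_act=sigmoid, cand_act=math.tanh):
    """One step of a single standard LSTM cell (scalar input and state).

    w maps each of "i", "f", "o", "g" to a (w_x, w_h, b) tuple.
    This layout is hypothetical and chosen only to keep the sketch short.
    """
    def pre(name):
        # Affine pre-activation shared by all gates: w_x * x + w_h * h_prev + b
        w_x, w_h, b = w[name]
        return w_x * x + w_h * h_prev + b

    i = gate_act(pre("i"))  # input gate
    f = gate_act(pre("f"))  # forget gate
    o = gate_act(pre("o"))  # output gate
    g = cand_act(pre("g"))  # candidate cell update
    c = f * c_prev + i * g  # new cell state
    h = o * math.tanh(c)    # new hidden state
    return h, c


# Usage: with all-zero weights every gate evaluates to sigmoid(0) = 0.5.
zero_weights = {name: (0.0, 0.0, 0.0) for name in "ifog"}
h, c = lstm_cell_step(x=1.0, h_prev=0.0, c_prev=1.0, w=zero_weights)
```

In this sketch the same `gate_act` and `cand_act` are applied to every cell, which is the globally fixed choice the abstract describes. Passing them as parameters only marks where a per-cell, learned form could in principle be substituted.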
