Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition

Citation Author(s):
Max W. Y. Lam, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng
Submitted by:
Max W. Y. Lam
Last updated:
7 May 2019 - 11:49pm
Document Type:
Poster
Document Year:
2019
Event:
Presenters:
Max W. Y. Lam
Paper Code:
1296

Recurrent neural network language models (RNNLMs) have shown superior performance across a range of speech recognition tasks. At the heart of all RNNLMs, the activation functions play a vital role in controlling information flow and tracking the longer history contexts that are useful for predicting the following words. Long short-term memory (LSTM) units are well known for this ability and are thus widely used in current RNNLMs. However, the deterministic parameter estimates in LSTM RNNLMs are prone to over-fitting and poor generalization when training data are limited. Furthermore, the precise forms of the LSTM activations are largely set empirically and shared by all cells at a global level. To address these issues, this paper introduces Gaussian process (GP) LSTM RNNLMs. In addition to modeling parameter uncertainty under a Bayesian framework, the GP-LSTM also allows the optimal forms of the gate activations to be learned automatically for individual LSTM cells. Experiments were conducted on three tasks: the Penn Treebank (PTB) corpus, Switchboard conversational telephone speech (SWBD), and the AMI meeting room data. The proposed GP-LSTM RNNLMs consistently outperform the baseline LSTM RNNLMs in terms of both perplexity and word error rate.

IEEE Xplore link: https://ieeexplore.ieee.org/document/8683660
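
As a rough illustration of the idea described in the abstract, the sketch below approximates a gate's activation as a mixture of basis activation functions whose combination weights are Gaussian-distributed and sampled with the reparameterisation trick. This is a minimal, hypothetical PyTorch sketch and not the authors' implementation; the class name GPGateActivation and the particular choice of basis functions are assumptions made here for illustration only.

```python
# Hypothetical sketch (not the paper's code): a gate activation modeled as a
# Bayesian mixture of basis activation functions, so the gate's shape can be
# learned while the Gaussian weights capture parameter uncertainty.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GPGateActivation(nn.Module):
    """Gate activation as a weighted combination of basis activations."""
    def __init__(self):
        super().__init__()
        # Assumed set of basis activation functions; the paper may use others.
        self.bases = [torch.sigmoid, torch.tanh, F.relu, F.softplus]
        k = len(self.bases)
        # Variational Gaussian parameters over the combination weights.
        self.weight_mean = nn.Parameter(torch.zeros(k))
        self.weight_logvar = nn.Parameter(torch.full((k,), -5.0))

    def forward(self, x):
        if self.training:
            # Reparameterisation trick: sample weights from N(mean, var).
            eps = torch.randn_like(self.weight_mean)
            w = self.weight_mean + eps * torch.exp(0.5 * self.weight_logvar)
        else:
            # Use the mean weights at evaluation time.
            w = self.weight_mean
        # Normalise the weights so the mixture stays gate-like in scale.
        w = torch.softmax(w, dim=0)
        # Weighted combination of the basis activations.
        return sum(wi * basis(x) for wi, basis in zip(w, self.bases))
```

In a custom LSTM cell, a module like this would stand in for the fixed sigmoid of the input, forget, and output gates; giving each gate (or each cell) its own instance lets the activation shape be learned per gate, in the spirit of the per-cell learned gate forms described above.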
