
Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition

Abstract: 

Recurrent neural network language models (RNNLMs) have shown superior performance across a range of speech recognition tasks. At the heart of all RNNLMs, the activation functions play a vital role in controlling information flow and tracking longer history contexts that are useful for predicting the following words. Long short-term memory (LSTM) units are well known for this ability and are thus widely used in current RNNLMs. However, the deterministic parameter estimates in LSTM RNNLMs are prone to over-fitting and poor generalization when given limited training data. Furthermore, the precise forms of activations in LSTMs have been largely set empirically for all cells at a global level. In order to address these issues, this paper introduces Gaussian process (GP) LSTM RNNLMs. In addition to modeling parameter uncertainty under a Bayesian framework, this approach also allows the optimal forms of gates to be automatically learned for individual LSTM cells. Experiments were conducted on three tasks: the Penn Treebank (PTB) corpus, Switchboard conversational telephone speech (SWBD) and the AMI meeting room data. The proposed GP-LSTM RNNLMs consistently outperform the baseline LSTM RNNLMs in terms of both perplexity and word error rate.
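To make the core idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation) of one LSTM step in which each gate's activation is a learned convex mixture over a set of basis activation functions, loosely mimicking the notion of learning gate activation forms per cell rather than fixing them globally. The basis set, mixture weights `wi`/`wf`/`wo`, and all parameter names are illustrative assumptions; the paper's actual GP formulation and Bayesian treatment of parameter uncertainty are not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical basis set of candidate gate activations.
BASIS = [sigmoid, np.tanh, lambda x: np.maximum(0.0, x)]

def mixed_activation(x, weights):
    """Convex combination of basis activations; weights are learned per gate."""
    w = np.exp(weights) / np.exp(weights).sum()  # softmax over basis functions
    return sum(wi * f(x) for wi, f in zip(w, BASIS))

def lstm_step(x, h, c, params):
    """One LSTM step whose gate activation forms are themselves learnable."""
    z = np.concatenate([x, h])
    i = mixed_activation(params["Wi"] @ z + params["bi"], params["wi"])  # input gate
    f = mixed_activation(params["Wf"] @ z + params["bf"], params["wf"])  # forget gate
    o = mixed_activation(params["Wo"] @ z + params["bo"], params["wo"])  # output gate
    g = np.tanh(params["Wg"] @ z + params["bg"])                         # cell candidate
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

# Toy usage with random parameters.
rng = np.random.default_rng(0)
n_in, n_hid = 4, 3
params = {}
for gate in "ifog":
    params[f"W{gate}"] = rng.standard_normal((n_hid, n_in + n_hid)) * 0.1
    params[f"b{gate}"] = np.zeros(n_hid)
for gate in "ifo":
    params[f"w{gate}"] = rng.standard_normal(len(BASIS))

h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.standard_normal(n_in), h, c, params)
print(h.shape, c.shape)  # (3,) (3,)
```

In the paper, the gate functions are modeled with Gaussian processes under a Bayesian framework, so both the functional form and the parameter uncertainty are handled probabilistically; the deterministic softmax mixture above is only a rough structural analogy.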

IEEE Xplore link: https://ieeexplore.ieee.org/document/8683660


Paper Details

Authors:
Max W. Y. Lam, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng
Submitted On:
7 May 2019 - 11:49pm
Type:
Poster
Presenter's Name:
Max W. Y. Lam
Paper Code:
1296
Document Year:
2019
Cite

Document Files

GPLSTM ICASSP Poster



[1] Max W. Y. Lam, Xie Chen, Shoukang Hu, Jianwei Yu, Xunying Liu, Helen Meng, "Gaussian Process LSTM Recurrent Neural Network Language Models for Speech Recognition", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4002. Accessed: May 23, 2019.