Sorry, you need to enable JavaScript to visit this website.

We propose contextual language models that incorporate dialog level discourse information into language modeling. Previous works on contextual language model treat preceding utterances as a sequence of inputs, without considering dialog interactions. We design recurrent neural network (RNN) based contextual language models that specially track the interactions between speakers in a dialog. Experiment results on Switchboard Dialog Act Corpus show that the proposed model outperforms conventional single turn based RNN language model by 3.3% on perplexity.


Recurrent neural network (RNN) based character-level language models (CLMs) are extremely useful for modeling out-of-vocabulary words by nature. However, their performance is generally much worse than the word-level language models (WLMs), since CLMs need to consider longer history of tokens to properly predict the next one. We address this problem by proposing hierarchical RNN architectures, which consist of multiple modules with different timescales.


In automatic speech recognition (ASR), error correction after the initial search stage is a commonly used technique to improve performance. Whilst completely automatic error correction, such as full second pass rescoring using complex language models, is widely used, directed error correction, where the error locations are manually given, is of great interest in many scenarios. Previous works on directed error correction usually uses the error location information to change search space with original ASR models.


Spoken keyword search in low-resource condition suffers from out-of-vocabulary (OOV) problem and insufficient text data for language model (LM) training. Web-crawled text data is used to expand vocabulary and to augment language model. However, the mismatching between web text and the target speech data brings difficulties to effective utilization. New words from web data need an evaluation to exclude noisy words or introduce proper probabilities. In this paper, several criteria to rank new words from web data are investigated and are used as features


A simple but powerful language model called fixed-size
ordinally-forgetting encoding (FOFE) based feedforward neural
network language models (FNN-LMs) has been proposed recently.
Experimental results have shown that FOFE based FNNLMs
can outperform not only the standard FNN-LMs but also
the popular recurrent neural network language models (RNNLMs).
In this paper, we extend FOFE based FNN-LMs from
several aspects. Firstly, we have proposed a new method to
further improve the performance of FOFE based FNN-LMs by


In recent years, recurrent neural network language models (RNNLMs) have become increasingly popular for a range of applications including speech recognition. However, the training of RNNLMs is computationally expensive, which limits the quantity of data, and size of network, that can be used. In order to fully exploit the power of RNNLMs, efficient training implementations are required. This paper introduces an open-source toolkit, the CUED-RNNLM toolkit, which supports efficient GPU-based training of RNNLMs.


Most current language recognition systems model different levels of information such as acoustic, prosodic, phonotactic, etc. independently and combine the model likelihoods in order to make a decision. However, these are single level systems that treat all languages identically and hence incapable of exploiting any similarities that may exist within groups of languages.