Directed Automatic Speech Transcription Error Correction Using Bidirectional LSTM

In automatic speech recognition (ASR), error correction after the initial search stage is a commonly used technique to improve performance. Whilst completely automatic error correction, such as full second-pass rescoring using complex language models, is widely used, directed error correction, where the error locations are given manually, is of great interest in many scenarios. Previous work on directed error correction usually uses the error location information to modify the search space of the original ASR models. In this paper, a novel deep-learning-based score combination approach is proposed for directed error correction. Here, a bi-directional LSTM (BLSTM) language model is trained to estimate unnormalized sentence completion scores. These completion scores are then combined with the confusion network scores from the initial search stage for hypothesis rescoring. Experiments showed that the BLSTM-based language model outperformed not only simpler models, such as bi-directional n-gram or LSTM models, but also human prediction. In a real-world Chinese ASR task, the proposed approach also significantly outperformed the baseline of choosing the second-best hypothesis in the error sausages of the confusion networks.
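
The score combination step can be illustrated with a minimal sketch. The abstract does not give the combination formula, so a weighted sum of the log confusion-network posterior and the language-model completion score is assumed here. The function name `rescore_slot`, the `weight` parameter, and all numbers are hypothetical and chosen for illustration only.

```python
import math


def rescore_slot(candidates, lm_scores, weight=0.5, exclude=None):
    """Pick the best replacement word for one marked error slot.

    candidates: dict mapping word -> confusion-network posterior (0 < p <= 1)
    lm_scores:  dict mapping word -> unnormalized completion score
                (higher is better), e.g. produced by a BLSTM language model
    weight:     assumed interpolation weight for the language-model score
    exclude:    the word the user marked as wrong; it is never returned
    """
    best_word, best_score = None, -math.inf
    for word, posterior in candidates.items():
        if word == exclude:
            continue
        # Assumed combination: weighted sum in the log domain.
        score = (1.0 - weight) * math.log(posterior) + weight * lm_scores[word]
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical error slot: the recognizer chose "there", the user marked it wrong.
slot = {"there": 0.5, "their": 0.3, "they're": 0.2}
completion = {"there": -1.0, "their": -2.0, "they're": 1.0}

# weight=0.0 ignores the language model and reduces to the second-best baseline.
baseline_word, _ = rescore_slot(slot, completion, weight=0.0, exclude="there")
combined_word, _ = rescore_slot(slot, completion, weight=0.5, exclude="there")
```

With `weight=0.0` the sketch falls back to the next-best confusion-network entry, which corresponds to the second-best baseline mentioned in the abstract. A non-zero weight lets the completion score override the acoustic ranking.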
