Rapid Speaker Adaptation Based on D-code Extracted from BLSTM-RNN in LVCSR

Recently, several fast speaker adaptation methods based on so-called discriminative speaker codes (SC) have been proposed for hybrid DNN-HMM models and applied to unsupervised speaker adaptation in speech recognition. SC-based methods have been shown to adapt DNNs effectively even when only a very small amount of adaptation data is available. However, they require estimating a speaker code for each new speaker through an updating process, so the final results are obtained only through two-pass decoding. In this paper, we propose an alternative d-code extraction method that replaces the SC by modeling speaker information with a BLSTM-RNN, which makes one-pass decoding possible. We then introduce a speaker clustering approach that reduces the number of speaker-BLSTM targets, which accelerates training and improves ASR performance at the same time. In addition, we provide an interpolation method that makes use of d-codes from the training set to improve recognition accuracy, especially when adaptation data is limited. Experimental results on the Switchboard task show that the proposed methods achieve a relative WER reduction (about 9%) comparable to that of the standard SC-based adaptation method, without the need for two-pass decoding.
