
Rapid Speaker Adaptation Based on D-code Extracted from BLSTM-RNN in LVCSR

Citation Author(s):
Shaofei Xue, Zhijie Yan, Zhiying Huang, Lirong Dai
Submitted by:
Shaofei Xue
Last updated:
14 October 2016 - 12:31pm
Document Type:
Presentation Slides
Document Year:
2016
Event:
Presenters:
Shaofei Xue
Paper Code:
O11-3
 

Recently, several fast speaker adaptation methods based on so-called discriminative speaker codes (SC) have been proposed for hybrid DNN-HMM models and applied to unsupervised speaker adaptation in speech recognition. SC-based methods have been shown to adapt DNNs effectively even when only a very small amount of adaptation data is available. However, they require the speaker code of each new speaker to be estimated through an updating process, and the final results are obtained only after two-pass decoding. In this paper, we propose an alternative method that replaces the SC with a d-code extracted by modeling speaker information with a BLSTM-RNN, which makes one-pass decoding possible. In addition, a speaker clustering approach is introduced to reduce the number of targets of the speaker BLSTM, which both accelerates training and improves ASR performance. Furthermore, an interpolation method is provided that makes use of d-codes from the training set to improve recognition accuracy, especially when adaptation data is limited. Experimental results on the Switchboard task show that the proposed methods achieve a relative WER reduction (about 9%) comparable to that of the standard SC-based adaptation method, without the need for two-pass decoding.
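The abstract does not state how the d-code interpolation is computed, so the following is only a hypothetical sketch of the general idea under explicit assumptions: a speaker's d-code is taken to be the average of frame-level vectors produced by the speaker BLSTM, and the "interpolation" is assumed to blend the test speaker's d-code with the mean of its k most similar training-set d-codes, using a count-dependent weight alpha = n / (n + tau) so that the training-set prior dominates when few adaptation frames are available. The function names and the parameters `k` and `tau` are illustrative inventions, not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(frames: Sequence[Sequence[float]]) -> Vector:
    """Average frame-level vectors into a single vector.

    ASSUMPTION: a d-code is modeled here as the mean of frame-level
    speaker-BLSTM outputs; the paper may define it differently.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def interpolate_dcode(
    test_frames: Sequence[Sequence[float]],
    train_dcodes: Sequence[Sequence[float]],
    k: int = 2,
    tau: float = 100.0,
) -> Vector:
    """Hypothetical d-code interpolation (not the authors' formula).

    Blends the test speaker's d-code with the mean of its k nearest
    training-set d-codes (by cosine similarity). The weight
    alpha = n / (n + tau) grows with the number of adaptation frames n,
    so with little data the result leans toward the training-set prior.
    """
    test = mean_vector(test_frames)
    nearest = sorted(train_dcodes, key=lambda d: cosine(test, d), reverse=True)[:k]
    prior = mean_vector(nearest)
    n = len(test_frames)
    alpha = n / (n + tau)
    return [alpha * t + (1.0 - alpha) * p for t, p in zip(test, prior)]


if __name__ == "__main__":
    train = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    frames = [[1.0, 0.0]] * 100  # 100 frames with tau=100 gives alpha=0.5
    print(interpolate_dcode(frames, train, k=2, tau=100.0))
```

In this toy setting the two training d-codes closest to the test speaker are averaged into a prior, and with n equal to tau the output lies halfway between the test d-code and that prior. The count-based weight is one common way to trade off a data-driven estimate against a prior; the paper's actual weighting scheme may differ.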
