
Speech Adaptation/Normalization (SPE-ADAP)

Domain and speaker adaptation for Cortana Speech Recognition


Voice assistants represent one of the most popular and important scenarios for speech recognition. In this paper, we propose two adaptation approaches to customize a well-trained multi-style acoustic model toward its subsidiary domain, the Cortana assistant. First, we present anchor-based speaker adaptation, which extracts speaker information (i-vector or d-vector embeddings) from the anchor segments of "Hey Cortana". The anchor embeddings are mapped to layer-wise parameters that control the transformations of both the weight matrices and the biases of multiple layers.
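The mapping from an anchor embedding to layer-wise transformations can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: we assume each adapted layer receives a per-unit scale vector a and shift vector c, both produced by an affine map of the anchor embedding, and that the layer becomes W' = diag(a) W and b' = a * b + c. The function names `layerwise_params` and `adapted_forward` and the mapper matrices are hypothetical.

```python
import numpy as np


def layerwise_params(anchor_emb, mappers):
    """Map an anchor embedding (e.g. an i-vector or d-vector) to per-layer
    (scale, shift) vectors.

    `mappers` is a hypothetical list with one (A_scale, c_scale, A_shift, c_shift)
    tuple per adapted layer; the affine form is an illustrative assumption.
    """
    params = []
    for A_s, c_s, A_b, c_b in mappers:
        # Centre the scale at 1 so that an all-zero mapper leaves the layer unchanged.
        scale = 1.0 + A_s @ anchor_emb + c_s
        shift = A_b @ anchor_emb + c_b
        params.append((scale, shift))
    return params


def adapted_forward(x, layers, params):
    """Forward pass through ReLU layers whose weights and biases are transformed
    by the speaker-dependent (scale, shift) pairs."""
    h = x
    for (W, b), (scale, shift) in zip(layers, params):
        W_adapted = scale[:, None] * W      # same as np.diag(scale) @ W
        b_adapted = scale * b + shift       # bias is rescaled, then shifted
        h = np.maximum(0.0, W_adapted @ h + b_adapted)
    return h
```

Because the embedding comes only from the anchor segment, the per-layer parameters can be computed once before the rest of the query is decoded. The sketch does not model how the mappers themselves are trained.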

Paper Details

Authors:
Shixiong Zhang and Yifan Gong
Submitted On:
12 April 2018 - 7:35pm

Document Files

ICASSP2018_AnchorAdapt_poster.pdf

[1] Shixiong Zhang and Yifan Gong, "Domain and speaker adaptation for Cortana Speech Recognition", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2518. Accessed: Dec. 12, 2018.

Rapid Speaker Adaptation Based on D-code Extracted from BLSTM-RNN in LVCSR


Recently, several fast speaker adaptation methods based on so-called discriminative speaker codes (SC) have been proposed for hybrid DNN-HMM models and applied to unsupervised speaker adaptation in speech recognition. SC-based methods have been shown to be quite effective at adapting DNNs even when only a very small amount of adaptation data is available. However, they require estimating a speaker code for each new speaker through an update process, and the final results are obtained through two-pass decoding.
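For context, the speaker-code estimation step described above as the drawback can be sketched as below. This is a toy illustration, not the method proposed in this paper: we assume a single linear softmax layer whose logits receive an extra term B s from the speaker code s, and we estimate s for a new speaker by gradient descent on the mean cross-entropy against first-pass decoding labels while W, B and b stay frozen. The model form and the names `estimate_speaker_code` and `mean_ce` are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def mean_ce(X, labels, W, B, b, s):
    """Mean cross-entropy of the toy model: logits = X @ W.T + B @ s + b."""
    P = softmax(X @ W.T + B @ s + b)
    return -np.mean(np.log(P[np.arange(len(X)), labels]))


def estimate_speaker_code(X, labels, W, B, b, steps=200, lr=0.1):
    """Estimate a speaker code s with the network frozen.

    X: (T, D) adaptation frames; labels: (T,) first-pass class indices.
    Only s is updated; W (K, D), B (K, S) and b (K,) are kept fixed.
    """
    s = np.zeros(B.shape[1])
    Y = np.eye(W.shape[0])[labels]          # one-hot targets, shape (T, K)
    for _ in range(steps):
        P = softmax(X @ W.T + B @ s + b)    # B @ s is broadcast over all frames
        # d(mean CE)/ds = B^T * sum_t (p_t - y_t) / T
        grad = B.T @ (P - Y).sum(axis=0) / len(X)
        s -= lr * grad
    return s
```

In this setup the estimated code would then be fed back into the network for a second decoding pass, which is the two-pass cost the abstract refers to.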

Paper Details

Authors:
Shaofei Xue, Zhijie Yan, Zhiying Huang, Lirong Dai
Submitted On:
14 October 2016 - 12:31pm

Document Files

Rapid Speaker Adaptation Based on D-code Extracted from BLSTM-RNN in LVCSR.pdf

[1] Shaofei Xue, Zhijie Yan, Zhiying Huang, Lirong Dai, "Rapid Speaker Adaptation Based on D-code Extracted from BLSTM-RNN in LVCSR", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1208. Accessed: Dec. 12, 2018.

Speaker adaptive training in deep neural networks using speaker dependent bottleneck features

Paper Details

Authors:
Rama Doddipatla
Submitted On:
29 March 2016 - 8:36am

Document Files

poster_rama_tosh.pdf

[1] Rama Doddipatla, "Speaker adaptive training in deep neural networks using speaker dependent bottleneck features", IEEE SigPort, 2016. [Online]. Available: http://sigport.org/1066. Accessed: Dec. 12, 2018.