Domain and speaker adaptation for Cortana Speech Recognition

Abstract: 

Voice assistants are among the most popular and important scenarios for speech recognition. In this paper, we propose two approaches for adapting a well-trained multi-style acoustic model to one of its subdomains, the Cortana assistant. First, we present anchor-based speaker adaptation, which extracts speaker information, in the form of i-vector or d-vector embeddings, from the "Hey Cortana" anchor segments. The anchor embeddings are mapped to layer-wise parameters that control transformations of both the weight matrices and the biases of multiple layers. Second, we directly update the existing model parameters for domain adaptation. We demonstrate that the prior distribution should be updated along with the network to compensate for the label bias from the development data. Updating the priors can have a significant impact when anchor words occur frequently in the target domain. Experiments on the Hey Cortana desktop test set show that both approaches improve recognition accuracy significantly. Anchor-based adaptation using the anchor d-vector, combined with prior interpolation, achieves a 32% relative reduction in word error rate (WER) over the generic model.
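The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact form of the layer transformation or of the prior interpolation, so the sketch assumes a per-unit scale and shift that are linear in the anchor embedding, and a simple linear mix of two prior distributions. The names `adapt_layer`, `interpolate_priors`, `scaled_log_likelihoods`, `scale_map`, `shift_map`, and `lam` are all hypothetical.

```python
import math


def adapt_layer(weights, bias, embedding, scale_map, shift_map):
    """Return (weights', bias') for one layer, conditioned on an anchor embedding.

    Assumed form (not taken from the paper):
      scale    = 1 + scale_map @ embedding   (one value per output unit)
      shift    = shift_map @ embedding       (one value per output unit)
      weights' = diag(scale) @ weights
      bias'    = bias + shift
    """
    def matvec(matrix, vector):
        return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

    scale = [1.0 + s for s in matvec(scale_map, embedding)]
    shift = matvec(shift_map, embedding)
    new_weights = [[s * w for w in row] for s, row in zip(scale, weights)]
    new_bias = [b + t for b, t in zip(bias, shift)]
    return new_weights, new_bias


def interpolate_priors(generic_prior, domain_prior, lam):
    """Linearly mix two state prior distributions; lam=0 keeps the generic prior."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    mixed = [(1.0 - lam) * g + lam * d
             for g, d in zip(generic_prior, domain_prior)]
    total = sum(mixed)  # renormalize to guard against rounding drift
    return [m / total for m in mixed]


def scaled_log_likelihoods(posteriors, prior, floor=1e-10):
    """Hybrid-model scoring: log p(x|s) = log p(s|x) - log p(s) + const."""
    return [math.log(max(p, floor)) - math.log(max(q, floor))
            for p, q in zip(posteriors, prior)]


if __name__ == "__main__":
    # Toy 2-unit layer and 2-dimensional embedding, values chosen arbitrarily.
    w, b = adapt_layer([[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5], [1.0, 0.0],
                       [[0.5, 0.0], [0.0, 0.0]], [[0.25, 0.0], [0.0, 0.0]])
    print(w, b)
    prior = interpolate_priors([0.5, 0.5], [0.9, 0.1], lam=0.5)
    print(scaled_log_likelihoods([0.8, 0.2], prior))
```

The prior matters because a hybrid acoustic model divides each state posterior by its prior before decoding. If the adaptation data over-represents the anchor words, the adapted network's posteriors shift toward those states, and a prior estimated on the generic data no longer cancels that shift.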

Paper Details

Authors:
Shixiong Zhang and Yifan Gong
Submitted On:
12 April 2018 - 7:35pm
Short Link:
Type:
Poster
Event:
Presenter's Name:
Yong Zhao
Paper Code:
SP-P20.9
Document Year:
2018

Document Files

ICASSP2018_AnchorAdapt_poster.pdf

[1] Shixiong Zhang and Yifan Gong, "Domain and speaker adaptation for Cortana Speech Recognition", IEEE SigPort, 2018. [Online]. Available: http://sigport.org/2518. Accessed: Oct. 17, 2018.