
Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification

Abstract: 

This paper aims to improve the widely used deep speaker embedding x-vector model. We propose the following improvements: (1) a hybrid neural network structure that uses both a time delay neural network (TDNN) and long short-term memory (LSTM) networks to generate complementary speaker information at different levels; (2) a multi-level pooling strategy that collects speaker information from both the TDNN and LSTM layers; (3) a regularization scheme on the speaker embedding extraction layer that makes the extracted embeddings suitable for the subsequent fusion step. The synergy of these improvements is shown on the NIST SRE 2016 eval test (with a 19% EER reduction) and the SRE 2018 dev test (with a 9% EER reduction), as well as a DCF score reduction of more than 10% on these two test sets over the x-vector baseline.
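
To make the multi-level pooling idea concrete, here is a minimal sketch in plain Python. It assumes that each level is summarized with x-vector-style statistics pooling (per-dimension mean and standard deviation over time) and that the pooled vectors are simply concatenated; the abstract does not spell out either choice, so the function names `stats_pool` and `multi_level_pool`, the concatenation step, and the toy activations are all illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import List, Sequence

# T frames, each a D-dimensional activation vector from one layer.
Frames = Sequence[Sequence[float]]


def stats_pool(frames: Frames) -> List[float]:
    """Return per-dimension means followed by population standard deviations."""
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


def multi_level_pool(tdnn_out: Frames, lstm_out: Frames) -> List[float]:
    """Pool each level separately, then concatenate (assumed combination rule)."""
    return stats_pool(tdnn_out) + stats_pool(lstm_out)


# Hypothetical toy activations standing in for real layer outputs.
tdnn_out = [[1.0, 2.0], [3.0, 4.0]]  # 2 frames, 2 dimensions
lstm_out = [[0.0], [4.0]]            # 2 frames, 1 dimension
pooled = multi_level_pool(tdnn_out, lstm_out)
```

The pooled vector has a fixed length (twice the sum of the layer widths) regardless of how many frames the utterance contains, which is what lets a variable-length utterance feed a fixed-size embedding layer.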


Paper Details

Authors:
Yun Tang, Guohong Ding, Jing Huang, Xiaodong He, Bowen Zhou
Submitted On:
8 May 2019 - 2:09pm
Type:
Poster
Presenter's Name:
Xiaodong He
Paper Code:
3807
Document Year:
2019
Cite

Document Files

ICASSP2019_poster.pdf


[1] Yun Tang, Guohong Ding, Jing Huang, Xiaodong He, Bowen Zhou, "Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4126. Accessed: Nov. 11, 2019.
@article{4126-19,
url = {http://sigport.org/4126},
author = {Yun Tang and Guohong Ding and Jing Huang and Xiaodong He and Bowen Zhou},
publisher = {IEEE SigPort},
title = {Deep Speaker Embedding Learning with Multi-Level Pooling for Text-Independent Speaker Verification},
year = {2019}
}