Statistics Pooling Time Delay Neural Network Based on X-vector for Speaker Verification

Abstract: 

This paper aims to improve the x-vector speaker embedding so that it captures more detailed information for speaker verification. We propose a statistics pooling time delay neural network (TDNN), in which statistics pooling is integrated into each layer of the TDNN structure to account for the variation of temporal context in the frame-level transformation. The proposed feature vector, named the stats-vector, is compared with the baseline x-vector on the VoxCeleb dataset and the Speakers in the Wild (SITW) dataset for speaker verification. The experimental results show that the proposed stats-vector with score fusion achieved the best performance on the VoxCeleb1 dataset. Furthermore, considering the interference from other speakers in the recordings, we found that the proposed stats-vector efficiently reduced this interference and improved speaker verification performance on the SITW dataset.
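As a rough illustration of the statistics pooling idea mentioned in the abstract, the sketch below computes the mean and standard deviation over time for the frame-level output of each layer and concatenates the results. The abstract does not say how the per-layer statistics are combined, so the concatenation, the helper names `stats_pool` and `pooled_layer_stats`, the layer sizes, and the random toy inputs are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def stats_pool(frames: np.ndarray) -> np.ndarray:
    """Statistics pooling: map a (T, D) frame-level matrix to a (2*D,) vector
    holding the per-dimension mean and (population) standard deviation over time."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    return np.concatenate([mean, std])


def pooled_layer_stats(layer_outputs):
    """Hypothetical helper: pool each layer's frame-level output and
    concatenate the pooled vectors (an assumed way of combining them)."""
    return np.concatenate([stats_pool(h) for h in layer_outputs])


# Toy frame-level outputs of three hypothetical TDNN layers.
# The frame count may shrink from layer to layer as temporal context widens.
rng = np.random.default_rng(0)
layer_outputs = [
    rng.standard_normal((200, 8)),
    rng.standard_normal((196, 8)),
    rng.standard_normal((190, 16)),
]

segment_level_input = pooled_layer_stats(layer_outputs)
print(segment_level_input.shape)  # (64,) = 2*8 + 2*8 + 2*16
```

Because pooling removes the time axis, layers with different numbers of frames still yield fixed-length vectors, which is what allows their statistics to be combined into a single utterance-level representation.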

Paper Details

Authors:
Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, Chien-Lin Huang
Submitted On:
15 May 2020 - 11:55pm
Short Link:
Type:
Presentation Slides
Event:

Document Files

20200419_ICASSP_Experiment 1.pdf

[1] Qian-Bei Hong, Chung-Hsien Wu, Hsin-Min Wang, Chien-Lin Huang, "Statistics Pooling Time Delay Neural Network Based on X-vector for Speaker Verification", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5365. Accessed: Jul. 13, 2020.