AN IMPROVED DEEP NEURAL NETWORK FOR MODELING SPEAKER CHARACTERISTICS AT DIFFERENT TEMPORAL SCALES

Abstract: 

This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) multiscale convolution (MSCNN) is adopted in the frame-level layers to capture complementary speaker information from different receptive fields; (2) a Baum-Welch statistics attention (BWSA) mechanism is applied in the temporal pooling layer to integrate more useful long-term speaker characteristics. Experiments are carried out on the NIST SRE16 evaluation set. The results demonstrate the effectiveness of the MSCNN and show that the proposed BWSA further improves the performance of the DNN embedding system.
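The two ideas in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, kernel widths, and kernel weights are all hypothetical. The abstract does not say how BWSA derives its attention scores from Baum-Welch statistics, so the scores here are a plain input and the uniform values are only a placeholder. The sketch shows parallel 1-D convolutions with different receptive fields over one feature track, followed by attention-weighted mean and standard deviation pooling over time.

```python
import math

def conv1d_same(frames, kernel):
    """Slide one kernel along time (cross-correlation, as in CNN layers).
    Zero-pads so the output has the same length as the input.
    Assumes an odd kernel width."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + list(frames) + [0.0] * pad
    return [sum(kernel[j] * padded[t + j] for j in range(k))
            for t in range(len(frames))]

def multiscale_conv(frames, kernels):
    """Apply kernels of different widths in parallel.
    Each kernel sees a different receptive field; one output channel per kernel."""
    return [conv1d_same(frames, kernel) for kernel in kernels]

def softmax(scores):
    """Numerically stable softmax over time steps."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attentive_stats_pool(channel, scores):
    """Attention-weighted mean and standard deviation over time.
    `scores` stands in for whatever the attention mechanism produces."""
    weights = softmax(scores)
    mean = sum(w * x for w, x in zip(weights, channel))
    var = sum(w * (x - mean) ** 2 for w, x in zip(weights, channel))
    return mean, math.sqrt(max(var, 0.0))

# Toy input: one feature dimension over five frames (hypothetical values).
frames = [1.0, 2.0, 3.0, 4.0, 5.0]
# Hypothetical averaging kernels with receptive fields of 1, 3 and 5 frames.
kernels = [[1.0], [1 / 3] * 3, [0.2] * 5]
channels = multiscale_conv(frames, kernels)

# Placeholder uniform scores: pooling reduces to plain mean and std.
embedding = []
for channel in channels:
    embedding.extend(attentive_stats_pool(channel, [0.0] * len(channel)))
```

With uniform scores the pooling is ordinary statistics pooling; non-uniform scores would let some frames contribute more to the utterance-level statistics than others.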


Paper Details

Authors:
Submitted On: 14 April 2020 - 6:25am
Short Link:
Type: Presentation Slides
Event:
Presenter's Name: Bin Gu
Document Year: 2020

Document Files

ICASSP2020_poster_BinGu.pdf

[1] , "AN IMPROVED DEEP NEURAL NETWORK FOR MODELING SPEAKER CHARACTERISTICS AT DIFFERENT TEMPORAL SCALES", IEEE SigPort, 2020. [Online]. Available: http://sigport.org/5095. Accessed: Jul. 13, 2020.