
Adversarial Speaker Verification

Citation Author(s):
Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong
Submitted by:
Zhong Meng
Last updated:
12 May 2019 - 9:24pm
Document Type:
Document Year:
Yong Zhao
Paper Code:


The use of deep networks to extract embeddings for speaker recognition has proven successful. However, such embeddings are susceptible to performance degradation caused by mismatches among the training, enrollment, and test conditions. In this work, we propose an adversarial speaker verification (ASV) scheme that learns condition-invariant deep embeddings via adversarial multi-task training. In ASV, a speaker classification network and a condition identification network are jointly optimized to minimize the speaker classification loss and simultaneously mini-maximize the condition loss. The target labels of the condition network can be either categorical (environment types) or continuous (SNR values). We further propose multi-factorial ASV to simultaneously suppress multiple factors that constitute the condition variability. Evaluated on a Microsoft Cortana text-dependent speaker verification task, ASV achieves 8.8% and 14.5% relative improvements in equal error rate (EER) for known and unknown conditions, respectively.
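
The joint optimization described in the abstract can be written as a min-max objective. The formulation below is a sketch in the usual adversarial multi-task form, not necessarily the paper's exact notation. The symbols are assumptions introduced here for illustration: \theta_f for the parameters of an embedding network shared by both tasks, \theta_y for the speaker classifier, \theta_c for the condition network, and \lambda for a trade-off weight between the two losses.

```latex
% Assumed notation (not taken from the abstract):
%   \theta_f : embedding (feature extractor) parameters, shared by both tasks
%   \theta_y : speaker classification network parameters
%   \theta_c : condition identification network parameters
%   \lambda  : assumed weight balancing speaker and condition losses
\begin{aligned}
(\hat{\theta}_f, \hat{\theta}_y)
  &= \arg\min_{\theta_f,\, \theta_y}\;
     \mathcal{L}_{\mathrm{speaker}}(\theta_f, \theta_y)
     - \lambda\, \mathcal{L}_{\mathrm{condition}}(\theta_f, \hat{\theta}_c), \\
\hat{\theta}_c
  &= \arg\min_{\theta_c}\;
     \mathcal{L}_{\mathrm{condition}}(\hat{\theta}_f, \theta_c).
\end{aligned}
```

Under this reading, the condition network minimizes the condition loss while the embedding network maximizes it, which pushes the embedding to discard condition information while staying discriminative for speakers. For categorical labels (environment types) the condition loss would typically be a cross-entropy; for continuous labels (SNR values) a regression loss such as mean squared error is one plausible choice.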
