
DISENTANGLED SPEAKER EMBEDDING FOR ROBUST SPEAKER VERIFICATION

Citation Author(s):
Lu YI, Man Wai MAK
Submitted by:
Lu YI
Last updated:
5 May 2022 - 3:38am
Document Type:
Poster
Document Year:
2022
Event:
Paper Code:
SPE-57.6

The entanglement of speaker features with redundant, domain-specific features can lead to poor performance when speaker verification systems are evaluated on an unseen domain. To address this issue, we propose an InfoMax domain separation and adaptation network (InfoMax-DSAN) that disentangles domain-specific features from domain-invariant speaker features using domain adaptation techniques. A frame-based mutual information neural estimator is proposed to maximize the mutual information between the frame-level features and the input acoustic features, which helps retain more speaker-relevant information. Furthermore, we adopt a triplet loss based on the idea of self-supervised learning to overcome the label mismatch problem. Experimental results on the VOiCES Challenge 2019 corpus demonstrate that the proposed method learns more discriminative and robust speaker embeddings.
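The two training objectives mentioned in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the FrameMINE statistics network, the tensor shapes (acoustic features x of shape (B, T, F) and frame-level features z of shape (B, T, D)), and the cosine-distance triplet loss are illustrative assumptions based on the standard MINE and triplet-loss formulations.

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameMINE(nn.Module):
    """MINE-style statistics network giving a lower bound on I(x; z) over frames (assumed design)."""

    def __init__(self, feat_dim, emb_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim + emb_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, z):
        # x: (B, T, F) input acoustic features; z: (B, T, D) frame-level features
        joint = self.net(torch.cat([x, z], dim=-1))           # T(x, z) on paired frames
        z_shuffled = z[torch.randperm(z.size(0))]             # break the pairing across utterances
        marginal = self.net(torch.cat([x, z_shuffled], dim=-1))
        # Donsker-Varadhan bound: E_joint[T] - log E_marginal[exp(T)]
        mi_lower_bound = joint.mean() - (
            torch.logsumexp(marginal.flatten(), dim=0) - math.log(marginal.numel())
        )
        return mi_lower_bound


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Cosine-distance triplet loss on utterance-level speaker embeddings."""
    d_ap = 1.0 - F.cosine_similarity(anchor, positive)
    d_an = 1.0 - F.cosine_similarity(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()
```

In such a setup, training would maximize the mutual-information bound (for example, by adding its negative to the total loss) while the triplet loss pulls embeddings of the same speaker together and pushes different speakers apart; this is one plausible way to realize the two objectives described above, not a description of the paper's exact architecture.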
