
Multi-level deep neural network adaptation for speaker verification using MMD and consistency regularization

Citation Author(s):
Weiwei Lin, Man-Wai Mak, Na Li, Dan Su, Dong Yu
Submitted by:
Man-Wai Mak
Last updated:
13 May 2020 - 10:05pm
Document Type:
Presentation Slides
Document Year:
2020
Presenters:
Man-Wai Mak
Paper Code:
3043
 

Adapting speaker verification (SV) systems to a new environment is a very challenging task. Current adaptation methods in SV mainly focus on the backend, i.e., adaptation is carried out after the speaker embeddings have been created. In this paper, we present a DNN-based adaptation method using maximum mean discrepancy (MMD). Our method exploits two important aspects neglected by previous research. First, instead of minimizing domain discrepancy at the utterance level alone, our method minimizes domain discrepancy at both the frame level and the utterance level, which we believe makes the adaptation more robust to the duration discrepancy between training data and test data. Second, we introduce consistency regularization for unlabelled target-domain data. The consistency regularization encourages the target speaker embeddings to be robust to adverse perturbations. Experiments on NIST SRE 2016 and 2018 show that our DNN adaptation works significantly better than previously proposed DNN adaptation methods. Moreover, our method works well with backend adaptation. By combining the proposed method with backend adaptation, we achieve a 9% improvement over backend adaptation alone on SRE18.
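To make the two losses concrete, below is a minimal PyTorch-style sketch of multi-level MMD and consistency regularization as described in the abstract. It is an illustration, not the authors' implementation: the function names, the single Gaussian-kernel MMD estimator, the loss weights, and the perturb augmentation are all assumptions for exposition.

import torch

def gaussian_mmd(x, y, sigma=1.0):
    """Squared MMD between two batches under a single Gaussian kernel.

    x: (n, d) source-domain features; y: (m, d) target-domain features.
    (A single fixed bandwidth is an assumption; multi-kernel variants
    are common in MMD-based adaptation.)
    """
    def kernel(a, b):
        # Pairwise squared Euclidean distances -> Gaussian kernel matrix.
        d2 = torch.cdist(a, b) ** 2
        return torch.exp(-d2 / (2 * sigma ** 2))

    # Biased MMD^2 estimate: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)].
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

def multilevel_mmd_loss(src_frames, tgt_frames, src_embeds, tgt_embeds,
                        frame_weight=1.0, utt_weight=1.0):
    """Domain-discrepancy loss at both the frame and utterance levels.

    src_frames / tgt_frames: (N, d_f) frame-level DNN activations
    src_embeds / tgt_embeds: (B, d_e) utterance-level speaker embeddings
    """
    return (frame_weight * gaussian_mmd(src_frames, tgt_frames)
            + utt_weight * gaussian_mmd(src_embeds, tgt_embeds))

def consistency_loss(embed_net, tgt_utts, perturb):
    """Consistency regularization: embeddings of perturbed target-domain
    utterances should stay close to embeddings of the clean utterances.

    `perturb` stands for any adverse perturbation (e.g. additive noise);
    its exact form here is an assumption.
    """
    with torch.no_grad():
        clean = embed_net(tgt_utts)           # stop-gradient "clean" view
    noisy = embed_net(perturb(tgt_utts))      # perturbed view of same data
    return torch.nn.functional.mse_loss(noisy, clean)

In a full training loop, the two MMD terms would be added to the usual speaker-classification loss on labelled source-domain data, with the consistency term computed on the unlabelled target-domain utterances.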
