META REPRESENTATION LEARNING METHOD FOR ROBUST SPEAKER VERIFICATION IN UNSEEN DOMAINS

This paper presents a meta representation learning method for robust speaker verification (SV) in unseen domains. It is known that existing embedding-learning-based SV systems may suffer from domain mismatch. To address this, we propose an episodic training procedure that compensates for domain mismatch conditions at runtime. Specifically, episodes are constructed by domain-balanced episodic sampling from two different domains, and a new domain alignment (DA) module is added to existing network structures alongside the feature extractor (FE) and classifier. In each episodic training iteration, the FE and DA modules are optimized separately with different objectives to improve the robustness of learning. In addition, a cross-domain inter-class alignment (CDICA) loss is proposed to improve domain generalization ability. Experimental results on the CNCeleb and VoxCeleb benchmarks demonstrate significant performance gains for unseen domains in SV.
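The domain-balanced episodic sampling step can be illustrated with a minimal sketch. The abstract does not specify how episodes are drawn, so everything below is an assumption made for illustration: the function name `sample_domain_balanced_episode`, the `(utt_id, speaker_id, domain)` tuple layout, and the choice to balance by taking the same number of speakers and utterances from each of the two sampled domains. It is not the authors' implementation.

```python
import random
from collections import defaultdict


def sample_domain_balanced_episode(utterances, n_speakers_per_domain=2,
                                   n_utts_per_speaker=2, rng=None):
    """Draw one episode containing equal amounts of data from two domains.

    `utterances` is a list of (utt_id, speaker_id, domain) tuples.
    This layout and the balancing rule are hypothetical, for illustration only.
    """
    rng = rng or random.Random()

    # Group utterance ids by domain, then by speaker.
    by_domain = defaultdict(lambda: defaultdict(list))
    for utt_id, speaker, domain in utterances:
        by_domain[domain][speaker].append(utt_id)

    # Keep only domains that have enough speakers with enough utterances.
    eligible = {}
    for domain, speakers in by_domain.items():
        ok = [s for s, utts in speakers.items()
              if len(utts) >= n_utts_per_speaker]
        if len(ok) >= n_speakers_per_domain:
            eligible[domain] = ok
    if len(eligible) < 2:
        raise ValueError("need at least two domains with enough speakers")

    # Pick two distinct domains, then the same number of speakers and
    # utterances from each, so neither domain dominates the episode.
    episode = []
    for domain in rng.sample(sorted(eligible), 2):
        for speaker in rng.sample(sorted(eligible[domain]),
                                  n_speakers_per_domain):
            for utt_id in rng.sample(by_domain[domain][speaker],
                                     n_utts_per_speaker):
                episode.append((utt_id, speaker, domain))
    return episode


# Toy data: 3 domains x 3 speakers x 3 utterances (all names are made up).
toy = [(f"{d}_s{s}_u{u}", f"{d}_s{s}", d)
       for d in ("interview", "singing", "vlog")
       for s in range(3) for u in range(3)]
episode = sample_domain_balanced_episode(toy, rng=random.Random(0))
```

Under these assumptions, each episode holds the same number of utterances from both sampled domains, which is one plausible reading of "domain balanced"; the separate FE and DA optimization steps would then run on such an episode.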

MRL_ICASSP_origin.pptx
