DEREVERBERATION AND BEAMFORMING IN FAR-FIELD SPEAKER RECOGNITION

This paper deals with far-field speaker recognition. On a corpus of NIST SRE 2010 data retransmitted in a real room with multiple microphones, we first demonstrate how room acoustics cause significant degradation of a state-of-the-art i-vector-based speaker recognition system. We then investigate several techniques to improve performance, ranging from probabilistic linear discriminant analysis (PLDA) re-training, through dereverberation, to beamforming. We found that weighted prediction error (WPE) based dereverberation, combined with a generalized eigenvalue beamformer whose power spectral density (PSD) weighting masks are generated by neural networks (NNs), provides results approaching those of the clean close-microphone setup. Further improvement was obtained by re-training the PLDA or the mask-generating NNs on simulated target data. The work shows that a speaker recognition system that works robustly in the far-field scenario can be developed.
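To make the beamforming step more concrete, the following is a minimal NumPy/SciPy sketch of a mask-based generalized eigenvalue beamformer. It is an illustration under stated assumptions, not the authors' implementation: the function names (`masked_psd`, `gev_weights`, `apply_beamformer`) are hypothetical, the WPE dereverberation stage is omitted, and the masks are oracle masks on synthetic data, whereas the paper generates them with neural networks.

```python
import numpy as np
from scipy.linalg import eigh


def masked_psd(stft, mask):
    """Mask-weighted spatial PSD matrices.

    stft: complex array of shape (F, T, C) = (frequency, frame, channel).
    mask: real array of shape (F, T) with values in [0, 1].
    Returns an array of shape (F, C, C).
    """
    weighted = stft * mask[..., None]
    # Sum over frames of m(t) * y(t) y(t)^H for every frequency bin.
    psd = np.einsum("ftc,ftd->fcd", weighted, stft.conj())
    norm = np.maximum(mask.sum(axis=1), 1e-10)
    return psd / norm[:, None, None]


def gev_weights(psd_speech, psd_noise, eps=1e-6):
    """Per-frequency principal generalized eigenvector of (speech, noise) PSDs."""
    F, C, _ = psd_speech.shape
    w = np.empty((F, C), dtype=complex)
    for f in range(F):
        # Diagonal loading keeps the noise PSD positive definite (assumed heuristic).
        load = eps * np.trace(psd_noise[f]).real / C
        noise = psd_noise[f] + load * np.eye(C)
        # eigh returns eigenvalues in ascending order; take the last eigenvector.
        _, vecs = eigh(psd_speech[f], noise)
        w[f] = vecs[:, -1]
    return w


def apply_beamformer(w, stft):
    """Compute w(f)^H y(f, t) for every time-frequency point."""
    return np.einsum("fc,ftc->ft", w.conj(), stft)


# Synthetic demo: one coherent source plus spatially white noise.
rng = np.random.default_rng(0)
F, T, C = 5, 200, 4
steer = np.exp(1j * rng.uniform(0, 2 * np.pi, (F, 1, C)))
source = rng.standard_normal((F, T, 1)) + 1j * rng.standard_normal((F, T, 1))
speech = source * steer
speech[:, T // 2:] = 0  # second half of the frames contains noise only
noise = 0.5 * (rng.standard_normal((F, T, C)) + 1j * rng.standard_normal((F, T, C)))
stft = speech + noise

# Oracle masks stand in for the NN-generated masks.
speech_mask = np.zeros((F, T))
speech_mask[:, : T // 2] = 1.0
noise_mask = 1.0 - speech_mask

w = gev_weights(masked_psd(stft, speech_mask), masked_psd(stft, noise_mask))
enhanced = apply_beamformer(w, stft)
```

The generalized eigenvector is defined only up to a complex scale per frequency, so a practical system would typically add a normalization or postfilter step before feature extraction; that step is left out here.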

icassp_poster_mosner.pdf

