End-to-end Detection of Attacks to Automatic Speaker Recognizers with Time-attentive Light Convolutional Neural Networks

Citation Author(s):
Joao Monteiro, Jahangir Alam, Tiago H. Falk
Submitted by:
Joao Monteiro Filho
Last updated:
6 November 2019 - 2:12pm
Document Type:
Document Year:
Presenters Name:
Joao Monteiro
Paper Code:



In this contribution, we introduce convolutional neural network architectures for end-to-end detection of attacks on voice biometric systems, i.e., the model outputs scores corresponding to the likelihood of an attack given general-purpose time-frequency features obtained from speech. Microphone-level attacks based on speech synthesis and voice conversion techniques are considered, along with presentation replay attacks. The convolutional models yield a sequence of representations corresponding to different parts of the input at varying time steps. To handle utterances of varying length, first- and second-order statistics pooled from the outputs of a self-attention layer are concatenated into a fixed-dimension representation, which is then fed into a set of fully connected layers that yield the final scores. The proposed framework is evaluated on data from the ASVspoof 2019 challenge, yielding relative improvements of more than one order of magnitude in equal error rate over the two baseline systems provided by the ASVspoof 2019 organizers, and significant improvements over the benchmark systems we evaluated.
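The pooling step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simple dot-product attention with a single hypothetical vector (`attn_vector`) and uses the attention weights to compute a weighted mean and standard deviation over time, which is one common way to realize attentive statistics pooling. The function names and the scoring function are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pooling(frames, attn_vector):
    """Pool T frame-level vectors of size D into one vector of size 2 * D.

    frames: list of T lists, each of length D (e.g. convolutional outputs).
    attn_vector: hypothetical attention parameters of length D (assumed).
    """
    # One scalar relevance score per time step (assumed dot-product scoring).
    scores = [sum(x * a for x, a in zip(frame, attn_vector)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # First-order statistic: attention-weighted mean over time.
    mean = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    # Second-order statistic: attention-weighted standard deviation over time.
    var = [
        sum(w * (f[d] - mean[d]) ** 2 for w, f in zip(weights, frames))
        for d in range(dim)
    ]
    std = [math.sqrt(max(v, 0.0)) for v in var]
    # Concatenation gives a fixed-size vector regardless of T.
    return mean + std


short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 0.0]]
pooled_short = attentive_stats_pooling(short_utt, [0.0, 0.0])
pooled_long = attentive_stats_pooling(long_utt, [0.1, -0.2])
```

Both `pooled_short` and `pooled_long` have length 4 (twice the frame dimension), even though the inputs have 2 and 5 time steps. This is the property that allows a fixed set of fully connected layers to score utterances of any duration.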

Dataset Files