End-to-end Detection of Attacks to Automatic Speaker Recognizers with Time-attentive Light Convolutional Neural Networks
- Submitted by: Joao Monteiro Filho
- Last updated: 6 November 2019 - 2:12pm
- Document Type: Poster
- Document Year: 2019
- Presenters: Joao Monteiro
- Paper Code: 204
In this contribution, we introduce convolutional neural network architectures for end-to-end detection of attacks on voice biometric systems, i.e., the model outputs scores corresponding to the likelihood of an attack given general-purpose time-frequency features obtained from speech. Microphone-level attacks based on speech synthesis and voice conversion techniques are considered, along with presentation replay attacks. The convolutional models yield a sequence of representations corresponding to different parts of the input at varying time steps. Concatenated first- and second-order statistics pooled from the outputs of a self-attention layer serve as a fixed-dimension representation of utterances of varying length, which is then fed into a set of fully connected layers to yield the final scores. The proposed framework is evaluated on data from the ASVspoof 2019 challenge, yielding relative improvements of more than one order of magnitude in equal error rate over the two baseline systems provided by the ASVspoof 2019 organizers, and significant improvements over the benchmark systems we evaluated.
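The pooling step described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes that a single attention vector scores each time step and that the resulting weights form an attention-weighted mean and standard deviation. The names `attentive_stats_pool`, `frames` and `attn_vector` are hypothetical, and the exact formulation used in the poster may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_stats_pool(frames, attn_vector):
    """Pool T frame vectors of size D into one vector of size 2*D.

    frames: list of T lists with D floats each (stand-in for CNN outputs).
    attn_vector: list of D floats (hypothetical attention parameters).
    """
    dim = len(frames[0])
    # Assumed scoring rule: one scalar per time step, v . tanh(h_t).
    scores = [
        sum(v * math.tanh(h) for v, h in zip(attn_vector, frame))
        for frame in frames
    ]
    weights = softmax(scores)
    # First-order statistic: attention-weighted mean per dimension.
    mean = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(dim)
    ]
    # Second-order statistic: attention-weighted standard deviation.
    std = []
    for d in range(dim):
        var = sum(
            w * (frame[d] - mean[d]) ** 2 for w, frame in zip(weights, frames)
        )
        std.append(math.sqrt(var))
    # Concatenation gives a fixed size regardless of the number of frames.
    return mean + std


# Utterances of different lengths map to vectors of the same size.
short_utt = [[1.0, 2.0], [3.0, 4.0]]
long_utt = short_utt * 5
params = [0.5, -0.5]
print(len(attentive_stats_pool(short_utt, params)))
print(len(attentive_stats_pool(long_utt, params)))
```

Because the output length depends only on the feature dimension, the pooled vector can be passed to fully connected layers whatever the utterance duration.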