
DEEPFAKE SPEECH DETECTION THROUGH EMOTION RECOGNITION: A SEMANTIC APPROACH

Citation Author(s):
Emanuele Conti, Davide Salvi, Clara Borrelli, Brian Hosler, Paolo Bestagini, Fabio Antonacci, Augusto Sarti, Matthew Stamm, Stefano Tubaro
Submitted by:
Davide Salvi
Last updated:
15 May 2022 - 6:13am
Document Type:
Poster
Document Year:
2022
Presenters:
Davide Salvi
Paper Code:
3940

In recent years, audio and video deepfake technology has advanced rapidly, with serious consequences for people's reputation and credibility.
Several factors have facilitated the growing deepfake threat.
On the one hand, the hyper-connected society of social and mass media enables the spread of multimedia content worldwide in real-time, facilitating the dissemination of counterfeit material.
On the other hand, neural network-based techniques have made deepfakes easier to produce and harder to detect, showing that the analysis of low-level features is no longer sufficient for the task.
This situation makes it crucial to design systems that can detect deepfakes at both the video and audio levels.
In this paper, we propose a new audio spoofing detection system leveraging emotional features.
The rationale behind the proposed method is that audio deepfake techniques cannot correctly synthesize natural emotional behavior.
Therefore, we feed our deepfake detector with high-level features obtained from a state-of-the-art Speech Emotion Recognition (SER) system.
Because the descriptors used capture semantic audio information, the proposed system proves robust in cross-dataset scenarios, outperforming the considered baseline on multiple datasets.
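
The idea of feeding a deepfake detector with high-level SER features can be illustrated with a minimal sketch. The abstract does not specify the classifier or the shape of the SER features, so everything here is an assumption: the embeddings are synthetic two-dimensional toy vectors standing in for the output of an SER system, and a plain logistic regression stands in for the detector. The names `train_detector` and `fake_probability` are hypothetical.

```python
import math
import random


def train_detector(embeddings, labels, epochs=200, lr=0.1):
    """Fit a logistic regression on SER embeddings (label 1 = fake, 0 = real).

    This is a stand-in classifier, not the detector used by the authors.
    """
    dim = len(embeddings[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of "fake"
            g = p - y                       # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def fake_probability(model, embedding):
    """Return the estimated probability that an embedding comes from a deepfake."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, embedding)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Synthetic toy embeddings (assumption): real and fake speech form two clusters.
rng = random.Random(0)
real = [[rng.gauss(1.0, 0.3), rng.gauss(1.0, 0.3)] for _ in range(20)]
fake = [[rng.gauss(-1.0, 0.3), rng.gauss(-1.0, 0.3)] for _ in range(20)]

model = train_detector(real + fake, [0] * 20 + [1] * 20)
```

In a real pipeline, the toy vectors would be replaced by embeddings extracted from a pretrained SER network, and the detector would be evaluated on datasets other than the training one to check cross-dataset robustness.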
