
MDRT: MULTI-DOMAIN SYNTHETIC SPEECH LOCALIZATION

Citation Author(s):
Kratika Bhagtani, Sriram Baireddy, Paolo Bestagini, Stefano Tubaro, Edward J. Delp
Submitted by:
Amit Kumar Sing...
Last updated:
15 April 2024 - 4:34pm
Document Type:
Poster
Document Year:
2024
Event:
Presenters:
Amit Kumar Singh Yadav
Paper Code:
SLP-P38.3

With recent advances in speech synthesis, tools that generate high-quality synthetic speech impersonating any human speaker are easily available. Several reported incidents involve the misuse of such speech for spreading misinformation and for large-scale financial fraud. Many methods have been proposed for detecting synthetic speech; however, there is limited work on localizing the synthetic segments within a speech signal. In this work, our goal is to localize the synthetic segments in a partially synthetic speech signal. Most existing methods for synthetic speech localization obtain features from either the time-domain waveform or the spectrogram representation of the speech signal. We propose the Multi-Domain ResNet Transformer (MDRT), which obtains multi-domain features from both the time-domain waveform and the spectrogram representation of a speech signal to localize synthetic speech segments. MDRT uses transformer neural networks to obtain the multi-domain features and processes them with a ResNet-style neural network. We use the PartialSpoof dataset to evaluate MDRT on localizing synthetic speech segments of varying duration. Our results show that MDRT outperforms several existing synthetic speech localization methods.
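The abstract describes the high-level architecture: transformer encoders extract features from the time-domain waveform and from the spectrogram, and a ResNet-style network fuses them to produce per-frame real/synthetic decisions. The paper itself is not included here, so the sketch below is a hypothetical PyTorch illustration of that general two-branch design; all layer sizes, frame lengths, and module names (e.g. `MDRTSketch`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MDRTSketch(nn.Module):
    """Illustrative two-branch localizer (NOT the paper's exact model):
    one transformer encoder over time-domain frames, another over
    spectrogram frames; fused features pass through ResNet-style
    residual conv blocks to score each frame as real (0) or synthetic (1).
    """

    def __init__(self, frame_len=320, n_mels=64, d_model=128):
        super().__init__()
        # Project each domain's frames to a common embedding size.
        self.wave_proj = nn.Linear(frame_len, d_model)
        self.spec_proj = nn.Linear(n_mels, d_model)

        def encoder():
            layer = nn.TransformerEncoderLayer(
                d_model=d_model, nhead=4, dim_feedforward=256,
                batch_first=True)
            return nn.TransformerEncoder(layer, num_layers=2)

        self.wave_enc = encoder()  # time-domain branch
        self.spec_enc = encoder()  # spectrogram branch

        # ResNet-style residual blocks over the fused feature sequence.
        self.res_blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(2 * d_model, 2 * d_model, 3, padding=1),
                nn.BatchNorm1d(2 * d_model), nn.ReLU(),
                nn.Conv1d(2 * d_model, 2 * d_model, 3, padding=1),
                nn.BatchNorm1d(2 * d_model))
            for _ in range(2)])
        self.head = nn.Linear(2 * d_model, 1)  # per-frame logit

    def forward(self, wave_frames, spec_frames):
        # wave_frames: (B, T, frame_len); spec_frames: (B, T, n_mels)
        w = self.wave_enc(self.wave_proj(wave_frames))   # (B, T, d)
        s = self.spec_enc(self.spec_proj(spec_frames))   # (B, T, d)
        x = torch.cat([w, s], dim=-1).transpose(1, 2)    # (B, 2d, T)
        for block in self.res_blocks:
            x = torch.relu(block(x) + x)                 # residual add
        return self.head(x.transpose(1, 2)).squeeze(-1)  # (B, T) logits


# Usage: 2 utterances, 50 frames each; output is one score per frame.
model = MDRTSketch()
logits = model(torch.randn(2, 50, 320), torch.randn(2, 50, 64))
print(logits.shape)  # torch.Size([2, 50])
```

Per-frame logits above a threshold would mark a frame as synthetic, which is what "localization" means here; the PartialSpoof dataset provides the matching frame-level labels for training such a model.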
