
MaskMark: Robust Neural Watermarking for Real and Synthetic Speech (Slides)

Citation Author(s):
Patrick O'Reilly, Zeyu Jin, Jiaqi Su, Bryan Pardo
Submitted by:
Patrick O'Reilly
Last updated:
15 April 2024 - 9:00pm
Document Type:
Presentation Slides
Document Year:
2024
Presenters:
Patrick O'Reilly
 

High-quality speech synthesis models may be used to spread misinformation or impersonate voices. Audio watermarking can help combat such misuse by embedding a traceable signature in generated audio. However, existing audio watermarks are not designed for synthetic speech and typically demonstrate robustness to only a small set of transformations of the watermarked audio. To address this, we propose MaskMark, a neural network-based digital audio watermarking technique optimized for speech. MaskMark embeds a secret key vector in audio via a multiplicative spectrogram mask, allowing the detection of watermarked real and synthetic speech segments even under substantial signal-processing or neural network-based transformations. Comparisons to a state-of-the-art baseline on natural and synthetic speech corpora, together with a human-subject evaluation, demonstrate that MaskMark detects watermarked speech more robustly while maintaining high perceptual transparency.
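
To make the phrase "multiplicative spectrogram mask" concrete, here is a minimal toy sketch in Python with NumPy. It is not MaskMark's method: in MaskMark a neural network produces the mask and another detects the key, whereas here the mask is a fixed, hand-written function of the key. The function names (key_to_mask, apply_mask), the mapping from key bits to per-bin gains, the frame length, and the strength parameter are all assumptions made for illustration only.

```python
import numpy as np

def key_to_mask(key_bits, n_bins, strength=0.05):
    """Hypothetical mapping: turn a binary key into per-frequency-bin gains near 1."""
    signs = 2.0 * np.asarray(key_bits, dtype=float) - 1.0  # {0,1} -> {-1,+1}
    per_bin = signs[np.arange(n_bins) % len(signs)]        # repeat the key across bins
    return 1.0 + strength * per_bin

def apply_mask(audio, key_bits, frame_len=512, strength=0.05):
    """Scale each frame's spectrum by the key-dependent mask (toy, non-neural)."""
    audio = np.asarray(audio, dtype=float)
    n_frames = len(audio) // frame_len
    usable = n_frames * frame_len
    # Non-overlapping rectangular frames: the simplest possible spectrogram.
    frames = audio[:usable].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)                     # (n_frames, frame_len // 2 + 1)
    mask = key_to_mask(key_bits, spec.shape[1], strength)
    marked = np.fft.irfft(spec * mask, n=frame_len, axis=1)
    out = audio.copy()                                     # trailing partial frame is left unchanged
    out[:usable] = marked.reshape(-1)
    return out

# Usage with placeholder data: one second of noise at an assumed 16 kHz rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
key = rng.integers(0, 2, size=16)
watermarked = apply_mask(audio, key)
```

Because the mask multiplies the spectrum rather than adding a separate signal, the embedded change scales with the energy already present in each frequency bin. A practical system would also need overlapping windows, a learned mask, and a detector, none of which this sketch attempts.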

Comments

Supplemental audio files are available at: https://drive.google.com/drive/folders/1F2JlNb7Zc32-eLbOp88Sl8soQ8tMPADU...

The project webpage, which includes additional audio examples, is available at: https://interactiveaudiolab.github.io/project/maskmark.html

A version of this slideshow is available via Google Slides here: https://docs.google.com/presentation/d/1HNl9w7SF6ZvuOg8Hn6xyYlwaQ8kqr7vG...