MaskMark: Robust Neural Watermarking for Real and Synthetic Speech (Slides)
- Submitted by: Patrick O'Reilly
- Last updated: 15 April 2024 - 9:00pm
- Document Type: Presentation Slides
- Document Year: 2024
- Presenters: Patrick O'Reilly
High-quality speech synthesis models may be used to spread misinformation or impersonate voices. Audio watermarking can help combat such misuses by embedding a traceable signature in generated audio. However, existing audio watermarks are not designed for synthetic speech and typically demonstrate robustness to only a small set of transformations of the watermarked audio. To address this, we propose MaskMark, a neural network-based digital audio watermarking technique optimized for speech. MaskMark embeds a secret key vector in audio via a multiplicative spectrogram mask, allowing the detection of watermarked real and synthetic speech segments even under substantial signal-processing or neural network-based transformations. Comparisons to a state-of-the-art baseline on natural and synthetic speech corpora and a human subjects evaluation demonstrate MaskMark's superior robustness in detecting watermarked speech while maintaining high perceptual transparency.
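To make the phrase "multiplicative spectrogram mask" concrete, the toy sketch below scales each time-frequency bin of a complex spectrogram by a real gain close to 1 derived from a key. It is not MaskMark itself: in MaskMark the mask comes from a trained neural network, whereas here a fixed pseudo-random spreading pattern stands in for it. The names `key_to_mask` and `apply_mask`, the `strength` and `seed` parameters, and the list-of-lists spectrogram layout are all assumptions made for illustration.

```python
import random


def key_to_mask(key_bits, n_frames, n_bins, strength=0.05, seed=0):
    """Hypothetical stand-in for a learned mask generator.

    Spreads the key bits pseudo-randomly over the time-frequency grid and
    maps each bit to a gain of (1 + strength) or (1 - strength).
    """
    rng = random.Random(seed)  # seeded so the same pattern can be regenerated
    mask = []
    for _ in range(n_frames):
        row = []
        for _ in range(n_bins):
            bit = key_bits[rng.randrange(len(key_bits))]
            row.append(1.0 + strength * (1.0 if bit else -1.0))
        mask.append(row)
    return mask


def apply_mask(spectrogram, mask):
    """Multiply every complex bin by its real gain (phase is unchanged)."""
    return [
        [value * gain for value, gain in zip(spec_row, mask_row)]
        for spec_row, mask_row in zip(spectrogram, mask)
    ]


# Tiny 2-frame x 2-bin complex "spectrogram" (made-up values).
spec = [[complex(1, 1), complex(2, 0)], [complex(0, 3), complex(4, -4)]]
mask = key_to_mask([1, 0, 1, 1], n_frames=2, n_bins=2)
watermarked = apply_mask(spec, mask)
```

Because the gains stay within `strength` of 1, each bin's magnitude changes by at most that fraction, which is the intuition behind keeping a mask-based watermark perceptually transparent. A real system would also need an inverse transform back to audio and a detector that recovers the key.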
Comments
Supplemental audio
Supplemental audio files are available at: https://drive.google.com/drive/folders/1F2JlNb7Zc32-eLbOp88Sl8soQ8tMPADU...
The project webpage, which includes additional audio examples, is available at: https://interactiveaudiolab.github.io/project/maskmark.html
A version of this slideshow is available via Google Slides here: https://docs.google.com/presentation/d/1HNl9w7SF6ZvuOg8Hn6xyYlwaQ8kqr7vG...