
Speaker anonymization using neural audio codec language models

Citation Author(s):
Michele Panariello, Francesco Nespoli, Massimiliano Todisco, Nicholas Evans
Submitted by:
Michele Panariello
Last updated:
5 April 2024 - 4:12am
Document Type:
Poster
Document Year:
2024
Event:
Presenters:
Michele Panariello
Paper Code:
https://github.com/eurecom-asp/spk_anon_nac_lm

The vast majority of approaches to speaker anonymization involve the extraction of fundamental frequency estimates, linguistic features and a speaker embedding which is perturbed to obfuscate the speaker identity before an anonymized speech waveform is resynthesized using a vocoder.
Recent work has shown that x-vector transformations are difficult to control consistently: other sources of speaker information contained within fundamental frequency and linguistic features are re-entangled upon vocoding, meaning that anonymized speech signals still contain speaker information.
We propose an approach based upon neural audio codecs (NACs), which are known to generate high-quality synthetic speech when combined with language models. NACs use quantized codes, which are known to effectively bottleneck speaker-related information. We demonstrate the potential of speaker anonymization systems based on NAC language modeling by applying the evaluation framework of the Voice Privacy Challenge 2022.
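
The bottleneck effect of quantized codes can be illustrated with a toy sketch. It is not the codec used in this work: it assumes a residual vector quantizer, a common design in neural audio codecs, and the codebooks, frame values, and function names (`nearest`, `rvq_encode`, `rvq_decode`) are all hypothetical. In a real codec the codebooks are learned and the frames come from an encoder network.

```python
# Toy residual vector quantization (RVQ); illustrative only, not the authors' codec.

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    best_idx, best_dist = 0, float("inf")
    for idx, entry in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(entry, vec))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

def rvq_encode(codebooks, vec):
    """Map a continuous frame to one discrete code per stage.

    Each stage quantizes the residual left over by the previous stage.
    """
    residual = list(vec)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes

def rvq_decode(codebooks, codes):
    """Rebuild an approximate frame by summing the selected entries."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

# Hypothetical hand-written codebooks (two stages, two entries each).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],   # coarse stage
    [[0.0, 0.0], [0.1, -0.1]],  # refinement stage
]

# Two slightly different hypothetical frames.
frame_a = [1.10, 0.90]
frame_b = [1.08, 0.93]

codes_a = rvq_encode(codebooks, frame_a)
codes_b = rvq_encode(codebooks, frame_b)
recon_b = rvq_decode(codebooks, codes_b)
```

Here both frames map to the same code sequence, so the small difference between them cannot be recovered from the codes. This is the sense in which a discrete code space acts as a bottleneck. Whether the discarded detail is speaker-related depends on the trained codec, which this sketch does not model.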
