Neural Sound Synthesis and Representation

Poster


Listening to spoken content often requires modifying the speech rate while preserving the speaker's timbre and pitch. Existing approaches rely on advanced signal processing techniques, but maintaining high speech quality at all time-scales remains a challenge. Inspired by the success of speech generation using Generative Adversarial Networks (GANs), we propose ScalerGAN, a novel unsupervised learning algorithm for time-scale modification (TSM) of speech. The model is trained on a set of speech utterances, with no time-scales provided.

scalerGAN ICASSP 2023 poster.pdf


Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT)
