Poster

DOI: 10.60864/a7nw-vr77
Citation Author(s): Eyal Cohen, Felix Kreuk, Joseph Keshet
Submitted by: Eyal Cohen
Last updated: 17 November 2023 - 12:08pm
Document Type: Poster
Document Year: 2023
Event:
Presenters: Eyal Cohen
Paper Code: 6710
 

Listening to spoken content often requires modifying the speech rate while preserving the timbre and pitch of the speaker. To date, advanced signal processing techniques have been used to address this task, but maintaining high speech quality at all time-scales remains a challenge. Inspired by the success of speech generation using Generative Adversarial Networks (GANs), we propose a novel unsupervised learning algorithm for time-scale modification (TSM) of speech, called ScalerGAN. The model is trained on a set of speech utterances for which no time-scales are provided. ScalerGAN is composed of a generator that takes as input speech together with the desired rate and outputs time-adjusted speech; a discriminator that operates on various spectrum scales; and a decoder that converts the time-adjusted signal back to the original rate to maintain consistency. Human listeners were asked to compare ScalerGAN with other state-of-the-art TSM methods using an A/B test and a conditional A/B test. The results showed that the speech quality of ScalerGAN outperforms that of all the compared methods.
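
The consistency idea in the abstract (scale the signal to a new rate, map it back to the original rate, and compare with the input) can be illustrated with a minimal sketch. This is not the ScalerGAN implementation: the learned generator and decoder are replaced here by plain linear interpolation along the time axis, and the names `resample_time` and `cycle_consistency_l1` are hypothetical helpers introduced only for illustration.

```python
def resample_time(frames, rate):
    """Linearly interpolate a list of spectral frames along the time axis.

    Assumption: this is a stand-in for a learned network, not the paper's model.
    rate > 1 speeds speech up (fewer frames); rate < 1 slows it down (more frames).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    n_in = len(frames)
    n_out = max(1, round(n_in / rate))
    if n_in == 1 or n_out == 1:
        return [list(frames[0]) for _ in range(n_out)]
    out = []
    for j in range(n_out):
        # Fractional position of output frame j on the input time axis.
        pos = j * (n_in - 1) / (n_out - 1)
        lo = int(pos)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        out.append([(1 - w) * a + w * b for a, b in zip(frames[lo], frames[hi])])
    return out


def cycle_consistency_l1(frames, rate):
    """Mean absolute error between the input and its scale-then-restore copy."""
    scaled = resample_time(frames, rate)          # stand-in for the generator
    restored = resample_time(scaled, 1.0 / rate)  # stand-in for the decoder
    # Rounding can change the frame count, so compare only the overlapping part.
    n = min(len(frames), len(restored))
    total = sum(
        abs(a - b)
        for x, y in zip(frames[:n], restored[:n])
        for a, b in zip(x, y)
    )
    return total / (n * len(frames[0]))


# Toy "spectrogram": 9 frames with 2 frequency bins each, varying linearly in time.
toy = [[float(t), 2.0 * t] for t in range(9)]
slowed = resample_time(toy, 0.5)
loss = cycle_consistency_l1(toy, 0.5)
```

In an actual GAN-based system, a loss of this kind would presumably be one training term alongside the adversarial loss from the discriminator; the interpolation above only shows what "converting back to the original rate" is measured against.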
