Artificial Bandwidth Extension of Narrowband Speech Using Generative Adversarial Networks

Citation Author(s):
Jonas Sautter, Friedrich Faubel, Markus Buck, Gerhard Schmidt
Submitted by:
Jonas Sautter
Last updated:
8 May 2019 - 2:16am
Document Type:
Poster
Document Year:
2019
Event:
Presenters:
Jonas Sautter
 

The aim of artificial bandwidth extension is to recreate wideband speech (0 to 8 kHz) from a narrowband speech signal (0 to 4 kHz). State-of-the-art approaches use neural networks for this task and train them with the mean squared error between true and estimated wideband spectra as the loss function. This, however, leads to over-smoothing, which manifests as strongly underestimated dynamics in the upper frequency band. We previously proposed to tackle this problem with discriminative training, i.e., a modification of the loss function designed to improve the separation between fricatives and vowels. Other authors instead took a generative adversarial network (GAN) approach, motivated by the fact that GANs have substantially reduced over-smoothing in speech synthesis. In this work, we combine these two approaches. In particular, we show that conditional GANs improve the speech quality by a comparison mean opinion score (CMOS) of 0.28 compared to GANs, while the combined approach yields an improvement of 0.84.
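The over-smoothing effect attributed to the mean squared error can be illustrated with a small sketch. It is not taken from the paper: the energy values and all names (`targets_db`, `mse`, `best_on_grid`) are hypothetical, and the "network" is reduced to a single constant prediction for frames whose narrowband input is assumed to look alike. Under these assumptions, the MSE-optimal prediction is the mean of the targets, which lies between the fricative and vowel levels and therefore has none of their dynamics.

```python
import statistics

# Hypothetical upper-band log-energies (dB) for frames with similar
# narrowband input: three fricative-like (high) and three vowel-like (low).
# These numbers are made up for illustration only.
targets_db = [-10.0, -12.0, -40.0, -42.0, -11.0, -41.0]


def mse(prediction, targets):
    """Mean squared error of one constant prediction against all targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# For a squared-error loss, the best constant prediction is the mean.
mse_optimal = statistics.fmean(targets_db)

# Cross-check by brute force on a 0.1 dB grid from -50.0 dB to 0.0 dB.
candidates = [x / 10.0 for x in range(-500, 1)]
best_on_grid = min(candidates, key=lambda p: mse(p, targets_db))

# Spread (dynamics) of the true targets; a constant prediction has spread 0.
target_spread = statistics.pstdev(targets_db)

print("MSE-optimal prediction (dB):", mse_optimal)
print("Best grid candidate (dB):", best_on_grid)
print("Std. dev. of true targets (dB):", round(target_spread, 2))
```

The single prediction sits in the gap between the two groups of targets, so neither fricatives nor vowels are reproduced at their true level. Loss modifications such as discriminative training or an adversarial term aim to counteract this averaging behaviour.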

https://ieeexplore.ieee.org/document/8682649
