Artificial Bandwidth Extension of Narrowband Speech Using Generative Adversarial Networks

The aim of artificial bandwidth extension is to recreate wideband speech (0 to 8 kHz) from a narrowband speech signal (0 to 4 kHz). State-of-the-art approaches use neural networks for this task. As a loss function during training, they employ the mean squared error between the true and estimated wideband spectra. This, however, leads to over-smoothing, which shows up as strongly underestimated dynamics in the upper frequency band. We previously proposed to tackle this problem with discriminative training, i.e., a modification of the loss function designed to improve the separation between fricatives and vowels. Other authors instead took a generative adversarial network (GAN) approach, motivated by the fact that GANs substantially reduced over-smoothing in speech synthesis. In this work, we combine these two approaches. In particular, we show that conditional GANs improve the speech quality by a comparison mean opinion score (CMOS) of 0.28 compared to GANs, while the combined approach yields an improvement of 0.84.
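The loss terms mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's formulation: the function names, the binary cross-entropy form of the adversarial terms, and the weighting factor adv_weight are all assumptions made for illustration. In a conditional GAN, the discriminator would additionally receive the narrowband input; here its output probabilities are simply taken as given.

```python
import numpy as np

def mse_loss(true_spec, est_spec):
    """Mean squared error between true and estimated wideband spectra."""
    return float(np.mean((true_spec - est_spec) ** 2))

def bce(prob, target, eps=1e-7):
    """Binary cross-entropy; probabilities are clipped for numerical safety."""
    prob = np.clip(prob, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob)
                          + (1.0 - target) * np.log(1.0 - prob)))

def discriminator_loss(d_real, d_fake):
    """Discriminator should output 1 for true spectra and 0 for estimates."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake, true_spec, est_spec, adv_weight=0.5):
    """MSE plus a weighted adversarial term (adv_weight is hypothetical)."""
    adversarial = bce(d_fake, np.ones_like(d_fake))
    return mse_loss(true_spec, est_spec) + adv_weight * adversarial

# Toy usage: random arrays stand in for spectra and discriminator outputs.
rng = np.random.default_rng(0)
true_spec = rng.normal(size=(4, 16))
est_spec = true_spec + 0.1 * rng.normal(size=(4, 16))
d_real = np.full(4, 0.9)   # assumed discriminator outputs on true spectra
d_fake = np.full(4, 0.2)   # assumed discriminator outputs on estimates
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake, true_spec, est_spec))
```

The adversarial term rewards estimates that the discriminator cannot tell apart from true spectra, which is the mechanism GAN approaches rely on to counter the over-smoothing caused by a pure mean squared error objective.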

https://ieeexplore.ieee.org/document/8682649

ICASSP2019_Sautter_Poster.pdf
