General Speech Restoration Using Two-stage Generative Adversarial Networks (slides)

Citation Author(s):
Qinwen Hu, Tianyi Tan, Ming Tang, Yuxiang Hu, Changbao Zhu, Jing Lu
Submitted by:
Qinwen Hu
Last updated:
14 April 2024 - 11:00pm
Document Type:
Presentation Slides

General speech restoration is a challenging task that requires removing multiple types of distortion within a single system. The prevailing methods for general speech restoration largely rely on generative models, leveraging their ability to generate speech components based on prior knowledge of clean speech characteristics. Our approach adopts a two-stage processing scheme comprising a speech restoration module and a speech enhancement module. The restoration module uses dilated convolutional networks and is trained with LSGAN losses, whereas the speech enhancement module employs a convolutional-recurrent network trained with MetricGAN losses. The proposed system achieves an overall mean opinion score (MOS) of 2.944 and a final score of 0.6805, ranking 3rd in Track 1 of the ICASSP 2024 Speech Signal Improvement Challenge (SIG-2).
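The two-stage scheme and the LSGAN objective described above can be sketched roughly as follows. This is a minimal illustrative assumption, not the authors' implementation: the dilated-convolution stack and the recurrent smoother are simplified stand-ins for the actual restoration and enhancement networks, and all function names and kernel choices are hypothetical.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation):
    """Causal 1-D convolution with a dilation factor (minimal reference version)."""
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    y = np.zeros_like(x)
    for t in range(len(x)):
        for k in range(len(kernel)):
            y[t] += kernel[k] * xp[t + pad - k * dilation]
    return y

def restoration_stage(noisy, kernels):
    """Stage 1 sketch: residual stack of dilated convolutions.

    Doubling the dilation per layer grows the receptive field exponentially,
    which is the usual motivation for dilated convolutions in speech models.
    """
    x = noisy
    for i, kernel in enumerate(kernels):
        x = x + np.tanh(dilated_conv1d(x, kernel, dilation=2 ** i))
    return x

def enhancement_stage(restored, smoothing=0.9):
    """Stage 2 sketch: a recurrent state stands in for the conv-recurrent net."""
    y = np.zeros_like(restored)
    state = 0.0
    for t, sample in enumerate(restored):
        state = smoothing * state + (1 - smoothing) * sample
        y[t] = sample - state  # crude stand-in for learned enhancement
    return y

def general_restoration(noisy, kernels):
    """Chain the two stages: restore first, then enhance."""
    return enhancement_stage(restoration_stage(noisy, kernels))

# LSGAN (least-squares GAN) objectives used to train stage 1:
def lsgan_d_loss(d_real, d_fake):
    # Discriminator: push scores on real speech toward 1, on generated toward 0.
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # Generator: push discriminator scores on generated speech toward 1.
    return np.mean((d_fake - 1.0) ** 2)
```

MetricGAN training (stage 2) follows the same adversarial template, except the discriminator is trained to regress a perceptual quality metric rather than a real/fake label.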