RENet: A Time-Frequency Domain General Speech Restoration Network for ICASSP 2024 Speech Signal Improvement Challenge
- DOI: 10.60864/nfye-6670
- Citation Author(s):
- Submitted by: Fengyuan Hao
- Last updated: 6 June 2024 - 10:28am
- Document Type: Presentation Slides
- Event:
- Presenters: Fengyuan Hao
- Paper Code: GC-L11.4
- Categories:
The ICASSP 2024 Speech Signal Improvement (SSI) Challenge seeks to address speech quality degradation in telecommunication systems. In this context, this paper proposes RENet, a time-frequency (T-F) domain method that leverages complex spectrum mapping to mitigate speech distortions. Specifically, the proposed RENet is a multi-stage network. First, TF-GridGAN was designed to recover the degraded speech with a generative adversarial network (GAN). Second, a full-band enhancement module was introduced to eliminate the residual noise and artifacts remaining in the output of TF-GridGAN. Finally, a lightweight bandwidth extension (BWE) network was implemented to further improve speech quality by generating high-resolution speech. Subjective results confirmed the competitive performance of the proposed method under various distortions, and the proposed method ranked 2nd in the non-real-time track of the ICASSP 2024 SSI Challenge.
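The three-stage cascade described above can be pictured with a minimal Python sketch. It is an illustration only, not the authors' implementation: the function names are hypothetical, each stage is an identity placeholder standing in for a trained network, and a naive per-frame DFT stands in for a real STFT. It also assumes that every stage reads and writes the same complex spectrum, which the abstract does not state for the BWE stage.

```python
import cmath
from typing import Callable, List

Frame = List[complex]      # complex frequency bins of one frame
Spectrum = List[Frame]     # frames x bins
Stage = Callable[[Spectrum], Spectrum]


def dft(frame: List[float]) -> Frame:
    """Naive DFT of one real-valued frame (stand-in for one STFT frame)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def idft(bins: Frame) -> List[float]:
    """Inverse of dft(); keeps the real part of each reconstructed sample."""
    n = len(bins)
    return [
        (sum(b * cmath.exp(2j * cmath.pi * k * t / n) for k, b in enumerate(bins)) / n).real
        for t in range(n)
    ]


# Hypothetical placeholders: each returns its input unchanged. In a real
# system each would be a trained network that maps the real and imaginary
# parts of the input spectrum to those of an improved spectrum.
def restoration_stage(spec: Spectrum) -> Spectrum:
    """Stage 1 placeholder (role of TF-GridGAN: recover degraded speech)."""
    return [list(frame) for frame in spec]


def enhancement_stage(spec: Spectrum) -> Spectrum:
    """Stage 2 placeholder (role: remove residual noise and artifacts)."""
    return [list(frame) for frame in spec]


def bandwidth_extension_stage(spec: Spectrum) -> Spectrum:
    """Stage 3 placeholder (role: bandwidth extension)."""
    return [list(frame) for frame in spec]


def run_pipeline(frames: List[List[float]], stages: List[Stage]) -> List[List[float]]:
    """Transform frames to the complex domain, apply stages in order, invert."""
    spec = [dft(frame) for frame in frames]
    for stage in stages:
        spec = stage(spec)
    return [idft(frame) for frame in spec]
```

Calling `run_pipeline(frames, [restoration_stage, enhancement_stage, bandwidth_extension_stage])` with these placeholders simply reconstructs the input frames. The sketch shows only the ordering of the stages and the complex-valued data that passes between them, not any restoration behavior.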