Boosting Speech Enhancement with Clean Self-Supervised Features Via Conditional Variational Autoencoders
- Submitted by: Yoonhyung Lee
- Last updated: 15 April 2024 - 12:12pm
- Document Type: Presentation Slides
- Document Year: 2024
- Presenters: Yoonhyung Lee
- Paper Code: SLP-L1.3
Self-Supervised Features (SSFs) trained on extensive speech datasets have recently shown significant performance gains across various speech processing tasks. Nevertheless, their effectiveness in Speech Enhancement (SE) systems is often suboptimal because they are not sufficiently optimized for noisy environments. To address this issue, we present a novel method that directly uses SSFs extracted from clean speech to improve SE models. Specifically, we use the clean SSFs for latent space modeling within the Conditional Variational Autoencoder (CVAE) framework, which lets our model exploit the knowledge in the clean SSFs without interference from noise. In experiments, our approach yields clear improvements over existing SSF-based methods across six evaluation metrics. Furthermore, we provide comprehensive analyses to validate the effectiveness of 1) incorporating clean SSFs within the CVAE framework and 2) the training techniques used to achieve optimal performance from our approach in SE systems. Code and audio samples are available at https://github.com/YoonhyungLee94/SSFCVAE
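The abstract does not spell out the training objective, so the following is only a minimal sketch of a generic CVAE loss, not the authors' implementation. It assumes diagonal Gaussian distributions, a posterior whose parameters would come from a (hypothetical) encoder over clean SSFs during training, and a prior whose parameters would come from the noisy input so that it can be sampled at inference time. All function and parameter names are illustrative.

```python
import math
import random


def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def cvae_loss(recon_error, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Generic CVAE objective: reconstruction error plus a weighted KL term.

    Assumption: (mu_q, logvar_q) is a posterior informed by clean SSFs and
    (mu_p, logvar_p) is a prior predicted from noisy speech. The weight
    `beta` is a hypothetical knob, not a value taken from the paper.
    """
    return recon_error + beta * kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p)


if __name__ == "__main__":
    rng = random.Random(0)
    mu_q, logvar_q = [1.0, -0.5], [0.0, 0.0]  # stand-in posterior parameters
    mu_p, logvar_p = [0.0, 0.0], [0.0, 0.0]   # stand-in prior parameters
    z = reparameterize(mu_q, logvar_q, rng)   # latent sample used during training
    print(len(z), cvae_loss(2.0, mu_q, logvar_q, mu_p, logvar_p))
```

In this formulation the KL term pulls the noisy-input prior toward the posterior informed by clean features, which is one common way a model can use clean-speech information during training while needing only noisy speech at inference.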