Boosting Speech Enhancement with Clean Self-Supervised Features Via Conditional Variational Autoencoders

Recently, Self-Supervised Features (SSFs) learned from extensive speech datasets have yielded significant performance gains across a range of speech processing tasks. Nevertheless, their effectiveness in Speech Enhancement (SE) systems is often suboptimal because they are not sufficiently optimized for noisy environments. To address this issue, we present a novel method that directly uses SSFs extracted from clean speech to improve SE models. Specifically, we use the clean SSFs for latent space modeling within the Conditional Variational Autoencoder (CVAE) framework. This allows our model to fully exploit the knowledge contained in the clean SSFs without interference from noise. In experiments, our approach yields clear improvements over existing SSF-based methods across six evaluation metrics. Furthermore, we provide comprehensive analyses to validate the effectiveness of 1) incorporating clean SSFs within the CVAE framework and 2) the training techniques used to achieve optimal performance from our approach in SE systems. Code and audio samples are available at https://github.com/YoonhyungLee94/SSFCVAE
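
As a rough illustration of the kind of objective a CVAE is generally trained with, the sketch below computes a reconstruction term plus a KL term between two diagonal Gaussians. It is not the authors' implementation (the linked repository is the reference for that). One plausible reading, which is an assumption here and not stated in the abstract, is that the posterior parameters are predicted from clean SSFs while the prior parameters are predicted from the noisy input. The function names `gaussian_kl` and `cvae_loss` and the `beta` weight are hypothetical.

```python
import math

def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Assumption for illustration: q could be a posterior predicted from
    clean SSFs and p a prior predicted from the noisy input.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # Closed-form KL for one dimension of a diagonal Gaussian pair.
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

def cvae_loss(recon_error, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Negative ELBO: reconstruction error plus a (hypothetically weighted) KL term."""
    return recon_error + beta * gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
```

When the two distributions are identical the KL term vanishes, so the loss reduces to the reconstruction error alone.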

240416_ICASSP_Boosting_Speech_Enhancement.pptx

