Boosting Speech Enhancement with Clean Self-Supervised Features Via Conditional Variational Autoencoders

Submitted by: Yoonhyung Lee
Last updated: 15 April 2024 - 12:12pm
Document Type: Presentation Slides
Document Year: 2024
Presenters: Yoonhyung Lee
Paper Code: SLP-L1.3

Recently, Self-Supervised Features (SSF) trained on extensive speech datasets have shown significant performance gains across various speech processing tasks. Nevertheless, their effectiveness in Speech Enhancement (SE) systems is often suboptimal due to insufficient optimization for noisy environments. To address this issue, we present a novel methodology that directly utilizes SSFs extracted from clean speech for enhancing SE models. Specifically, we leverage the clean SSFs for latent space modeling within the Conditional Variational Autoencoder (CVAE) framework. Consequently, we enable our model to fully leverage the knowledge existing in the clean SSFs without the interference of noise. In experiments, our approach yields clear improvements over existing methods that use SSFs across six evaluation metrics. Furthermore, we provide comprehensive analyses to validate the effectiveness of 1) incorporating clean SSFs within the CVAE framework and 2) the training techniques used to achieve optimal performance from our approach in SE systems. Code and audio samples are available at https://github.com/YoonhyungLee94/SSFCVAE.
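
For readers who want a concrete picture of the idea described above, the following is a minimal, hypothetical PyTorch sketch of a conditional VAE whose approximate posterior is fit on clean SSFs during training, while a prior network conditioned only on the noisy input stands in at inference time. All module names, dimensions, and the loss weighting are assumptions made for illustration; this is not the authors' SSFCVAE implementation (see the repository above for the actual code).

```python
# Hypothetical sketch: a CVAE that models the latent space with clean
# self-supervised features (SSFs). The posterior sees clean SSFs only
# during training; the prior, conditioned on the noisy input, is pulled
# toward it by the KL term and is used alone at inference.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SSFConditionalVAE(nn.Module):
    def __init__(self, noisy_dim=257, ssf_dim=768, latent_dim=64, hidden=256):
        super().__init__()
        # Posterior q(z | clean SSF): training only.
        self.posterior = nn.Sequential(nn.Linear(ssf_dim, hidden), nn.ReLU(),
                                       nn.Linear(hidden, 2 * latent_dim))
        # Prior p(z | noisy input): used when clean SSFs are unavailable.
        self.prior = nn.Sequential(nn.Linear(noisy_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, 2 * latent_dim))
        # Decoder p(clean | z, noisy input): predicts the enhanced spectrum.
        self.decoder = nn.Sequential(nn.Linear(latent_dim + noisy_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, noisy_dim))

    @staticmethod
    def reparameterize(mu, logvar):
        return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

    def forward(self, noisy, clean_ssf=None):
        p_mu, p_logvar = self.prior(noisy).chunk(2, dim=-1)
        if clean_ssf is not None:  # training path
            q_mu, q_logvar = self.posterior(clean_ssf).chunk(2, dim=-1)
            z = self.reparameterize(q_mu, q_logvar)
            # KL(q(z | clean SSF) || p(z | noisy)) aligns the prior with the clean latent.
            kl = 0.5 * (p_logvar - q_logvar
                        + (q_logvar.exp() + (q_mu - p_mu) ** 2) / p_logvar.exp()
                        - 1).sum(-1).mean()
        else:  # inference path: sample from the noisy-conditioned prior
            z = self.reparameterize(p_mu, p_logvar)
            kl = torch.zeros((), device=noisy.device)
        enhanced = self.decoder(torch.cat([z, noisy], dim=-1))
        return enhanced, kl

# Illustrative training step: reconstruction loss plus a weighted KL term.
model = SSFConditionalVAE()
noisy = torch.randn(8, 257)      # e.g. noisy magnitude-spectrum frames
clean = torch.randn(8, 257)      # corresponding clean targets
clean_ssf = torch.randn(8, 768)  # SSFs extracted from the clean waveform
enhanced, kl = model(noisy, clean_ssf)
loss = F.l1_loss(enhanced, clean) + 0.01 * kl
```

The KL weight (0.01 here) and the choice of spectrum-domain inputs are placeholders; in practice these details follow the training techniques analyzed in the paper.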
