
MDX-GAN: ENHANCING PERCEPTUAL QUALITY IN MULTI-CLASS SOURCE SEPARATION VIA ADVERSARIAL TRAINING

Citation Author(s):
Ke Chen, Jiaqi Su, Zeyu Jin
Submitted by:
Ke Chen
Last updated:
15 April 2024 - 8:27am
Document Type:
Poster
Document Year:
2024
Event:
Presenters:
Ke Chen
Paper Code:
AASP-P16.6
 

Audio source separation aims to extract individual sound sources from an audio mixture. Recent studies on source separation focus primarily on minimizing signal-level distance, typically measured by source-to-distortion ratio (SDR). However, scant attention has been given to the perceptual quality of the separated tracks. In this paper, we propose MDX-GAN, an efficient and high-fidelity audio source separator based on MDX-Net for multiple sound classes. We leverage different training objectives to enhance the perceptual quality of audio source separation. Specifically, we adopt perceptually motivated loss functions on top of the waveform loss, including multi-resolution STFT and Mel-spectrogram losses, and employ the adversarial training paradigm with multi-domain and multi-scale discriminators to refine the perceptual quality of separation. Additionally, we extend the model to support multiple sound classes within a single network via feature-wise linear modulation (FiLM). We conduct both objective and subjective experiments to evaluate MDX-GAN in real-world settings, and assess the impact of each design component on perceptual quality and SDR scores. Results demonstrate that MDX-GAN accurately separates sound sources and achieves superior perceptual quality.
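The abstract mentions two concrete components: perceptually motivated losses (multi-resolution STFT) and FiLM-based class conditioning. The sketch below is an illustrative PyTorch rendering of those two ideas only, not the authors' implementation; the FFT resolutions, module names, and shapes are assumptions chosen for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of a multi-resolution STFT loss: spectral convergence + log-magnitude L1,
# averaged over several FFT configurations. The (n_fft, hop, win) triples here are
# illustrative, not the values used in the paper.
class MultiResolutionSTFTLoss(nn.Module):
    def __init__(self, resolutions=((512, 128, 512), (1024, 256, 1024), (2048, 512, 2048))):
        super().__init__()
        self.resolutions = resolutions

    def _stft_mag(self, x, n_fft, hop, win):
        window = torch.hann_window(win, device=x.device)
        spec = torch.stft(x, n_fft, hop_length=hop, win_length=win,
                          window=window, return_complex=True)
        return torch.abs(spec).clamp(min=1e-7)

    def forward(self, pred, target):
        # pred, target: (batch, samples) waveforms
        loss = 0.0
        for n_fft, hop, win in self.resolutions:
            p = self._stft_mag(pred, n_fft, hop, win)
            t = self._stft_mag(target, n_fft, hop, win)
            sc = torch.norm(t - p, p="fro") / torch.norm(t, p="fro")  # spectral convergence
            mag = F.l1_loss(torch.log(p), torch.log(t))               # log-magnitude L1
            loss = loss + sc + mag
        return loss / len(self.resolutions)

# Sketch of FiLM conditioning: a learned embedding of the target sound class
# produces a per-channel scale (gamma) and shift (beta) applied to intermediate
# separator features, letting one network handle multiple classes.
class FiLM(nn.Module):
    def __init__(self, num_classes, channels):
        super().__init__()
        self.embed = nn.Embedding(num_classes, channels * 2)

    def forward(self, feats, class_idx):
        # feats: (batch, channels, time); class_idx: (batch,)
        gamma, beta = self.embed(class_idx).chunk(2, dim=-1)
        return gamma.unsqueeze(-1) * feats + beta.unsqueeze(-1)
```

In this reading, the multi-resolution STFT loss would be added to the waveform loss and the adversarial terms, while FiLM layers modulate the separator's feature maps according to which source class is requested.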
