
Phase recovery with Bregman divergences for audio source separation

Citation Author(s): Paul Magron, Pierre-Hugo Vial, Thomas Oberlin, Cédric Févotte
Submitted by: Paul Magron
Last updated: 16 June 2021 - 6:22am
Document Type: Presentation Slides
Document Year: 2021
Event:
Presenters: Paul Magron
 

Time-frequency audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a phase recovery algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has shown good performance in several recent works. This algorithm minimizes a quadratic reconstruction error between magnitude spectrograms. However, this loss does not properly account for some perceptual properties of audio, and alternative discrepancy measures such as beta-divergences have been preferred in many settings. In this paper, we propose to reformulate phase recovery in audio source separation as a minimization problem involving Bregman divergences. To optimize the resulting objective, we derive a projected gradient descent algorithm. Experiments conducted on a speech enhancement task show that this approach outperforms MISI for several alternative losses, which highlights their relevance for audio source separation applications.
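As a rough illustration of two ingredients named in the abstract, the sketch below shows a beta-divergence (one family of Bregman divergences) and the projection step that a projected gradient scheme could use to keep the source estimates summing to the mixture. It is not the authors' implementation: the function names are hypothetical, signals are plain Python lists, the STFT-domain gradient step is omitted entirely, and spreading the mixing error equally across sources is assumed here as the projection.

```python
import math


def beta_divergence(x, y, beta):
    """Scalar beta-divergence d_beta(x | y) for strictly positive x and y."""
    if beta == 0:
        # Itakura-Saito divergence
        ratio = x / y
        return ratio - math.log(ratio) - 1.0
    if beta == 1:
        # Generalized Kullback-Leibler divergence
        return x * math.log(x / y) - x + y
    # General case; beta = 2 gives half the squared error
    numerator = x ** beta + (beta - 1) * y ** beta - beta * x * y ** (beta - 1)
    return numerator / (beta * (beta - 1))


def spectrogram_divergence(a, b, beta):
    """Sum of entrywise beta-divergences between two flattened magnitude spectrograms."""
    return sum(beta_divergence(ai, bi, beta) for ai, bi in zip(a, b))


def project_onto_mixture(sources, mixture):
    """Project time-domain source estimates onto the set {sum_j s_j = mixture}.

    Assumption of this sketch: the mixing error is split equally across
    the J sources, which is the Euclidean projection onto that constraint.
    """
    num_sources = len(sources)
    residual = [
        m - sum(s[n] for s in sources) for n, m in enumerate(mixture)
    ]
    return [
        [s[n] + residual[n] / num_sources for n in range(len(mixture))]
        for s in sources
    ]


if __name__ == "__main__":
    target = [1.0, 2.0, 0.5]
    estimate = [1.2, 1.5, 0.6]
    for beta in (0, 1, 2):
        print(beta, spectrogram_divergence(target, estimate, beta))

    projected = project_onto_mixture([[1.0, 2.0], [0.0, 1.0]], [3.0, 3.0])
    print(projected)
```

With `beta = 2` the divergence reduces to half the squared error, which corresponds to the quadratic loss mentioned above; `beta = 1` and `beta = 0` give the Kullback-Leibler and Itakura-Saito divergences respectively.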
