
COMPLEX RATIO MASKING FOR SINGING VOICE SEPARATION

Citation Author(s):
Yixuan Zhang, Yuzhou Liu, DeLiang Wang
Submitted by:
Yixuan Zhang
Last updated:
22 June 2021 - 2:25pm
Document Type:
Poster
Document Year:
2021
Event:
Presenters:
Yixuan Zhang
Paper Code:
COMPLEX RATIO MASKING FOR SINGING VOICE SEPARATION
 

Music source separation is important for applications such as karaoke and remixing. Much previous research
focuses on estimating the magnitude of the short-time Fourier transform (STFT) and discards phase information. We observe that,
for singing voice separation, phase has the potential to considerably improve separation quality. This paper
proposes a complex-domain deep learning method for voice and accompaniment separation. The proposed method employs
DenseUNet with self-attention to estimate the real and imaginary components of the STFT for each sound source. A simple ensemble
technique is introduced to further improve separation performance. Evaluation results demonstrate that the proposed method
outperforms recent state-of-the-art models for both separated voice and accompaniment.
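
To illustrate why phase matters, the following is a minimal sketch of complex ratio masking on a toy example. It is not the authors' implementation: random complex arrays stand in for real STFTs, the mask is computed from the known source (an oracle) rather than estimated by a network, and the function names `ideal_complex_ratio_mask` and `apply_complex_mask` are hypothetical. It compares a complex mask against a magnitude-only estimate that reuses the mixture phase.

```python
import numpy as np


def ideal_complex_ratio_mask(mixture, source, eps=1e-8):
    """Oracle complex mask M with M * mixture ~= source, per time-frequency bin.

    Computed as source / mixture, written out in real and imaginary parts.
    eps guards against division by zero in near-silent bins.
    """
    denom = mixture.real ** 2 + mixture.imag ** 2 + eps
    mask_real = (mixture.real * source.real + mixture.imag * source.imag) / denom
    mask_imag = (mixture.real * source.imag - mixture.imag * source.real) / denom
    return mask_real + 1j * mask_imag


def apply_complex_mask(mask, mixture):
    """Complex multiplication adjusts both magnitude and phase of each bin."""
    return mask * mixture


# Assumption: random complex spectrograms stand in for real STFTs
# (257 frequency bins x 100 frames).
rng = np.random.default_rng(0)
shape = (257, 100)
voice = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
accompaniment = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = voice + accompaniment

# Complex-domain estimate: mask carries real and imaginary components.
mask = ideal_complex_ratio_mask(mixture, voice)
voice_complex = apply_complex_mask(mask, mixture)

# Magnitude-only baseline: true voice magnitude combined with mixture phase.
voice_mag_only = np.abs(voice) * np.exp(1j * np.angle(mixture))

err_complex = np.mean(np.abs(voice_complex - voice) ** 2)
err_mag_only = np.mean(np.abs(voice_mag_only - voice) ** 2)
print(f"complex mask error:   {err_complex:.3e}")
print(f"magnitude-only error: {err_mag_only:.3e}")
```

Even with a perfect magnitude, the magnitude-only estimate retains an error caused by the mixture phase, whereas the oracle complex mask recovers the source almost exactly. In practice the mask or the real and imaginary components would be predicted by a model, so the achievable gain depends on estimation quality.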
