CNN Based Two-Stage Multi-Resolution End-to-End Model for Singing Melody Extraction

Citation Author(s):
Bo-Jun Li, Tai-Shih Chi
Submitted by:
Bo-Jun Li
Last updated:
9 May 2019 - 1:00pm
Document Type:
Poster
Document Year:
2019
Event:
Presenters:
Ming-Tso Chen
Paper Code:
AASP-P16.9
 

Inspired by human hearing perception, we propose a two-stage multi-resolution end-to-end model for singing melody extraction. The convolutional neural network (CNN) is the core of the proposed model and generates the multi-resolution representations. 1-D and 2-D multi-resolution analyses are carried out successively on the waveform and on the resulting spectrogram-like graphs, using 1-D and 2-D CNN kernels of different lengths and sizes. The 1-D CNNs with kernels of different lengths produce multi-resolution spectrogram-like graphs without suffering from the trade-off between spectral and temporal resolution. The 2-D CNNs with kernels of different sizes extract features from spectro-temporal envelopes at different scales. Experimental results show that the proposed model outperforms three compared systems on three out of five public databases.
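The first stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the learned 1-D CNN kernels are replaced here by fixed Hann-windowed cosines, and every name and parameter (`sinusoid_kernel`, `conv1d`, `multi_resolution_maps`, the sample rate, the kernel lengths, the stride) is an assumption chosen for illustration. It only shows how kernels of different lengths, applied to the same waveform, yield several spectrogram-like maps with different spectral resolutions.

```python
import math


def sinusoid_kernel(length, freq_hz, sample_rate):
    """Hann-windowed cosine; a fixed stand-in for a learned 1-D CNN kernel."""
    return [
        0.5 * (1 - math.cos(2 * math.pi * n / (length - 1)))
        * math.cos(2 * math.pi * freq_hz * n / sample_rate)
        for n in range(length)
    ]


def conv1d(signal, kernel, stride):
    """Valid-mode strided 1-D cross-correlation, as in an unpadded CNN layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def multi_resolution_maps(signal, sample_rate, kernel_lengths, freqs_hz, stride):
    """Return one spectrogram-like map (channels x frames) per kernel length."""
    maps = {}
    for length in kernel_lengths:
        maps[length] = [
            [abs(v) for v in conv1d(signal, sinusoid_kernel(length, f, sample_rate), stride)]
            for f in freqs_hz
        ]
    return maps


# Hypothetical input: a 440 Hz tone sampled at 8 kHz.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1024)]

maps = multi_resolution_maps(
    signal,
    sample_rate,
    kernel_lengths=(64, 256),        # short kernel: coarse in frequency; long kernel: fine
    freqs_hz=(220.0, 440.0, 880.0),  # one output channel per centre frequency
    stride=64,
)

for length, channels in maps.items():
    peaks = [round(max(ch), 2) for ch in channels]
    print(length, len(channels[0]), peaks)
```

Each entry of `maps` plays the role of one spectrogram-like graph. In a second stage of the kind the abstract describes, 2-D kernels of different sizes would then be applied to these maps; that stage is omitted from this sketch.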
