CNN Based Two-Stage Multi-Resolution End-to-End Model for Singing Melody Extraction
- Citation Author(s):
- Submitted by:
- Bo-Jun Li
- Last updated:
- 9 May 2019 - 1:00pm
- Document Type:
- Poster
- Document Year:
- 2019
- Event:
- Presenters:
- Ming-Tso Chen
- Paper Code:
- AASP-P16.9
- Categories:
Inspired by human hearing perception, we propose a two-stage multi-resolution end-to-end model for singing melody extraction. A convolutional neural network (CNN) forms the core of the proposed model and generates the multi-resolution representations. 1-D and 2-D multi-resolution analyses are carried out in succession, on the waveform and on the spectrogram-like graph respectively, using 1-D and 2-D CNN kernels of different lengths and sizes. The 1-D CNNs with kernels of different lengths produce multi-resolution spectrogram-like graphs without suffering from the trade-off between spectral and temporal resolution. The 2-D CNNs with kernels of different sizes extract features from spectro-temporal envelopes at different scales. Experimental results show that the proposed model outperforms three compared systems on three out of five public databases.
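The two stages described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' architecture: the kernel lengths, filter count, hop size, and the use of random (untrained) kernels are all assumptions chosen for illustration, and the function names `conv1d_bank`, `multi_resolution_maps`, and `conv2d_same` are hypothetical. The sketch only shows how 1-D kernels of different lengths turn a waveform into several time-aligned spectrogram-like maps, and how a 2-D kernel of a given size can then be slid over one of those maps.

```python
import numpy as np


def conv1d_bank(wave, kernels, hop):
    """Slide a bank of 1-D kernels over a waveform with a fixed hop.

    kernels has shape (n_filters, kernel_len); the result has shape
    (n_filters, n_frames) and plays the role of a spectrogram-like map.
    """
    n_filters, klen = kernels.shape
    pad = klen // 2
    padded = np.pad(wave, (pad, pad))  # zero-pad both ends
    n_frames = 1 + (len(padded) - klen) // hop
    # Row i of idx holds the sample indices covered by frame i.
    idx = np.arange(klen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = padded[idx]  # (n_frames, klen)
    return np.abs(frames @ kernels.T).T


def multi_resolution_maps(wave, kernel_lengths=(64, 256, 1024),
                          n_filters=32, hop=16, seed=0):
    """Stage 1 (illustrative): one map per kernel length.

    Short kernels favour temporal detail, long kernels favour spectral
    detail. Kernels are random here; in a trained model they are learned.
    """
    rng = np.random.default_rng(seed)
    maps = []
    for klen in kernel_lengths:
        kernels = rng.standard_normal((n_filters, klen)) / np.sqrt(klen)
        maps.append(conv1d_bank(wave, kernels, hop))
    return maps


def conv2d_same(image, kernel):
    """Stage 2 (illustrative): 2-D cross-correlation with 'same' output size.

    Assumes odd kernel height and width. Larger kernels summarise
    spectro-temporal envelopes over a wider neighbourhood.
    """
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(padded, (kh, kw))
    return np.einsum("ijkl,kl->ij", windows, kernel)


if __name__ == "__main__":
    sr = 16000  # assumed sample rate
    t = np.arange(sr) / sr
    wave = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone
    maps = multi_resolution_maps(wave)
    for m in maps:
        print(m.shape)
    # Apply 2-D kernels of two different sizes to the first map.
    for size in (3, 7):
        box = np.ones((size, size)) / size**2
        print(size, conv2d_same(maps[0], box).shape)
```

Because every map shares the same hop, the maps are aligned frame by frame and can be stacked as input channels for the 2-D stage, whatever kernel length produced them.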