This paper presents a patch-based convolutional neural network (CNN) model for vocal melody extraction in polyphonic music, inspired by object detection in image processing. The input to the model is a novel time-frequency representation that enhances pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy compared to other deep learning approaches.


Automatic transcription of polyphonic music remains a challenging task in the field of Music Information Retrieval. In this paper, we propose a new method to post-process the output of a multi-pitch detection model using recurrent neural networks. In particular, we compare the use of a fixed sample rate against a meter-constrained time step on a piano performance audio dataset. The metric ground truth is estimated using automatic symbolic alignment, which we make available for further study.


Spoken content processing (such as retrieval and browsing) is maturing, but singing content has been almost completely left out. Like speech, songs are human voice carrying plenty of semantic information, and they may be considered a special type of speech with highly flexible prosody. The various problems in song audio, for example significantly varying phone durations over highly flexible pitch contours, make recognizing lyrics from song audio much more difficult. This paper reports an initial attempt toward this goal.


So far, few cover song identification systems that utilize indexing techniques have achieved great success. In this paper, we propose a novel approach based on skipping bigrams that can be used for effective indexing. By applying Vector Quantization, our algorithm encodes signals into code sequences. The bigram histograms of the code sequences are then used to represent the original recordings and measure their similarity. Through Vector Quantization and skipping bigrams, our model shows great robustness against speed and structure variations in cover songs.
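The pipeline sketched in the abstract (VQ codes, then skipping-bigram histograms as the index representation) can be illustrated as follows. This is a minimal sketch of one plausible reading; the function name, the `max_skip` parameter, and the normalization choice are our assumptions, not details from the paper.

```python
import numpy as np

def skipping_bigram_histogram(codes, num_codes, max_skip=2):
    """Build a normalized histogram of skipping bigrams from a VQ code sequence.

    Counts pairs (codes[i], codes[i + k]) for every skip k = 1..max_skip,
    so bigrams survive local tempo (speed) changes in the recording.
    """
    hist = np.zeros((num_codes, num_codes))
    for k in range(1, max_skip + 1):
        for a, b in zip(codes[:-k], codes[k:]):
            hist[a, b] += 1
    total = hist.sum()
    return hist / total if total > 0 else hist
```

Two recordings can then be compared by any histogram distance (e.g. cosine or intersection) between their flattened bigram histograms, which is what makes the representation index-friendly.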


In audio source separation applications, it is common to model the sources as circular-symmetric Gaussian random variables, which is equivalent to assuming that the phase of each source is uniformly distributed. In this paper, we introduce an anisotropic Gaussian source model in which both the magnitude and phase parameters are modeled as random variables. In such a model, it becomes possible to promote a phase value that originates from a signal model and to adjust the relative importance of this underlying model-based phase constraint.
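The equivalence stated in the first sentence, and the relaxation the paper proposes, can be sketched as follows. The notation (in particular the concentration parameter $\kappa$) is ours, chosen to illustrate the idea of an adjustable model-based phase constraint, and is not taken from the paper.

```latex
% Circular-symmetric model: the phase carries no information.
s \sim \mathcal{N}_c(0, \sigma^2)
\quad\Longleftrightarrow\quad
\phi \sim \mathcal{U}[0, 2\pi), \qquad s = r e^{j\phi}.

% Anisotropic model (sketch): the phase is a random variable
% concentrated around a model-based value \mu_\phi, with \kappa
% controlling the strength of the phase constraint.
\phi \sim p(\phi \mid \mu_\phi, \kappa), \qquad
\kappa \to 0 \;\Rightarrow\; \text{uniform phase (circular case)}.
```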


For a musical instrument sound containing partials, or modes, the behavior of the modes around the attack time is particularly important. However, accurately decomposing such a sound around the attack time is not easy, especially when the onset is sharp: the spectra of the modes are peaky, while a sharp onset requires a broad spectrum. In this paper, an optimization-based method of modal decomposition is proposed to achieve accurate decomposition around the attack time.


In this paper, we propose a cover song identification algorithm using a convolutional neural network (CNN). We first train the CNN model to classify cover and non-cover relationships, feeding as input a cross-similarity matrix generated from a pair of songs. Our main idea is to use the CNN output, the cover probabilities of one song against all other candidate songs, as a new representation vector for measuring the distance between songs. Based on this, the present algorithm searches cover songs by applying several ranking methods: 1. sorting without using the representation vectors; 2.
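The representation-vector idea can be made concrete with a small sketch. Assuming a precomputed matrix of pairwise CNN cover probabilities, each song's row serves as its representation vector; the function names and the choice of cosine similarity for the ranking are our assumptions for illustration, not details from the paper.

```python
import numpy as np

def rank_by_representation(pairwise_cover_prob, query_idx):
    """Rank candidate songs for one query.

    pairwise_cover_prob[i, j] is the CNN's probability that songs i and j
    are covers. Row i is used as song i's representation vector, and
    candidates are ranked by cosine similarity between these vectors.
    """
    reps = np.asarray(pairwise_cover_prob, dtype=float)
    q = reps[query_idx]
    sims = reps @ q / (np.linalg.norm(reps, axis=1) * np.linalg.norm(q) + 1e-12)
    order = np.argsort(-sims)  # most similar first
    return [int(i) for i in order if i != query_idx]
```

The point of this indirection is that two covers of the same song should have similar probability profiles against the whole candidate set, even when their direct pairwise score is noisy.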
