Sorry, you need to enable JavaScript to visit this website.

The recent deep learning methods can offer state-of-the-art performance for Monaural Singing Voice Separation (MSVS). In these deep methods, the recurrent neural network (RNN) is widely employed. This work proposes a novel type of Deep RNN (DRNN), namely Proximal DRNN (P-DRNN) for MSVS, which improves the conventional Stacked RNN (S-RNN) by introducing a novel interlayer structure. The interlayer structure is derived from an optimization problem for Monaural Source Separation (MSS).

Categories:
4 Views

For many music analysis problems, we need to know the presence
of instruments for each time frame in a multi-instrument
musical piece. However, such a frame-level instrument recognition
task remains difficult, mainly due to the lack of labeled
datasets. To address this issue, we present in this paper a
large-scale dataset that contains synthetic polyphonic music
with frame-level pitch and instrument labels. Moreover, we
propose a simple yet novel network architecture to jointly predict

Categories:
18 Views

In this paper, a phase-aware HPSS method through convex optimization was proposed. Based on two HPSS approaches (anisotropic smoothness and sinusoidal model), the proposed method assumes the smoothness of the complex-valued spectrogram of harmonic components calculated by iPC-STFT in the time direction. On the other hand, the time-frame-wise sparsity of percussive spectrograms was considered as a phase insensitive prior. Furthermore, the proposed method considers the perfect reconstruction constraint in the time domain instead of power spectrograms.

Categories:
83 Views

Low-rankness of amplitude spectrograms has been effectively utilized in audio signal processing methods including non-negative matrix factorization. However, such methods have a fundamental limitation owing to their amplitude-only treatment where the phase of the observed signal is utilized for resynthesizing the estimated signal. In order to address this limitation, we directly treat a complex-valued spectrogram and show a complex-valued spectrogram of a sum of sinusoids can be approximately low-rank by modifying its phase.

Categories:
91 Views

In this paper, we build up a hybrid neural network (NN) for singing melody extraction from polyphonic music by imitating human pitch perception. For human hearing, there are two pitch perception models, the spectral model and the temporal model, in accordance with whether harmonics are resolved or not. Here, we first use NNs to implement individual models and evaluate their performance in the task of singing melody extraction.

Categories:
28 Views

In this paper, we design a novel deep learning based hybrid system for automatic chord recognition. Currently, there is a bottleneck in the amount of enough annotated data for training robust acoustic models, as hand annotating time-synchronized chord labels requires professional musical skills and considerable labor. As a solution to this problem, we construct a large set of time synchronized MIDI-audio pairs, and use these data to train a Deep Residual Network (DRN) feature extractor, which can then estimate pitch class activations of real-world music audio recordings.

Categories:
90 Views

The task of estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is fundamental to audio processing with multiple applications in speech processing and music information retrieval. To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics. While such techniques perform very well on average, there remain many cases in which they fail to correctly estimate the pitch.

Categories:
29 Views

A patch-based convolutional neural network (CNN) model presented in this paper for vocal melody extraction in polyphonic music is inspired from object detection in image processing. The input of the model is a novel time-frequency representation which enhances the pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy comparing to other deep learning approaches.

Categories:
6 Views

Pages