Music Signal Processing

Proximal Deep Recurrent Neural Network for Monaural Singing Voice Separation

Read more about Proximal Deep Recurrent Neural Network for Monaural Singing Voice Separation
Log in to post comments

The recent deep learning methods can offer state-of-the-art performance for Monaural Singing Voice Separation (MSVS). In these deep methods, the recurrent neural network (RNN) is widely employed. This work proposes a novel type of Deep RNN (DRNN), namely Proximal DRNN (P-DRNN) for MSVS, which improves the conventional Stacked RNN (S-RNN) by introducing a novel interlayer structure. The interlayer structure is derived from an optimization problem for Monaural Source Separation (MSS).

conference_poster_5.pdf

poster_ICASSP19005 (357)

Categories:: Music Signal Processing

4 Views

A Unified Neural Architecture for Instrumental Audio Tasks

Read more about A Unified Neural Architecture for Instrumental Audio Tasks
Log in to post comments

ICASSP2019_POSTER_SPRATLEY.pdf

ICASSP2019_POSTER_SPRATLEY.pdf (270)

Categories:: Music Signal Processing

15 Views

MULTITASK LEARNING FOR FRAME-LEVEL INSTRUMENT RECOGNITION

Read more about MULTITASK LEARNING FOR FRAME-LEVEL INSTRUMENT RECOGNITION
Log in to post comments

For many music analysis problems, we need to know the presence
of instruments for each time frame in a multi-instrument
musical piece. However, such a frame-level instrument recognition
task remains difficult, mainly due to the lack of labeled
datasets. To address this issue, we present in this paper a
large-scale dataset that contains synthetic polyphonic music
with frame-level pitch and instrument labels. Moreover, we
propose a simple yet novel network architecture to jointly predict

multitask learning-icassp2019-poster.pdf

multitask learning-icassp2019-poster.pdf (433)

Categories:: Music Signal Processing

18 Views

PHASE-AWARE HARMONIC/PERCUSSIVE SOURCE SEPARATION VIA CONVEX OPTIMIZATION

Read more about PHASE-AWARE HARMONIC/PERCUSSIVE SOURCE SEPARATION VIA CONVEX OPTIMIZATION
Log in to post comments

In this paper, a phase-aware HPSS method through convex optimization was proposed. Based on two HPSS approaches (anisotropic smoothness and sinusoidal model), the proposed method assumes the smoothness of the complex-valued spectrogram of harmonic components calculated by iPC-STFT in the time direction. On the other hand, the time-frame-wise sparsity of percussive spectrograms was considered as a phase insensitive prior. Furthermore, the proposed method considers the perfect reconstruction constraint in the time domain instead of power spectrograms.

icassp_hpss_ver0.4.pdf

icassp_hpss_ver0.4.pdf (394)

Categories:: Music Signal Processing

85 Views

LOW-RANKNESS OF COMPLEX-VALUED SPECTROGRAM AND ITS APPLICATION TO PHASE-AWARE AUDIO PROCESSING

Low-rankness of amplitude spectrograms has been effectively utilized in audio signal processing methods including non-negative matrix factorization. However, such methods have a fundamental limitation owing to their amplitude-only treatment where the phase of the observed signal is utilized for resynthesizing the estimated signal. In order to address this limitation, we directly treat a complex-valued spectrogram and show a complex-valued spectrogram of a sum of sinusoids can be approximately low-rank by modifying its phase.

icassp_ipclr_ver0.4.pdf

icassp_ipclr_ver0.4.pdf (448)

Categories:: Music Signal Processing

91 Views

A HYBRID NEURAL NETWORK BASED ON THE DUPLEX MODEL OF PITCH PERCEPTION FOR SINGING MELODY EXTRACTION

In this paper, we build up a hybrid neural network (NN) for singing melody extraction from polyphonic music by imitating human pitch perception. For human hearing, there are two pitch perception models, the spectral model and the temporal model, in accordance with whether harmonics are resolved or not. Here, we first use NNs to implement individual models and evaluate their performance in the task of singing melody extraction.

A HYBRID NEURAL NETWORK BASED ON THE DUPLEX MODEL OF PITCH PERCEPTION FOR SINGING MELODY EXTRACTION.pdf

A HYBRID NEURAL NETWORK BASED ON THE DUPLEX MODEL OF PITCH PERCEPTION FOR SINGING MELODY EXTRACTION.pdf (369)

Categories:: Music Signal Processing

28 Views

MUSIC CHORD RECOGNITION BASED ON MIDI-TRAINED DEEP FEATURE AND BLSTM-CRF HYBIRD DECODING

In this paper, we design a novel deep learning based hybrid system for automatic chord recognition. Currently, there is a bottleneck in the amount of enough annotated data for training robust acoustic models, as hand annotating time-synchronized chord labels requires professional musical skills and considerable labor. As a solution to this problem, we construct a large set of time synchronized MIDI-audio pairs, and use these data to train a Deep Residual Network (DRN) feature extractor, which can then estimate pitch class activations of real-world music audio recordings.

ICASSP2018Poster_WuYiming.pdf

ICASSP2018Poster_WuYiming.pdf (840)

Categories:: Music Signal Processing

90 Views

CREPE: A Convolutional Representation for Pitch Estimation

Read more about CREPE: A Convolutional Representation for Pitch Estimation
Log in to post comments

The task of estimating the fundamental frequency of a monophonic sound recording, also known as pitch tracking, is fundamental to audio processing with multiple applications in speech processing and music information retrieval. To date, the best performing techniques, such as the pYIN algorithm, are based on a combination of DSP pipelines and heuristics. While such techniques perform very well on average, there remain many cases in which they fail to correctly estimate the pitch.

crepe.pdf

crepe.pdf (479)

Categories:: Music Signal Processing

30 Views

SVSGAN: SINGING VOICE SEPARATION VIA GENERATIVE ADVERSARIAL NETWORK

Read more about SVSGAN: SINGING VOICE SEPARATION VIA GENERATIVE ADVERSARIAL NETWORK
Log in to post comments

fan18icassp_poster.pdf

fan18icassp_poster.pdf (437)

Categories:: Music Signal Processing

32 Views

Vocal melody extraction using patch-based CNN

Read more about Vocal melody extraction using patch-based CNN
Log in to post comments

A patch-based convolutional neural network (CNN) model presented in this paper for vocal melody extraction in polyphonic music is inspired from object detection in image processing. The input of the model is a novel time-frequency representation which enhances the pitch contours and suppresses the harmonic components of a signal. This succinct data representation and the patch-based CNN model enable an efficient training process with limited labeled data. Experiments on various datasets show excellent speed and competitive accuracy comparing to other deep learning approaches.

poster_icassp_v2.pdf

poster_icassp_v2.pdf (440)

Categories:: Music Signal Processing

6 Views

Music Signal Processing

Pages