Source Separation and Signal Enhancement

Proximal Deep Recurrent Neural Network for Monaural Singing Voice Separation

Recent deep learning methods offer state-of-the-art performance for Monaural Singing Voice Separation (MSVS), and the recurrent neural network (RNN) is widely employed in them. This work proposes a novel type of Deep RNN (DRNN), namely the Proximal DRNN (P-DRNN), for MSVS, which improves on the conventional Stacked RNN (S-RNN) by introducing a novel interlayer structure. The interlayer structure is derived from an optimization problem for Monaural Source Separation (MSS).

conference_poster_5.pdf

conference_poster_5.pdf (358)

Categories: Source Separation and Signal Enhancement

8 Views

Unsupervised Deep Clustering for Source Separation: Direct Learning from Mixtures Using Spatial Information Slides

We present a monophonic source separation system that is trained by only observing mixtures, with no ground truth separation information. We use a deep clustering approach which trains on multi-channel mixtures and learns to project spectrogram bins to source clusters that correlate with various spatial features. We show that such a training process yields separation performance as good as that obtained with ground truth separation information.

unsup_icassp19_slides_new.pdf

unsup_dc_icassp19_slides (393)

Categories: Source Separation and Signal Enhancement

76 Views

Directional interference suppression using a spatial relative transfer function feature

ICASSP2019_poster_SpatialSuppressor.pdf

ICASSP2019_poster_SpatialSuppressor.pdf (364)

Categories: Source Separation and Signal Enhancement

10 Views

Single-channel Speech Extraction Using Speaker Inventory and Attention Network

ICASSP2019_SpeakerExtractionWithAttentionAndInventory_v21b.pptx

ICASSP2019_SpeakerExtractionWithAttentionAndInventory_v21b.pptx (539)

Categories: Source Separation and Signal Enhancement

126 Views

Time-frequency-masking-based determined BSS with application to Sparse IVA

Most determined blind source separation (BSS) algorithms related to independent component analysis (ICA) were derived from mathematical models of source signals. However, such a derivation restricts the application of these algorithms to explicitly definable source models; that is, an implicit model associated with some signal-processing procedure cannot be utilized within such a framework.

yatabeICASSPposter.pdf

yatabeICASSPposter.pdf (453)

Categories: Source Separation and Signal Enhancement

136 Views

Low-latency deep clustering for speech separation

This paper proposes a low-algorithmic-latency adaptation of the deep clustering approach to speaker-independent speech separation. It consists of three parts: a) using long short-term memory (LSTM) networks instead of the bidirectional variant used in the original work, b) using a short synthesis window (here 8 ms), as required for low-latency operation, and c) using a buffer at the beginning of the audio mixture to estimate cluster centres corresponding to the constituent speakers, which are then used to separate the speakers in the rest of the signal.
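Part c) can be illustrated with a minimal sketch. It assumes the network has already produced one embedding vector per time-frequency bin; here those embeddings are synthetic 2-D points, and the helper names (`init_centres`, `kmeans`, `assign_to_centres`), the buffer length, and the farthest-point initialisation are all illustrative choices, not details taken from the paper. Cluster centres are estimated once from the buffered embeddings, and later embeddings are labelled by nearest centre without re-clustering.

```python
import numpy as np

def pairwise_dists(points, centres):
    """Euclidean distance from every point to every centre, shape (N, K)."""
    return np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)

def init_centres(points, k):
    """Deterministic farthest-point initialisation (an illustrative choice)."""
    centres = [points[0]]
    for _ in range(k - 1):
        nearest = pairwise_dists(points, np.array(centres)).min(axis=1)
        centres.append(points[nearest.argmax()])
    return np.array(centres, dtype=float)

def kmeans(points, k, n_iter=20):
    """Plain k-means on the buffered embeddings; returns the cluster centres."""
    centres = init_centres(points, k)
    for _ in range(n_iter):
        labels = pairwise_dists(points, centres).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = points[labels == j].mean(axis=0)
    return centres

def assign_to_centres(points, centres):
    """Label each embedding with the index of its nearest fixed centre."""
    return pairwise_dists(points, centres).argmin(axis=1)

# Synthetic stand-in for network embeddings: two speakers, 2-D embeddings.
rng = np.random.default_rng(0)
true_source = np.arange(400) % 2
speaker_means = np.array([[1.0, 0.0], [0.0, 1.0]])
embeddings = speaker_means[true_source] + 0.05 * rng.standard_normal((400, 2))

buffer_len = 100  # hypothetical buffer size, in embeddings
centres = kmeans(embeddings[:buffer_len], k=2)
rest_labels = assign_to_centres(embeddings[buffer_len:], centres)
```

In a real system each label would select a time-frequency mask for one speaker; the point of the sketch is only that the clustering cost is paid once, on the buffer, while the rest of the signal needs just a nearest-centre lookup.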

ICASSP_presentation_updated.pdf

ICASSP_presentation_updated.pdf (406)

Categories: Source Separation and Signal Enhancement

79 Views

ADAPTIVE DIFFERENTIAL MICROPHONE ARRAY WITH DISTORTIONLESS RESPONSE AT ARBITRARY DIRECTIONS FOR HEARING AID APPLICATIONS

An adaptive sub-band differential microphone array beamformer is proposed in order to achieve a distortionless response at arbitrary target directions, using two closely-spaced microphones in a hearing aid. Different variations are introduced to have a distortionless response in the target speaker direction. Two of these variations assume a free field environment when designing the beamformer, while two other designs consider the head shadow effect by using Head-Related Transfer Functions (HRTFs), for hearing aid applications.

Poster_HalaAs'ad.pdf

Poster (387)

Categories: Source Separation and Signal Enhancement

28 Views

SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION

In this paper, we present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW), which selects the target source masked by a noise source using Angle of Arrival (AoA) information calculated from the phase difference. The RMS-PDCW algorithm selects which masks to apply using information about the localized sound source and onset detection of speech.
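The link between phase difference and AoA can be sketched for a two-microphone, far-field setup. This is a generic phase-difference masking example, not the RMS-PDCW algorithm itself: the reliable mask selection and onset detection steps are omitted, and the function names, microphone spacing, and angular tolerance are assumptions made for illustration. For a plane wave arriving at angle θ from broadside, the inter-microphone phase difference at frequency f is Δφ = 2πf·d·sin(θ)/c, so θ can be recovered per time-frequency bin as long as f stays below the spatial aliasing limit c/(2d).

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def estimate_aoa(X1, X2, freqs, mic_distance):
    """Per-bin angle of arrival in radians from two STFTs of shape (F, T)."""
    # Phase of X1 relative to X2; positive when microphone 2 lags microphone 1.
    phase_diff = np.angle(X1 * np.conj(X2))
    safe_freqs = np.maximum(freqs, 1e-9)[:, None]  # avoid dividing by zero at DC
    sin_theta = SPEED_OF_SOUND * phase_diff / (2.0 * np.pi * safe_freqs * mic_distance)
    return np.arcsin(np.clip(sin_theta, -1.0, 1.0))

def direction_mask(theta, target_angle, tolerance):
    """Binary mask keeping bins whose estimated angle is near the target."""
    return np.abs(theta - target_angle) < tolerance

# Synthetic check: one source at 30 degrees, microphones 4 cm apart.
freqs = np.linspace(0.0, 4000.0, 65)          # below c / (2 d), about 4.3 kHz
mic_distance = 0.04
true_angle = np.deg2rad(30.0)
delay = mic_distance * np.sin(true_angle) / SPEED_OF_SOUND

X1 = np.ones((freqs.size, 10), dtype=complex)
X2 = X1 * np.exp(-2j * np.pi * freqs[:, None] * delay)  # delayed copy at mic 2

theta = estimate_aoa(X1, X2, freqs, mic_distance)
mask_target = direction_mask(theta, true_angle, np.deg2rad(10.0))
mask_other = direction_mask(theta, -true_angle, np.deg2rad(10.0))
```

The DC bin carries no phase information, so its estimated angle defaults to zero and it falls outside both masks in this example.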

icassp_4465_poster.pdf

icassp_4465_poster.pdf (678)

Categories: Robust Speech Recognition (SPE-ROBU), Source Separation and Signal Enhancement

17 Views

Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-negative Matrix Factorization

This paper proposes an extension of multichannel non-negative matrix factorization (MNMF) that simultaneously solves source separation and dereverberation. While MNMF was originally formulated under an underdetermined problem setting where sources outnumber microphones, a determined counterpart of MNMF, which we call the determined MNMF (DMNMF), has recently been proposed with notable success.

Kagami2018ICASSP03.pdf

Kagami2018ICASSP03.pdf (560)

Categories: Source Separation and Signal Enhancement

45 Views

CBLDNN-BASED SPEAKER-INDEPENDENT SPEECH SEPARATION VIA GENERATIVE ADVERSARIAL TRAINING

In this paper, we propose a speaker-independent multi-speaker monaural speech separation system (CBLDNN-GAT) based on convolutional, bidirectional long short-term memory, deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT). Our system aims at obtaining better speech quality instead of only minimizing a mean square error (MSE). In the initial phase, we utilize log-mel filterbank and pitch features to warm up our CBLDNN in a multi-task manner.

conference_poster_4.pdf

conference_poster_4.pdf (601)

Categories: Source Separation and Signal Enhancement

97 Views
