
- Low-latency deep clustering for speech separation
This paper proposes a low-algorithmic-latency adaptation of the deep clustering approach to speaker-independent speech separation. It consists of three parts: a) using long short-term memory (LSTM) networks instead of the bidirectional variant used in the original work, b) using a short synthesis window (here 8 ms), as required for low-latency operation, and c) using a buffer at the beginning of the audio mixture to estimate cluster centres corresponding to the constituent speakers, which are then used to separate the speakers in the rest of the signal.
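The buffer idea in part c) can be illustrated with a minimal sketch. It assumes that a network has already produced one embedding vector per time-frequency bin, and it uses plain k-means as the clustering step; the function names `kmeans` and `masks_from_buffer`, the array layout, and the choice of k-means itself are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on an (n, dim) array; returns (k, dim) centres."""
    rng = np.random.default_rng(seed)
    centres = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # Squared distance from every point to every centre: (n, k)
        dist = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            if np.any(labels == j):  # keep the old centre if a cluster empties
                centres[j] = points[labels == j].mean(0)
    return centres

def masks_from_buffer(emb, buffer_frames, k=2):
    """emb: (frames, bins, dim) embeddings (assumed given by a network).

    Cluster centres are estimated only from the first `buffer_frames`
    frames; every bin of the whole signal is then assigned to its
    nearest centre, giving one binary mask per speaker.
    """
    n_frames, n_bins, dim = emb.shape
    centres = kmeans(emb[:buffer_frames].reshape(-1, dim), k)
    dist = ((emb[:, :, None, :] - centres) ** 2).sum(-1)  # (frames, bins, k)
    labels = dist.argmin(-1)
    masks = np.stack([labels == j for j in range(k)]).astype(float)
    return masks, centres
```

Because the centres are fixed once the buffer has been processed, assigning later frames needs no future context, which is what makes this step compatible with low-latency operation.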

- ADAPTIVE DIFFERENTIAL MICROPHONE ARRAY WITH DISTORTIONLESS RESPONSE AT ARBITRARY DIRECTIONS FOR HEARING AID APPLICATIONS
An adaptive sub-band differential microphone array beamformer is proposed in order to achieve a distortionless response at arbitrary target directions, using two closely-spaced microphones in a hearing aid. Different variations are introduced to have a distortionless response in the target speaker direction. Two of these variations assume a free field environment when designing the beamformer, while two other designs consider the head shadow effect by using Head-Related Transfer Functions (HRTFs), for hearing aid applications.

- SOUND SOURCE SEPARATION USING PHASE DIFFERENCE AND RELIABLE MASK SELECTION
In this paper, we present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW), which selects a target source masked by a noise source using Angle of Arrival (AoA) information computed from phase differences. The RMS-PDCW algorithm selects which masks to apply using information about the localized sound source and onset detection of speech.
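The link between phase difference and AoA can be sketched for the simplest case. The example below assumes two microphones in the free field, a far-field source, and no spatial aliasing, so that the phase difference at angular frequency ω is ω·d·sin(φ)/c for spacing d and arrival angle φ. The function name `phase_difference_mask` and all of its parameters are hypothetical, and the sketch covers only a plain angle-threshold mask, not the reliable mask selection or onset detection of RMS-PDCW.

```python
import numpy as np

def phase_difference_mask(X1, X2, freqs_hz, mic_distance_m,
                          max_angle_rad, c=343.0):
    """Binary time-frequency mask from the phase difference of two channels.

    X1, X2: complex STFTs of shape (bins, frames).
    freqs_hz: centre frequency of each bin, shape (bins,).
    Bins whose estimated angle of arrival lies within `max_angle_rad`
    of broadside are kept (mask = 1); the rest are rejected (mask = 0).
    """
    # Phase of X1 relative to X2, wrapped to (-pi, pi]
    phase_diff = np.angle(X1 * np.conj(X2))
    # Guard against division by zero at the DC bin
    omega = 2.0 * np.pi * np.maximum(freqs_hz, 1e-9)[:, None]
    # Invert phase_diff = omega * d * sin(aoa) / c
    sin_aoa = np.clip(c * phase_diff / (omega * mic_distance_m), -1.0, 1.0)
    aoa = np.arcsin(sin_aoa)
    mask = (np.abs(aoa) <= max_angle_rad).astype(float)
    return mask, aoa
```

A mask like this would be multiplied with one channel's STFT before resynthesis; the threshold `max_angle_rad` trades target preservation against interference rejection.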

- Joint Separation and Dereverberation of Reverberant Mixtures with Determined Multichannel Non-negative Matrix Factorization
This paper proposes an extension of multichannel non-negative matrix factorization (MNMF) that simultaneously solves source separation and dereverberation. While MNMF was originally formulated under an underdetermined problem setting where sources outnumber microphones, a determined counterpart of MNMF, which we call the determined MNMF (DMNMF), has recently been proposed with notable success.

- CBLDNN-BASED SPEAKER-INDEPENDENT SPEECH SEPARATION VIA GENERATIVE ADVERSARIAL TRAINING
In this paper, we propose a speaker-independent multi-speaker monaural speech separation system (CBLDNN-GAT) based on convolutional, bidirectional long short-term memory, deep feed-forward neural network (CBLDNN) with generative adversarial training (GAT). Our system aims at obtaining better speech quality instead of only minimizing a mean square error (MSE). In the initial phase, we utilize log-mel filterbank and pitch features to warm up our CBLDNN in a multi-task manner.

It is commonly believed that multipath hurts various audio processing algorithms. At odds with this belief, we show that multipath in fact helps sound source separation, even with very simple propagation models. Unlike most existing methods, we neither ignore the room impulse responses nor attempt to estimate them fully. Rather, we assume that we know the positions of a few virtual microphones generated by echoes, and we show how this gives us enough spatial diversity to get a performance boost over the anechoic case.

- ADAPTIVE CODING OF NON-NEGATIVE FACTORIZATION PARAMETERS WITH APPLICATION TO INFORMED SOURCE SEPARATION
Informed source separation (ISS) uses source separation for extracting audio objects out of their downmix given some pre-computed parameters. In recent years, non-negative tensor factorization (NTF) has proven to be a good choice for compressing audio objects at an encoding stage. At the decoding stage, these parameters are used to separate the downmix with Wiener-filtering. The quantized NTF parameters have to be encoded to a bitstream prior to transmission.
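The decoding step described above can be sketched as follows. The sketch assumes a common three-factor NTF model in which the power spectrogram of source j is V_j(f, t) = Σ_k Q[j, k]·W[f, k]·H[t, k]; the factor names `W`, `H`, `Q`, the function name `wiener_separate`, and the single-channel downmix are assumptions for illustration, and quantization and bitstream coding of the parameters are left out.

```python
import numpy as np

def wiener_separate(X, W, H, Q, eps=1e-12):
    """Separate a downmix STFT with Wiener gains built from NTF parameters.

    X: complex downmix STFT, shape (bins, frames).
    W: spectral templates, shape (bins, components).
    H: temporal activations, shape (frames, components).
    Q: source-to-component weights, shape (sources, components).
    Returns the source STFT estimates, shape (sources, bins, frames).
    """
    # Model power spectrogram of every source: V[j, f, t]
    V = np.einsum("jk,fk,tk->jft", Q, W, H)
    # Wiener gain: each source's share of the total modelled power
    gains = V / (V.sum(axis=0, keepdims=True) + eps)
    return gains * X[None, :, :]
```

Since the gains sum to (almost exactly) one in every bin, the source estimates add back up to the downmix, which is a convenient sanity check at the decoder.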

- Shift-Invariant Kernel Additive Modelling for Audio Source Separation
A major goal in blind source separation is to model the inherent characteristics of the sources in order to identify and separate them. While most state-of-the-art approaches are supervised methods trained on large datasets, interest in non-data-driven approaches such as Kernel Additive Modelling (KAM) remains high due to their interpretability and adaptability. KAM separates a given source by applying robust statistics to the time-frequency bins selected by a source-specific kernel function, commonly the K-NN function.
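A minimal sketch of the basic K-NN form of this idea is given below. It assumes a magnitude spectrogram as input, a kernel that selects the k frames most similar to each frame (by squared Euclidean distance), and the median as the robust statistic; the function name `knn_median_background` is hypothetical, and the shift-invariant extension that the paper addresses is not implemented here.

```python
import numpy as np

def knn_median_background(V, k):
    """Estimate a repeating source from a magnitude spectrogram.

    V: non-negative array of shape (bins, frames).
    For each frame, the k most similar frames (the frame itself included)
    are selected, and the per-bin median over them is taken. Sparse
    events that do not recur in the neighbours are rejected by the median.
    """
    # Pairwise squared distances between frames: (frames, frames).
    # Builds a (bins, frames, frames) array, so only suitable for short inputs.
    dist = ((V[:, :, None] - V[:, None, :]) ** 2).sum(axis=0)
    neighbours = np.argsort(dist, axis=1)[:, :k]      # (frames, k)
    estimate = np.median(V[:, neighbours], axis=2)    # (bins, frames)
    # The estimate should not exceed the observed mixture magnitude
    return np.minimum(estimate, V)
```

The residual `V - knn_median_background(V, k)` then serves as a rough estimate of the non-repeating source, and the two estimates can be turned into soft masks for the mixture.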

- END-TO-END SOUND SOURCE ENHANCEMENT USING DEEP NEURAL NETWORK IN THE MODIFIED DISCRETE COSINE TRANSFORM DOMAIN

- ON SPEECH ENHANCEMENT USING MICROPHONE ARRAYS IN THE PRESENCE OF CO-DIRECTIONAL INTERFERENCE