In recent years, nonnegative matrix factorization (NMF) with volume regularization has been shown to be a powerful identifiable model, for example for hyperspectral unmixing, document classification, community detection, and hidden Markov models. We show that minimum-volume NMF (min-vol NMF) can also be used when the basis matrix is rank deficient, which is a reasonable scenario for some real-world NMF problems (e.g., unmixing multispectral images).
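For orientation, one common way to write a volume-regularized NMF objective is sketched below. This is an assumption for illustration, not necessarily the exact formulation used in the paper: the symbols X (data), W (basis matrix), H (coefficients), r (factorization rank), and the parameters λ and δ are introduced here, as is the simplex constraint on the columns of H.

```latex
\min_{W \ge 0,\; H \ge 0} \; \|X - WH\|_F^2 \;+\; \lambda \,\log\det\!\left(W^\top W + \delta I_r\right)
\quad \text{subject to} \quad \mathbf{1}^\top H = \mathbf{1}^\top
```

Under this form, a small δ > 0 keeps the log-determinant finite even when W is rank deficient, since det(WᵀW) alone would be zero in that case.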

The SpeakerBeam-FE (SBF) method has been proposed for speaker extraction. It attempts to overcome the problem of an unknown number of speakers in an audio recording during source separation. However, the mask approximation loss of SBF is sub-optimal: it neither computes the signal reconstruction error directly nor considers the speech context. To address these problems, this paper proposes a magnitude and temporal spectrum approximation loss to estimate a phase-sensitive mask for the target speaker with the speaker characteristics.
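As a point of reference, the sketch below computes the commonly used definition of a phase-sensitive mask for individual time-frequency bins: the magnitude ratio of target to mixture, scaled by the cosine of their phase difference. It illustrates the mask only, not the loss proposed in the paper; the function name and the toy bin values are hypothetical.

```python
import cmath
import math

def phase_sensitive_mask(target, mixture, eps=1e-8):
    """Return |S|/|Y| * cos(theta_S - theta_Y) for each time-frequency bin."""
    mask = []
    for s, y in zip(target, mixture):
        ratio = abs(s) / (abs(y) + eps)          # magnitude ratio, eps avoids division by zero
        phase_diff = cmath.phase(s) - cmath.phase(y)
        mask.append(ratio * math.cos(phase_diff))
    return mask

# Toy complex STFT bins (hypothetical values, for illustration only).
target = [1 + 0j, 0 + 1j, 2 + 0j]
interference = [1 + 0j, 1 + 0j, -1 + 0j]
mixture = [s + n for s, n in zip(target, interference)]

mask = phase_sensitive_mask(target, mixture)
```

In the last bin the interference partly cancels the target, so the mask value exceeds 1. Unlike a ratio mask restricted to [0, 1], this definition is not bounded to that interval.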

Recent deep learning methods offer state-of-the-art performance for Monaural Singing Voice Separation (MSVS). Recurrent neural networks (RNNs) are widely employed in these methods. This work proposes a novel type of Deep RNN (DRNN), namely the Proximal DRNN (P-DRNN), for MSVS, which improves the conventional Stacked RNN (S-RNN) by introducing a novel interlayer structure. The interlayer structure is derived from an optimization problem for Monaural Source Separation (MSS).

We present a monophonic source separation system that is trained by only observing mixtures, with no ground truth separation information. We use a deep clustering approach which trains on multi-channel mixtures and learns to project spectrogram bins to source clusters that correlate with various spatial features. We show that with such a training process we can obtain separation performance as good as that obtained with ground truth separation information.

Most determined blind source separation (BSS) algorithms related to independent component analysis (ICA) were derived from mathematical models of source signals. However, such a derivation restricts the application of these algorithms to explicitly definable source models; that is, an implicit model associated with some signal-processing procedure cannot be utilized within such a framework.

This paper proposes a low-algorithmic-latency adaptation of the deep clustering approach to speaker-independent speech separation. It consists of three parts: a) using long short-term memory (LSTM) networks instead of the bidirectional variant used in the original work, b) using a short synthesis window (here 8 ms), as required for low-latency operation, and c) using a buffer at the beginning of the audio mixture to estimate cluster centres corresponding to the constituent speakers, which are then used to separate the speakers in the rest of the signal.
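Part c) can be pictured with a minimal sketch: cluster the embeddings from an initial buffer, then label every later embedding by its nearest centre without re-clustering. The code assumes plain k-means with a deterministic start and 2-D toy embeddings. The function names and data are hypothetical, and the clustering procedure actually used in the paper may differ.

```python
def nearest(centres, point):
    """Index of the centre closest to point (squared Euclidean distance)."""
    dists = [sum((c - p) ** 2 for c, p in zip(centre, point)) for centre in centres]
    return dists.index(min(dists))

def kmeans(points, k, iterations=20):
    """Plain k-means, initialised deterministically with the first k points."""
    centres = [list(p) for p in points[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centres, p)].append(p)
        for i, group in enumerate(groups):
            if group:  # keep the old centre if a cluster ends up empty
                centres[i] = [sum(dim) / len(group) for dim in zip(*group)]
    return centres

# Hypothetical embeddings of time-frequency bins (one tuple per bin).
buffer_embeddings = [(0.9, 0.1), (0.1, 0.9), (1.0, 0.0),
                     (0.0, 1.0), (0.8, 0.2), (0.2, 0.8)]
later_embeddings = [(0.95, 0.05), (0.05, 0.95), (0.7, 0.3)]

# Estimate the centres once, from the buffer only.
centres = kmeans(buffer_embeddings, k=2)

# Bins arriving after the buffer are assigned to the fixed centres.
labels = [nearest(centres, e) for e in later_embeddings]
```

Because the centres are fixed after the buffer, each later bin costs only one nearest-centre lookup, which is what makes this arrangement compatible with low-latency operation.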

An adaptive sub-band differential microphone array beamformer is proposed to achieve a distortionless response at arbitrary target directions, using two closely spaced microphones in a hearing aid. Several variations are introduced to obtain a distortionless response in the target speaker direction. Two of these variations assume a free-field environment when designing the beamformer, while two other designs account for the head shadow effect by using Head-Related Transfer Functions (HRTFs), for hearing aid applications.
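To make the free-field case concrete, the following sketch builds fixed (non-adaptive) weights for two microphones at a single frequency, with unit response toward a target direction and a null toward another. It is an illustration under stated assumptions, not the proposed adaptive sub-band design: the 1 cm spacing, the speed of sound, the angle convention (measured from the array axis), and all function names are hypothetical.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def steering(theta, freq_hz, spacing_m):
    """Free-field steering vector of a two-microphone line array."""
    delay = spacing_m * math.cos(theta) / SPEED_OF_SOUND
    return [1 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)]

def response(weights, theta, freq_hz, spacing_m):
    """Complex array response for a plane wave arriving from theta."""
    a = steering(theta, freq_hz, spacing_m)
    return weights[0] * a[0] + weights[1] * a[1]

def design_weights(theta_target, theta_null, freq_hz, spacing_m):
    """Weights with a null at theta_null and unit gain at theta_target."""
    a_null = steering(theta_null, freq_hz, spacing_m)
    raw = [1 + 0j, -1 / a_null[1]]           # cancels a wave from theta_null
    gain = response(raw, theta_target, freq_hz, spacing_m)
    return [w / gain for w in raw]           # normalise the target response to 1

freq_hz, spacing_m = 1000.0, 0.01
weights = design_weights(0.0, math.pi, freq_hz, spacing_m)
target_response = response(weights, 0.0, freq_hz, spacing_m)
null_response = response(weights, math.pi, freq_hz, spacing_m)
```

The normalisation step is what enforces the distortionless constraint in the target direction. Replacing `steering` with measured HRTFs would be one way to account for head shadow, which this free-field sketch ignores.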

In this paper, we present an algorithm called Reliable Mask Selection-Phase Difference Channel Weighting (RMS-PDCW), which selects the target source masked by a noise source using Angle of Arrival (AoA) information computed from phase differences. The RMS-PDCW algorithm selects the masks to apply using information about the localized sound source and speech onset detection.
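The underlying idea of deriving an AoA from a phase difference can be sketched for a single time-frequency bin, assuming two microphones in free field. This shows only the generic phase-difference step and a simple angular binary mask, not the reliable mask selection rule of RMS-PDCW. The microphone spacing, speed of sound, tolerance, and function names are assumptions, and the estimate is valid only below the spatial aliasing frequency.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def angle_of_arrival(x_left, x_right, freq_hz, spacing_m):
    """AoA in radians (0 = broadside) from the inter-microphone phase difference."""
    phase_diff = cmath.phase(x_left * x_right.conjugate())
    delay = phase_diff / (2 * math.pi * freq_hz)
    # Clamp to the valid range of asin to guard against noisy estimates.
    sin_theta = max(-1.0, min(1.0, SPEED_OF_SOUND * delay / spacing_m))
    return math.asin(sin_theta)

def binary_mask(angles, target_angle, tolerance):
    """Keep bins whose estimated AoA lies within tolerance of the target."""
    return [1 if abs(a - target_angle) <= tolerance else 0 for a in angles]

def simulate_bin(theta, freq_hz, spacing_m):
    """Ideal bin pair for a plane wave from theta (right channel delayed)."""
    delay = spacing_m * math.sin(theta) / SPEED_OF_SOUND
    return 1 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)

freq_hz, spacing_m = 1000.0, 0.04
true_angles = [0.0, math.pi / 6, -math.pi / 36]
angles = [angle_of_arrival(*simulate_bin(t, freq_hz, spacing_m), freq_hz, spacing_m)
          for t in true_angles]

# Target assumed at broadside, with a 10 degree tolerance.
mask = binary_mask(angles, target_angle=0.0, tolerance=math.radians(10))
```

With 4 cm spacing, the phase difference stays unambiguous up to roughly 4.3 kHz. Above that, the same phase difference can correspond to several angles.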
