Deep Clustering (DC) and Deep Attractor Networks (DANs) are data-driven approaches to monaural blind source separation.
Both approaches provide impressive single-channel performance but have not yet been generalized to block-online processing.
When speech in a continuous stream is separated with a block-online algorithm, it must be determined in each block which output stream belongs to which speaker.
In this contribution, we solve this block permutation problem by introducing an additional speaker identification embedding into the DAN model structure.
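
One way such an embedding could be used to resolve the block permutation is sketched below. The abstract does not say how embeddings are compared, so cosine similarity, the exhaustive permutation search, and the names `align_block` and `reorder` are assumptions made for illustration, not the authors' method. Each output stream of the current block is matched to the previous block's stream whose speaker embedding it most resembles.

```python
from itertools import permutations

import numpy as np


def align_block(prev_emb: np.ndarray, cur_emb: np.ndarray) -> tuple:
    """Return perm such that current output perm[i] continues previous stream i.

    prev_emb, cur_emb: (num_speakers, emb_dim) arrays holding one speaker
    embedding per output stream (hypothetical inputs, not the paper's API).
    """
    # Unit-normalize rows so that dot products are cosine similarities.
    prev = prev_emb / np.linalg.norm(prev_emb, axis=1, keepdims=True)
    cur = cur_emb / np.linalg.norm(cur_emb, axis=1, keepdims=True)
    sim = prev @ cur.T  # sim[i, j]: previous stream i vs. current output j
    n = sim.shape[0]
    # Exhaustive search over all n! assignments; only practical for a few speakers.
    return max(permutations(range(n)),
               key=lambda p: sum(sim[i, p[i]] for i in range(n)))


def reorder(outputs, perm):
    # Put the current block's outputs into the stream order of the previous block.
    return [outputs[j] for j in perm]


prev = np.array([[1.0, 0.0], [0.0, 1.0]])
cur = np.array([[0.1, 0.9], [0.95, 0.05]])  # speakers swapped in this block
perm = align_block(prev, cur)
print(perm)
```

Here the two outputs of the current block are swapped relative to the previous block, so `reorder` exchanges them before the streams are concatenated.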

Robust speech processing in multi-talker environments requires effective speech separation. Recent deep learning systems have made significant progress toward solving this problem, yet it remains challenging, particularly in real-time, low-latency applications. Most methods attempt to construct a mask for each source in a time-frequency representation of the mixture signal, which is not necessarily an optimal representation for speech separation.
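
For readers unfamiliar with the mask-based approach mentioned above, the sketch below applies one mask per source to a mixture's time-frequency representation. It uses oracle ratio masks (computed from known source magnitudes) and a toy mixture with a shared phase purely for illustration. The function names are hypothetical, and a trained network would estimate the masks instead.

```python
import numpy as np


def ratio_masks(source_mags: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Oracle ratio masks from source magnitudes of shape (num_sources, F, T)."""
    # Each mask is the source's share of the total magnitude in every TF bin.
    return source_mags / (source_mags.sum(axis=0, keepdims=True) + eps)


def apply_masks(mixture_stft: np.ndarray, masks: np.ndarray) -> np.ndarray:
    # Scale the (complex) mixture elementwise by each mask; the mixture phase is reused.
    return masks * mixture_stft[np.newaxis]


rng = np.random.default_rng(0)
mags = rng.random((2, 4, 5)) + 0.1  # toy magnitudes: 2 sources, 4 bins, 5 frames
phase = np.exp(1j * rng.uniform(-np.pi, np.pi, (4, 5)))
mixture = mags.sum(axis=0) * phase  # toy mixture STFT with a shared phase
estimates = apply_masks(mixture, ratio_masks(mags))
print(np.allclose(estimates.sum(axis=0), mixture))
```

Because ratio masks sum to one in every bin, the masked estimates add back up to the mixture. The representation itself, however, is fixed, which is the limitation the abstract points to.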

The state of the art in music source separation employs neural networks trained in a supervised fashion on multi-track databases to estimate the sources from a given mixture. Because only a few datasets are available, extensive data augmentation is often used to combat overfitting. Mixing random tracks, however, can even reduce separation performance, because instruments in real music are strongly correlated. The key concept in our approach is that the source estimates of an optimal separator should be indistinguishable from real source signals.
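
The "mixing random tracks" augmentation criticized above can be sketched as follows. Each instrument's stem is drawn from an independently chosen song, so the resulting mixture lacks the inter-instrument correlation of real music. The data layout (`stems` as a dict of lists of 1-D arrays) and the function name are assumptions made for illustration only.

```python
import numpy as np


def random_mix(stems: dict, segment_len: int, rng: np.random.Generator):
    """Draw each instrument's stem from an independently chosen track and sum them.

    stems maps an instrument name to a list of 1-D arrays (one per track),
    each at least segment_len samples long.
    """
    sources = {}
    for name, tracks in stems.items():
        track = tracks[rng.integers(len(tracks))]  # random song per instrument
        start = rng.integers(len(track) - segment_len + 1)  # random excerpt
        sources[name] = track[start:start + segment_len]
    # The mixture is the sum of stems that never played together.
    mixture = np.sum(list(sources.values()), axis=0)
    return mixture, sources


rng = np.random.default_rng(1)
stems = {
    "vocals": [rng.standard_normal(1000) for _ in range(3)],
    "drums": [rng.standard_normal(1000) for _ in range(3)],
}
mixture, sources = random_mix(stems, segment_len=256, rng=rng)
```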

In this article, we propose a Bounded Component Analysis (BCA) approach for the separation of convolutive mixtures of sparse sources. The corresponding algorithm is derived from a geometric objective function defined in a completely deterministic setting. It is therefore applicable to sources that may be independent or dependent in both the spatial and temporal dimensions. We show that all global optima of the proposed objective are perfect separators. We also provide numerical examples to illustrate the performance of the algorithm.
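
As background, the convolutive mixing model that such a separator aims to invert is commonly written as x_i[n] = Σ_j Σ_k h_ij[k] s_j[n − k]. The sketch below generates such mixtures from sparse toy sources, assuming FIR mixing filters. The function name and array shapes are illustrative choices, not taken from the article.

```python
import numpy as np


def convolutive_mix(sources: np.ndarray, filters: np.ndarray) -> np.ndarray:
    """x_i[n] = sum_j sum_k h_ij[k] s_j[n - k].

    sources: (J, N) source signals; filters: (I, J, K) FIR mixing filters.
    Returns the (I, N + K - 1) full-length mixtures.
    """
    num_mics, num_sources, taps = filters.shape
    mixtures = np.zeros((num_mics, sources.shape[1] + taps - 1))
    for i in range(num_mics):
        for j in range(num_sources):
            # Full linear convolution of filter h_ij with source s_j.
            mixtures[i] += np.convolve(filters[i, j], sources[j])
    return mixtures


rng = np.random.default_rng(2)
# Sparse toy sources: most samples are exactly zero.
s = rng.standard_normal((2, 100)) * (rng.random((2, 100)) < 0.1)
h = rng.standard_normal((2, 2, 8))
x = convolutive_mix(s, h)
```

With single-tap filters (K = 1), the model reduces to an instantaneous mixture x = H s.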

We propose a new sparse coding technique based on the power mean of phase-invariant cosine distances. Our approach is a generalization of sparse filtering and K-hyperlines clustering. It provides a better sparsity-enforcing measure than the L1/L2 norm ratio that is typically used in sparse filtering. At the same time, the proposed approach scales better than its clustering counterparts to high-dimensional input. Our algorithm fully exploits the prior information obtained by preprocessing the observed data with whitening, via an efficient row-wise decoupling scheme.
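
The quantities named above can be sketched individually. The abstract does not give the exact objective, so the definition of the phase-invariant cosine distance used here (one minus the normalized inner-product modulus) and the way the pieces are combined are assumptions. The sketch shows one plausible link to clustering: as the power-mean exponent p goes to minus infinity, the mean approaches the minimum distance, that is, the nearest hyperline.

```python
import numpy as np


def l1_l2_ratio(x: np.ndarray) -> float:
    # Ranges from 1 (a single nonzero entry) to sqrt(n) (all entries equal
    # in magnitude); smaller means sparser.
    return float(np.sum(np.abs(x)) / np.linalg.norm(x))


def phase_invariant_cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    # 1 - |<a, b>| / (||a|| ||b||): the modulus makes the value independent of
    # a global sign or complex phase applied to either vector.
    return float(1.0 - np.abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b)))


def power_mean(values, p: float) -> float:
    # Generalized mean (mean(v**p))**(1/p) for p != 0; assumes strictly positive
    # values when p < 0. As p -> -inf it approaches min(values).
    v = np.asarray(values, dtype=float)
    return float(np.mean(v ** p) ** (1.0 / p))


atoms = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]  # hypothetical hyperline directions
x = np.array([-0.9, 0.1])
dists = [phase_invariant_cosine_distance(x, a) for a in atoms]
print(min(dists), power_mean(dists, -50.0))
```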

Sound source separation at low latency requires that each incoming frame of audio data be processed with very low delay and output as soon as possible. For practical purposes involving human listeners, an algorithmic delay of 20 ms is the upper limit that is comfortable for the listener. In this paper, we propose a low-latency (algorithmic delay ≤ 20 ms) source separation method based on a deep neural network (DNN).
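
To make the 20 ms budget concrete, the arithmetic below converts it into a maximum frame length in samples. It assumes the common convention for frame-based processing that the algorithmic delay equals the analysis window length, because a frame must be fully buffered before it can be processed. Other conventions adjust this by the hop size, and the function names are illustrative.

```python
def algorithmic_delay_ms(window_len: int, sample_rate: int) -> float:
    # Convention assumed here: delay equals the analysis window length.
    return 1000.0 * window_len / sample_rate


def max_window_len(delay_budget_ms: float, sample_rate: int) -> int:
    # Largest window (in samples) that stays within the delay budget.
    return int(delay_budget_ms * sample_rate // 1000)


for fs in (16000, 44100):
    n = max_window_len(20.0, fs)
    print(fs, n, algorithmic_delay_ms(n, fs))
```

Under this convention, a 20 ms budget allows at most 320 samples at 16 kHz, so a typical 512-sample window (32 ms at 16 kHz) would already exceed it.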

Coupled decompositions of multiple tensors are fundamental tools for multi-set data fusion. In this paper, we introduce a coupled version of the rank-(L_m, L_n, ·) block term decomposition (BTD), applicable to joint independent subspace analysis. We propose two algorithms for its computation based on a coupled block simultaneous generalized Schur decomposition scheme. Numerical results are given to show the performance of the proposed algorithms.