Source Separation and Signal Enhancement

Flow-Based Fast Multichannel Nonnegative Matrix Factorization for Blind Source Separation

This paper describes a blind source separation method for multichannel audio signals, called NF-FastMNMF, that integrates a normalizing flow (NF) into multichannel nonnegative matrix factorization with jointly diagonalizable spatial covariance matrices, a.k.a. FastMNMF.

_ICASSP_22__POSTER__NF_FastMNMF.pdf

Categories: Source Separation and Signal Enhancement

SKIPPING MEMORY LSTM FOR LOW-LATENCY REAL-TIME CONTINUOUS SPEECH SEPARATION

Continuous speech separation for meeting pre-processing has recently become an active research topic. Compared with utterance-level speech separation data, meeting-style audio streams last longer and contain an unknown number of speakers. We adopt a time-domain speech separation approach and the recently proposed Graph-PIT to build a very low-latency online speech separation model, which is important for real-world applications. The low-latency time-domain encoder, with its small stride, produces an extremely long feature sequence.
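To make the last point concrete: a 1-D convolutional encoder without padding emits one frame per stride, so the frame count grows as the stride shrinks. The sketch below counts frames for one minute of 16 kHz audio; the window and stride values (16 and 8 samples for a low-latency encoder, 512 and 256 for an STFT-like front end) and the function name are hypothetical, not taken from the paper.

```python
def num_encoder_frames(n_samples: int, win: int, stride: int) -> int:
    """Frames produced by a 1-D conv encoder with kernel `win` and hop `stride`, no padding."""
    if n_samples < win:
        return 0
    return (n_samples - win) // stride + 1


sr = 16_000                                         # assumed sampling rate (Hz)
n = 60 * sr                                         # one minute of audio
small = num_encoder_frames(n, win=16, stride=8)     # hypothetical low-latency encoder
large = num_encoder_frames(n, win=512, stride=256)  # hypothetical STFT-like front end
print(small, large)                                 # 119999 3749
```

A 32 times smaller stride yields roughly 32 times as many frames, which is what makes recurrent or attention layers over such encoder outputs expensive.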

poster.pdf

Categories: Source Separation and Signal Enhancement

Multi-frame Full-rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environment

Full-rank spatial covariance analysis (FCA) is a blind source separation (BSS) method that can be applied to underdetermined cases, where the sources outnumber the microphones. This paper proposes a new extension of FCA that aims to improve BSS performance for mixtures in which the reverberation is longer than the analysis frame. A previously proposed model treats the parts that exceed the frame as delayed source components. In contrast, our extension models multiple time frames with multivariate Gaussian distributions of higher dimensionality than those of existing FCA models.

icassp2022AUD16-1.pdf

slide.pdf

Categories: Source Separation and Signal Enhancement

On loss functions and evaluation metrics for music source separation

We investigate which loss functions provide better separations by benchmarking an extensive set of them for music source separation. To that end, we first survey the most representative audio source separation losses we identified and then benchmark them consistently in a controlled experimental setup. We also explore using such losses as evaluation metrics by cross-correlating them with the results of a subjective test. Based on the observation that the standard signal-to-distortion ratio metric can be misleading in some scenarios, we …
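As a hedged illustration of why metric choice matters, the sketch below computes a plain, filter-free SDR and the scale-invariant SI-SDR variant. Neither is claimed to be the exact metric or loss used in the paper; the BSSEval SDR common in music benchmarks additionally allows a distortion filter, and the function names here are illustrative.

```python
import numpy as np


def sdr(reference, estimate, eps=1e-10):
    """Plain SDR in dB: 10*log10(||s||^2 / ||s - s_hat||^2), no distortion filter."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal = np.sum(reference ** 2)
    error = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal + eps) / (error + eps))


def si_sdr(reference, estimate, eps=1e-10):
    """Scale-invariant SDR in dB: the estimate is first projected onto the reference."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference          # part of the estimate explained by the reference
    residual = estimate - target        # everything else counts as distortion
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))


t = np.arange(16_000) / 16_000
s = np.sin(2 * np.pi * 440 * t)         # toy reference signal
print(round(sdr(s, 0.5 * s), 2))        # 6.02: only the gain is wrong
print(si_sdr(s, 0.5 * s) > 60)          # True: SI-SDR ignores the gain
```

A half-amplitude copy of the reference scores about 6.02 dB under plain SDR even though its waveform shape is perfect, while SI-SDR ignores the gain; this is one simple way in which two SDR-style metrics can disagree on the same estimate.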

ICASSP22 POSTER.pdf

Categories: Source Separation and Signal Enhancement

MINING HARD SAMPLES LOCALLY AND GLOBALLY FOR IMPROVED SPEECH SEPARATION

MINING_HARD_SAMPLES_LOCALLY_AND_GLOBALLY_FOR__IMPROVED_SPEECH_SEPARATION0212.pdf

Categories: Source Separation and Signal Enhancement

HARVESTING PARTIALLY-DISJOINT TIME-FREQUENCY INFORMATION FOR IMPROVING DEGENERATE UNMIXING ESTIMATION TECHNIQUE

The degenerate unmixing estimation technique (DUET) is one of the most efficient blind source separation algorithms for the challenging case in which the number of sources exceeds the number of microphones. However, as a time-frequency mask-based method, DUET erroneously retains interference components when source signals overlap in both time and frequency.
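The masking step of classic DUET can be sketched as follows, under its usual W-disjoint orthogonality assumption (one dominant source per time-frequency bin): each bin gets an attenuation and a delay estimate from the ratio of the two microphone STFTs and is assigned to the nearest source peak. This is a simplified illustration of plain DUET, not the partially-disjoint extension proposed here; the peak locations are assumed to be given (DUET itself finds them in a weighted 2-D histogram), and the function names are hypothetical.

```python
import numpy as np


def duet_features(X1, X2, omega, eps=1e-12):
    """Per-bin symmetric attenuation and relative delay from a two-mic STFT pair.

    X1, X2: complex STFTs of shape (freq, time); omega: nonzero angular frequency
    of each row in rad/sample. Delays are unambiguous only while |omega * delay| < pi.
    """
    X1_safe = np.where(np.abs(X1) < eps, eps, X1)   # avoid dividing by empty bins
    R = X2 / X1_safe
    a = np.maximum(np.abs(R), eps)                  # relative attenuation
    alpha = a - 1.0 / a                             # symmetric attenuation
    delta = -np.angle(R) / omega[:, None]           # relative delay (samples)
    return alpha, delta


def duet_masks(alpha, delta, peaks):
    """Binary masks (n_sources, freq, time): each bin goes to its nearest peak."""
    peaks = np.asarray(peaks, dtype=float)          # (n_sources, 2) as (alpha, delta)
    dist = ((alpha[None] - peaks[:, 0, None, None]) ** 2
            + (delta[None] - peaks[:, 1, None, None]) ** 2)
    labels = np.argmin(dist, axis=0)
    return np.stack([labels == j for j in range(len(peaks))])


# Toy mixture: even frames hold source A (gain 1.2, delay 2), odd frames source B.
rng = np.random.default_rng(0)
F, T = 8, 10
omega = np.linspace(0.1, 1.0, F)
X1 = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))
X2 = np.empty_like(X1)
X2[:, ::2] = 1.2 * np.exp(-1j * omega[:, None] * 2.0) * X1[:, ::2]
X2[:, 1::2] = 0.8 * np.exp(-1j * omega[:, None] * -1.0) * X1[:, 1::2]
alpha, delta = duet_features(X1, X2, omega)
masks = duet_masks(alpha, delta, [(1.2 - 1 / 1.2, 2.0), (0.8 - 1 / 0.8, -1.0)])
print(masks[0][:, ::2].all(), masks[0][:, 1::2].any())  # True False
```

Because every bin is assigned to exactly one source, a bin in which two sources overlap passes the other source's energy through whichever mask wins, which is the interference retention the abstract describes.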

slides.pdf

Categories: Source Separation and Signal Enhancement

Uformer: A Unet Based Dilated Complex & Real Dual-path Conformer Network for Simultaneous Speech Enhancement and Dereverberation

uformer_poster.pdf

Categories: Source Separation and Signal Enhancement

HARMONICITY PLAYS A CRITICAL ROLE IN DNN BASED VERSUS IN BIOLOGICALLY-INSPIRED MONAURAL SPEECH SEGREGATION SYSTEMS

Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles these networks learn in order to perform segregation. Here we analyze the role of harmonicity in two state-of-the-art deep neural network (DNN)-based models, Conv-TasNet and DPT-Net. We evaluate their performance on mixtures of natural speech versus slightly manipulated inharmonic speech, in which the harmonics are slightly jittered in frequency.
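A minimal way to see what frequency-jittered harmonics means is a synthetic complex tone: harmonic components sit at integer multiples of f0, and jittering shifts each one by a random fraction of f0. The sketch assumes uniform jitter expressed as a fraction of f0 and equal-amplitude components; it is not the paper's speech-manipulation procedure, and the jitter value 0.3 below is hypothetical.

```python
import numpy as np


def complex_tone(f0, n_harmonics, duration, sr, jitter=0.0, seed=0):
    """Sum of equal-amplitude sinusoids at (k + u_k) * f0, with u_k ~ U(-jitter, jitter).

    jitter=0 gives a harmonic tone; jitter > 0 gives an inharmonic one.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration * sr)) / sr
    k = np.arange(1, n_harmonics + 1)
    offsets = rng.uniform(-jitter, jitter, size=n_harmonics)  # fraction of f0
    freqs = (k + offsets) * f0
    if freqs.max() >= sr / 2:
        raise ValueError("highest component would alias")
    tone = np.sin(2 * np.pi * freqs[:, None] * t[None, :]).sum(axis=0)
    return tone / n_harmonics, freqs


tone, harmonic = complex_tone(200.0, 10, 0.5, 16_000)
_, inharmonic = complex_tone(200.0, 10, 0.5, 16_000, jitter=0.3)
print(np.allclose(harmonic / 200.0, np.arange(1, 11)))    # True
print(np.allclose(inharmonic / 200.0, np.arange(1, 11)))  # False
```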

Slide deck: ICASSP_Harmonicity.pdf

Poster: Parikh_poster.pdf

Manuscript: Parikh_CR.pdf

Categories: Source Separation and Signal Enhancement; Source separation (MLR-SSEP); Speech Enhancement (SPE-ENHA)

Ray-Space-Based Multichannel Nonnegative Matrix Factorization for Audio Source Separation

Nonnegative matrix factorization (NMF) has traditionally been considered a promising approach for audio source separation. While standard NMF is suited only to single-channel mixtures, extensions that handle multichannel data have also been proposed. Among the most popular alternatives, multichannel NMF (MNMF) and further variants based on constrained spatial covariance models have been successfully employed to separate multi-microphone convolutive mixtures.
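For reference, the single-channel NMF that the abstract contrasts with multichannel extensions can be sketched with the standard Lee-Seung multiplicative updates for the Euclidean cost, followed by Wiener-like masks that split the mixture among groups of components. This is not the ray-space MNMF proposed in the paper; the cost function, the grouping of components into sources, and the function names are illustrative assumptions.

```python
import numpy as np


def nmf(V, n_components, n_iter=200, seed=0, eps=1e-12):
    """Factor a nonnegative spectrogram V (freq x time) as W @ H.

    Lee-Seung multiplicative updates for the Euclidean cost ||V - WH||_F^2;
    multiplying by nonnegative ratios keeps W and H nonnegative.
    """
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, n_components)) + eps   # spectral bases
    H = rng.random((n_components, T)) + eps   # activations
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


def source_masks(W, H, groups, eps=1e-12):
    """Wiener-like soft masks, one per group of component indices."""
    V_hat = W @ H + eps
    return [(W[:, g] @ H[g, :]) / V_hat for g in groups]


rng = np.random.default_rng(1)
V = rng.random((64, 3)) @ rng.random((3, 100))  # toy nonnegative "spectrogram"
W, H, errors = nmf(V, n_components=3)
masks = source_masks(W, H, groups=[[0], [1, 2]])
print(errors[-1] < errors[0], np.allclose(sum(masks), 1.0))  # True True
```

Audio NMF often uses the Kullback-Leibler or Itakura-Saito divergence instead of the Euclidean cost; the Euclidean version is used here only because its updates are the shortest to write down.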

ICASSP_2022_rsmnmf.pdf

Categories: Source Separation and Signal Enhancement

TIME-DOMAIN AUDIO-VISUAL SPEECH SEPARATION ON LOW QUALITY VIDEOS

Incorporating visual information is a promising approach to improving the performance of speech separation, and many related studies have reported encouraging results. However, low-quality videos are common in real scenarios and may significantly degrade the performance of conventional audio-visual speech separation systems. In this paper, we propose a new structure for fusing audio and visual features, which uses the audio features to select relevant visual features through an attention mechanism.
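The fusion idea, audio features selecting which visual features to use, follows the general pattern of cross-attention with audio frames as queries and visual frames as keys and values. The sketch below is a generic scaled dot-product version without the learned projection matrices a real model would have; the feature sizes and function names are hypothetical, and the paper's actual fusion structure may differ.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def audio_guided_attention(audio, visual):
    """Scaled dot-product attention with audio frames as queries.

    audio: (T_a, d) audio features (queries); visual: (T_v, d) visual features
    (keys and values). Returns fused (T_a, d) features and (T_a, T_v) weights.
    """
    d = audio.shape[-1]
    scores = audio @ visual.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)        # each audio frame weights all video frames
    return weights @ visual, weights


rng = np.random.default_rng(0)
audio = rng.standard_normal((50, 16))         # e.g. 50 audio frames, 16-dim features
visual = rng.standard_normal((25, 16))        # e.g. 25 video frames, same dimension
fused, weights = audio_guided_attention(audio, visual)
print(fused.shape, np.allclose(weights.sum(axis=1), 1.0))  # (50, 16) True
```

In a trained model, learned query, key, and value projections are what would let the network down-weight unreliable video frames; with the raw features used here, the weights only reflect feature similarity.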

poster.pdf

presentation.pptx

Categories: Source Separation and Signal Enhancement; Multimodal signal processing
