Source Separation and Signal Enhancement

Bandwidth Extension is All You Need

Speech generation and enhancement have seen recent breakthroughs in quality thanks to deep learning. These methods typically operate at a limited sampling rate of 16-22 kHz due to computational complexity and available datasets. This limitation creates a gap between the output of such methods and that of high-fidelity (≥44 kHz) real-world audio applications. This paper proposes a new bandwidth extension (BWE) method that expands 8-16 kHz speech signals to 48 kHz. The method is based on a feed-forward WaveNet architecture trained with a GAN-based deep feature loss.
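
To illustrate the gap that bandwidth extension has to fill, the following minimal sketch upsamples a 16 kHz signal to 48 kHz by plain band-limited interpolation and measures how much energy ends up above the original 8 kHz Nyquist limit. This is not the paper's WaveNet/GAN method; the helper `upsample_fft` and the two-tone test signal are assumptions made only for this illustration.

```python
import numpy as np

def upsample_fft(x, factor):
    """Band-limited upsampling by zero-padding the spectrum (hypothetical helper)."""
    n = len(x)
    spectrum = np.fft.rfft(x)
    padded = np.zeros(n * factor // 2 + 1, dtype=complex)
    padded[: len(spectrum)] = spectrum
    # Scale by `factor` so sample amplitudes match the input signal.
    return np.fft.irfft(padded, n=n * factor) * factor

sr_in, sr_out = 16000, 48000
t = np.arange(sr_in) / sr_in  # one second of audio at 16 kHz
narrowband = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 6000 * t)
wideband = upsample_fft(narrowband, sr_out // sr_in)

# Fraction of energy above the original Nyquist frequency (8 kHz).
freqs = np.fft.rfftfreq(len(wideband), d=1 / sr_out)
power = np.abs(np.fft.rfft(wideband)) ** 2
high_band_fraction = power[freqs > sr_in / 2].sum() / power.sum()
print(f"energy above 8 kHz: {high_band_fraction:.2e}")
```

Interpolation alone leaves the band above 8 kHz essentially empty; a BWE model has to synthesize that missing content.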

ICASSP2021_BWE_poster.pdf

Poster (299)

ICASSP2021_BWE_slides.pdf

Slides (299)

Categories:: Source Separation and Signal Enhancement
Speech Enhancement (SPE-ENHA)

130 Views

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

EETransformer_poster.pdf

EETransformer_poster.pdf (212)

Categories:: Source Separation and Signal Enhancement

140 Views

Don't shoot butterfly with rifles: Multi-channel Continuous Speech Separation with Early Exit Transformer

EETransformer_presentation-v1.pdf

EETransformer_presentation-v1.pdf (233)

Categories:: Source Separation and Signal Enhancement

150 Views

Continuous Speech Separation with Conformer

Conformer_presentation_v1.pdf

Conformer_presentation_v1.pdf (264)

Categories:: Source Separation and Signal Enhancement

151 Views

Enhancement of Ambisonics Signals using time-frequency masking

slides.pdf

slides.pdf (213)

Categories:: Source Separation and Signal Enhancement

9 Views

COMPLEX RATIO MASKING FOR SINGING VOICE SEPARATION

Music source separation is important for applications such as karaoke and remixing. Much previous research focuses on estimating the magnitude of the short-time Fourier transform (STFT) and discards phase information. We observe that, for singing voice separation, phase has the potential to considerably improve separation quality. This paper proposes a complex-domain deep learning method for voice and accompaniment separation. The proposed method employs ...
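
As a rough illustration of why phase matters, the sketch below compares an oracle complex ratio mask with an oracle magnitude ratio mask on synthetic STFT-like data. It does not reproduce the paper's network; the function names and the random "voice" and "accompaniment" spectra are assumptions for this example only.

```python
import numpy as np

def complex_ratio_mask(source_stft, mixture_stft, eps=1e-8):
    """Oracle complex mask M such that M * mixture is close to source."""
    return source_stft * np.conj(mixture_stft) / (np.abs(mixture_stft) ** 2 + eps)

def magnitude_ratio_mask(source_stft, mixture_stft, eps=1e-8):
    """Oracle real-valued mask; the mixture phase is reused unchanged."""
    return np.abs(source_stft) / (np.abs(mixture_stft) + eps)

# Synthetic stand-ins for STFTs (frequency bins x frames), not real audio.
rng = np.random.default_rng(0)
shape = (257, 100)
voice = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
accompaniment = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mixture = voice + accompaniment

voice_complex = complex_ratio_mask(voice, mixture) * mixture
voice_magnitude = magnitude_ratio_mask(voice, mixture) * mixture

err_complex = np.mean(np.abs(voice_complex - voice) ** 2)
err_magnitude = np.mean(np.abs(voice_magnitude - voice) ** 2)
print(f"complex mask error:   {err_complex:.3e}")
print(f"magnitude mask error: {err_magnitude:.3e}")
```

Even with perfect magnitudes, the magnitude mask keeps the mixture phase and therefore leaves a residual error that the complex mask avoids.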

2587.pdf

Poster (221)

slides_2587.pptx

Presentation slides (208)

Categories:: Source Separation and Signal Enhancement

21 Views

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

Speech separation algorithms are often used to separate the target speech from other interfering sources. However, purely neural-network-based speech separation systems often cause nonlinear distortion that is harmful for automatic speech recognition (ASR) systems. The conventional mask-based minimum variance distortionless response (MVDR) beamformer can be used to minimize the distortion, but comes with a high level of residual noise.
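
For reference, the conventional MVDR solution for one frequency bin can be sketched as follows. This is the textbook closed form w = R⁻¹d / (dᴴR⁻¹d), not the all-deep-learning variant the paper proposes; the uniform linear array steering vector, the diagonal loading, and the synthetic noise are assumptions for this example.

```python
import numpy as np

def mvdr_weights(noise_cov, steering, diag_load=1e-6):
    """Textbook MVDR weights for a single frequency bin."""
    n = noise_cov.shape[0]
    # Small diagonal loading keeps the linear solve well conditioned.
    loaded = noise_cov + diag_load * np.trace(noise_cov).real / n * np.eye(n)
    numerator = np.linalg.solve(loaded, steering)
    return numerator / (steering.conj() @ numerator)

rng = np.random.default_rng(0)
num_mics = 4
# Hypothetical steering vector: half-wavelength-spaced linear array, source at 20 degrees.
steering = np.exp(-1j * np.pi * np.arange(num_mics) * np.sin(np.deg2rad(20.0)))
noise = rng.standard_normal((num_mics, 2000)) + 1j * rng.standard_normal((num_mics, 2000))
noise_cov = noise @ noise.conj().T / noise.shape[1]

w = mvdr_weights(noise_cov, steering)
target_gain = w.conj() @ steering                    # distortionless constraint
output_noise_power = (w.conj() @ noise_cov @ w).real
reference_noise_power = noise_cov[0, 0].real
print(abs(target_gain), output_noise_power, reference_noise_power)
```

The target direction passes with unit gain while the output noise power drops below that of a single reference microphone.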

ICASSP Poster 1240.pdf

Poster (276)

ICASSP slides 1240.pdf

Slides (455)

Categories:: Source Separation and Signal Enhancement
Speech Enhancement (SPE-ENHA)

25 Views

Exploiting Non-negative Matrix Factorization for Binaural Sound Localization in the Presence of Directional Interference

This study presents a novel solution to the problem of binaural localization of a speaker in the presence of interfering directional noise and reverberation. Using a state-of-the-art binaural localization algorithm based on a deep neural network (DNN), we propose adding a source separation stage based on non-negative matrix factorization (NMF) to improve the localization performance in conditions with interfering sources.
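
A minimal sketch of NMF as a separation front end is given below, using the standard multiplicative updates for the Euclidean cost and a soft mask per component. The abstract does not state which cost function or dictionary the authors use, so the update rule, the rank, and the synthetic "spectrogram" here are assumptions.

```python
import numpy as np

def nmf(V, rank, num_iters=500, eps=1e-10, seed=0):
    """Factorize a non-negative matrix V as W @ H with multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    for _ in range(num_iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic non-negative "spectrogram" built from two components.
rng = np.random.default_rng(1)
V = rng.random((64, 2)) @ rng.random((2, 50))

W, H = nmf(V, rank=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)

# Soft masks: the share of each component in the model of every time-frequency bin.
model = W @ H + 1e-10
component_masks = [np.outer(W[:, k], H[k]) / model for k in range(2)]
print(f"relative reconstruction error: {relative_error:.3e}")
```

Applying one of the masks to the mixture spectrogram keeps the part attributed to that component, which could then be passed on to a localization stage.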

slides.pdf

Presentation slides (208)

poster.pdf

Conference poster (185)

Categories:: Source Separation and Signal Enhancement

9 Views

Autoregressive Fast Multichannel Nonnegative Matrix Factorization For Joint Blind Source Separation And Dereverberation

This paper describes a joint blind source separation and dereverberation method that works adaptively and efficiently in a reverberant noisy environment. The modern approach to blind source separation (BSS) is to formulate a probabilistic model of multichannel mixture signals that consists of a source model representing the time-frequency structures of source spectrograms and a spatial model representing the inter-channel covariance structures of source images.

ICASSP2021_Poster.pdf

ICASSP2021_Poster.pdf (239)

ICASSP2021_Slide.pdf

ICASSP2021_Slide.pdf (276)

Categories:: Source Separation and Signal Enhancement

9 Views

Deep Multi-Frame MVDR Filtering for Single-Microphone Speech Enhancement -- Slides

Multi-frame algorithms for single-microphone speech enhancement, e.g., the multi-frame minimum variance distortionless response (MFMVDR) filter, are able to exploit speech correlation across adjacent time frames in the short-time Fourier transform (STFT) domain. Provided that accurate estimates of the required speech interframe correlation vector and the noise correlation matrix are available, it has been shown that the MFMVDR filter yields a substantial noise reduction while hardly introducing any speech distortion.
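
The multi-frame idea can be sketched for a single frequency bin as below: consecutive STFT frames are stacked into a vector, and the MFMVDR filter h = Φₙ⁻¹γ / (γᴴΦₙ⁻¹γ) is computed from the speech interframe correlation vector γ and the noise correlation matrix Φₙ. The statistics here are oracle estimates from synthetic signals (an AR(1) "speech" process and white noise), which is an assumption for illustration; the paper estimates these quantities with deep networks.

```python
import numpy as np

def stack_frames(stft_row, num_frames):
    """Return (num_frames, T); column t holds [X(t), X(t-1), ..., X(t-N+1)]."""
    T = len(stft_row)
    padded = np.concatenate([np.zeros(num_frames - 1, dtype=complex), stft_row])
    return np.stack([padded[num_frames - 1 - k : num_frames - 1 - k + T]
                     for k in range(num_frames)])

rng = np.random.default_rng(0)
T, N = 5000, 3
# Synthetic speech coefficient sequence that is correlated across frames.
innovation = rng.standard_normal(T) + 1j * rng.standard_normal(T)
speech = np.zeros(T, dtype=complex)
for t in range(1, T):
    speech[t] = 0.9 * speech[t - 1] + innovation[t]
noise = 2.0 * (rng.standard_normal(T) + 1j * rng.standard_normal(T))

X, V = stack_frames(speech, N), stack_frames(noise, N)
phi_x = X @ X.conj().T / T              # speech correlation matrix (oracle)
phi_n = V @ V.conj().T / T              # noise correlation matrix (oracle)
gamma = phi_x[:, 0] / phi_x[0, 0].real  # speech interframe correlation vector

a = np.linalg.solve(phi_n, gamma)
h = a / (gamma.conj() @ a)              # MFMVDR filter
enhanced = h.conj() @ (X + V)           # filtered noisy coefficients

noise_power_in = phi_n[0, 0].real
noise_power_out = (h.conj() @ phi_n @ h).real
print(noise_power_in, noise_power_out)
```

The filter satisfies hᴴγ = 1, so the correlated speech component is preserved while the noise power at the output is no larger than in the unprocessed current frame.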

icassp2021_slides.pdf

ICASSP 2021 Slides (250)

Categories:: Source Separation and Signal Enhancement

25 Views
