Speech Enhancement (SPE-ENHA)

Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement

ICASSP2022_CINCGAN_slides.pdf (215)

Categories: Speech Enhancement (SPE-ENHA)

29 Views

Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement

ICASSP2022_DBAIAT_slides.pdf (390)

Categories: Speech Enhancement (SPE-ENHA)

26 Views

DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement

Poster_2333.pdf (224)

Categories: Speech Enhancement (SPE-ENHA)

23 Views

DeepFilterNet: A Low Complexity Speech Enhancement Framework for Full-Band Audio based on Deep Filtering

Complex-valued processing has brought deep-learning-based speech enhancement and signal extraction to a new level. Typically, the process is based on a time-frequency (TF) mask that is applied to a noisy spectrogram, and complex masks (CM) are usually preferred over real-valued masks because they can modify the phase. Recent work has proposed using a complex filter instead of a point-wise multiplication with a mask.
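
The difference between a point-wise complex mask and a complex filter can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's implementation: the function names, the causal tap layout (tap i multiplies the frame i steps in the past), and the toy values are all hypothetical.

```python
def apply_complex_mask(noisy, mask):
    """Point-wise complex mask: one complex gain per time-frequency bin."""
    return [[m * y for m, y in zip(mask_row, noisy_row)]
            for mask_row, noisy_row in zip(mask, noisy)]


def apply_deep_filter(noisy, coefs):
    """Multi-frame complex filter (assumed causal layout).

    coefs[t][f] is a list of complex taps; tap i multiplies noisy[t - i][f].
    Frames before the start of the signal are treated as zero.
    """
    out = []
    for t, frame in enumerate(noisy):
        row = []
        for f in range(len(frame)):
            acc = 0j
            for i, c in enumerate(coefs[t][f]):
                if t - i >= 0:
                    acc += c * noisy[t - i][f]
            row.append(acc)
        out.append(row)
    return out


# Toy noisy spectrogram: 3 frames x 2 frequency bins (hypothetical values).
noisy = [[1 + 1j, 2 + 0j], [0 + 2j, 1 - 1j], [3 + 0j, 0 + 1j]]
mask = [[0.5 + 0j, 1j], [1 + 0j, 0.5 + 0.5j], [0j, 1 + 0j]]

masked = apply_complex_mask(noisy, mask)

# A one-tap filter reduces to the point-wise mask.
coefs_1tap = [[[m] for m in row] for row in mask]
filtered_1tap = apply_deep_filter(noisy, coefs_1tap)

# A two-tap filter also uses the previous frame of the same bin.
coefs_2tap = [[[0.5 + 0j, 0.5 + 0j] for _ in row] for row in noisy]
filtered_2tap = apply_deep_filter(noisy, coefs_2tap)
```

With a single tap the filter is the same as the mask; additional taps let each output bin draw on neighbouring frames, which a point-wise mask cannot do.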

2022_icassp_presentation.pdf (264)

Categories: Speech Enhancement (SPE-ENHA)

125 Views

Bandwidth Extension is All You Need

Speech generation and enhancement have seen recent breakthroughs in quality thanks to deep learning. These methods typically operate at a limited sampling rate of 16-22 kHz due to computational complexity and available datasets. This limitation imposes a gap between the output of such methods and that of high-fidelity (≥44 kHz) real-world audio applications. This paper proposes a new bandwidth extension (BWE) method that expands 8-16 kHz speech signals to 48 kHz. The method is based on a feed-forward WaveNet architecture trained with a GAN-based deep feature loss.

ICASSP2021_BWE_poster.pdf

Poster (384)

ICASSP2021_BWE_slides.pdf

Slides (541)

Categories: Source Separation and Signal Enhancement
Speech Enhancement (SPE-ENHA)

133 Views

Towards an ASR Approach Using Acoustic and Language Models for Speech Enhancement

Recent work has shown that deep-learning-based speech enhancement performs best when a time-frequency mask is estimated. Unlike speech, these masks have a small range of values, which better facilitates regression-based learning. The question remains whether neural-network-based speech estimation should be treated as a regression problem. In this work, we propose to modify the speech estimation process by treating speech enhancement as a classification problem in an ASR-style manner.
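
One way to picture enhancement as classification is to quantize each spectral magnitude into a small set of classes, so a model predicts a class index instead of a continuous value. The sketch below is an assumption made for illustration: the uniform dB-spaced levels, the nearest-level rule, and all names are hypothetical and are not taken from the paper.

```python
import math


def make_levels(min_db, max_db, num_classes):
    """Uniformly spaced quantization levels in dB (an assumed scheme)."""
    step = (max_db - min_db) / (num_classes - 1)
    return [min_db + k * step for k in range(num_classes)]


def magnitude_to_class(mag, levels, eps=1e-8):
    """Map a linear magnitude to the index of the nearest dB level."""
    db = 20.0 * math.log10(max(mag, eps))
    return min(range(len(levels)), key=lambda k: abs(levels[k] - db))


def class_to_magnitude(k, levels):
    """Decode a class index back to a linear magnitude."""
    return 10.0 ** (levels[k] / 20.0)


# Five classes from -80 dB to 0 dB (hypothetical range and resolution).
levels = make_levels(-80.0, 0.0, 5)

# Classification targets for a few clean-speech magnitudes.
labels = [magnitude_to_class(m, levels) for m in [1.0, 0.1, 0.001, 0.0]]

# Decoding a predicted class gives the estimated magnitude.
decoded = class_to_magnitude(labels[1], levels)
```

The class indices serve as training targets for a classifier, in the same way an ASR acoustic model predicts discrete units; decoding maps a predicted class back to a magnitude at the cost of some quantization error.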

nayem_ICASSP2021_final.pdf

Presentation slides for ASR-style quantized spectral-model-based SE, presented at ICASSP 2021. (303)

Categories: Speech Enhancement (SPE-ENHA)

46 Views

ICASSP2021_slides

20210410_ICASSP2021.pptx (260)

Categories: Speech Enhancement (SPE-ENHA)

11 Views

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

Speech separation algorithms are often used to separate the target speech from other interfering sources. However, purely neural-network-based speech separation systems often cause nonlinear distortion that is harmful to automatic speech recognition (ASR) systems. The conventional mask-based minimum variance distortionless response (MVDR) beamformer can be used to minimize the distortion, but it comes with a high level of residual noise.
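
For reference, a commonly used formulation of the conventional mask-based MVDR beamformer is given below. The notation is an assumption introduced here, not taken from the abstract: Y(t,f) is the multi-channel noisy STFT vector, M_S and M_N are estimated speech and noise masks, and u is a one-hot vector selecting the reference microphone.

```latex
\begin{align}
\boldsymbol{\Phi}_{SS}(f) &= \frac{\sum_{t} M_S(t,f)\,\mathbf{Y}(t,f)\,\mathbf{Y}^{\mathsf{H}}(t,f)}{\sum_{t} M_S(t,f)}, &
\boldsymbol{\Phi}_{NN}(f) &= \frac{\sum_{t} M_N(t,f)\,\mathbf{Y}(t,f)\,\mathbf{Y}^{\mathsf{H}}(t,f)}{\sum_{t} M_N(t,f)}, \\
\mathbf{w}(f) &= \frac{\boldsymbol{\Phi}_{NN}^{-1}(f)\,\boldsymbol{\Phi}_{SS}(f)}{\operatorname{tr}\!\left(\boldsymbol{\Phi}_{NN}^{-1}(f)\,\boldsymbol{\Phi}_{SS}(f)\right)}\,\mathbf{u}, &
\hat{S}(t,f) &= \mathbf{w}^{\mathsf{H}}(f)\,\mathbf{Y}(t,f).
\end{align}
```

Because the weights w(f) are fixed over the whole utterance and applied linearly, the output avoids the nonlinear distortion of direct network estimation, but a time-invariant linear filter can only suppress noise to a limited degree, which is consistent with the residual noise noted above.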

ICASSP Poster 1240.pdf

Poster (376)

ICASSP slides 1240.pdf

Slides (525)

Categories: Source Separation and Signal Enhancement
Speech Enhancement (SPE-ENHA)

27 Views

Multi-Channel Target Speech Extraction with Channel Decorrelation and Target Speaker Adaptation

End-to-end approaches to single-channel target speech extraction have attracted widespread attention. However, studies of end-to-end multi-channel target speech extraction are still relatively limited. In this work, we propose two methods for exploiting multi-channel spatial information to extract the target speech. The first uses a target speech adaptation layer in a parallel encoder architecture. The second is a channel decorrelation mechanism that extracts the inter-channel differential information to enhance the multi-channel encoder representation.

poster-revised.pdf

Poster of presentation (282)

Categories: Speech Enhancement (SPE-ENHA)

21 Views

Audio-Visual Speech Inpainting with Deep Learning

In this paper, we present a deep-learning-based framework for audio-visual speech inpainting, i.e., the task of restoring the missing parts of an acoustic speech signal from reliable audio context and uncorrupted visual information. Recent work focuses solely on audio-only methods and generally aims at inpainting music signals, whose structure differs markedly from that of speech. Instead, we inpaint speech signals with gaps ranging from 100 ms to 1600 ms to investigate the contribution that vision can provide for gaps of different durations.
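
To make the inpainting setup concrete, the sketch below shows how a gap of a given duration might be carved out of a spectrogram for training or evaluation. It is a minimal illustration under assumed settings: the 10 ms hop size, the zero-filling of missing frames, and the function names are hypothetical and are not the paper's configuration.

```python
import math


def gap_to_frames(gap_ms, hop_ms):
    """Number of spectrogram frames covered by a gap of gap_ms milliseconds."""
    return math.ceil(gap_ms / hop_ms)


def mask_gap(spectrogram, start_frame, num_frames):
    """Zero out a run of frames and report which frames remain reliable.

    Returns a modified copy of the spectrogram and a per-frame list where
    1 marks reliable audio context and 0 marks the missing gap.
    """
    end = min(start_frame + num_frames, len(spectrogram))
    masked = [list(frame) for frame in spectrogram]
    reliable = [1] * len(spectrogram)
    for t in range(start_frame, end):
        masked[t] = [0.0] * len(masked[t])
        reliable[t] = 0
    return masked, reliable


HOP_MS = 10.0  # hypothetical hop size between frames

# Toy spectrogram: 200 frames x 2 frequency bins.
spec = [[1.0, 2.0] for _ in range(200)]

short_gap = gap_to_frames(100, HOP_MS)   # shortest gap in the abstract
long_gap = gap_to_frames(1600, HOP_MS)   # longest gap in the abstract

masked, reliable = mask_gap(spec, 20, short_gap)
```

An inpainting model would then receive the masked spectrogram, the reliability mask, and the video features, and be trained to reconstruct the original frames inside the gap.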

Paper#1217_Presentation.pdf (389)

Categories: Speech Enhancement (SPE-ENHA)

13 Views