Speech Enhancement (SPE-ENHA)

SICRN: Advancing Speech Enhancement through State Space Model and Inplace Convolution Techniques

Speech enhancement aims to improve speech quality and intelligibility, especially in noisy environments where background noise degrades speech signals. Deep learning methods have recently achieved great success in speech enhancement, e.g., the representative convolutional recurrent neural network (CRN) and its variants. However, CRN typically employs consecutive downsampling and upsampling convolutions for frequency modeling, which destroys the inherent structure of the signal over frequency. Additionally, convolutional layers lack temporal modeling ability.

ICASSP_2024_SICRN_poster.pdf


Categories:: Speech Enhancement (SPE-ENHA)

23 Views

DUAL-PATH MINIMUM-PHASE AND ALL-PASS DECOMPOSITION NETWORK FOR SINGLE CHANNEL SPEECH DEREVERBERATION

Introduction of DUAL-PATH MINIMUM-PHASE AND ALL-PASS DECOMPOSITION NETWORK FOR SINGLE CHANNEL SPEECH DEREVERBERATION.

2024-ICASSP24-XiLiu_Poster-TEMPLATE-Rev1A-April10-2024.pptx


Categories:: Speech Enhancement (SPE-ENHA)
Room Acoustics and Acoustic System Modeling

11 Views

Real-time perceptually motivated neural network for echo control and noise reduction


Echo and background noise are the major obstacles in today’s user sound experience for devices like a speakerphone or video bar. We propose real-time perceptually motivated neural network-based echo control and noise reduction. The demonstrated method relies on a linear acoustic echo canceller (LAEC) combined with a neural network as a post-filter which incorporates perceptual mapping in both feature representation and loss function. The proposed method relies on mic and far-end signals for the LAEC stage, while the LAEC output, mic and echo estimate are inputs to the post-filter.
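The two-stage signal flow described above can be illustrated with a minimal sketch. It assumes the LAEC is a normalized LMS (NLMS) adaptive filter, which the abstract does not specify; the function names, tap count, step size, and simulated echo path are all hypothetical. The neural post-filter itself is not modeled; the sketch only produces its three inputs (LAEC output, mic, echo estimate).

```python
import random


def nlms_echo_canceller(mic, far_end, taps=8, mu=0.5, eps=1e-8):
    """Hypothetical LAEC stage: an NLMS adaptive filter (an assumed choice).

    Returns the LAEC output (residual), the echo estimate, and the
    final filter weights.
    """
    w = [0.0] * taps      # adaptive estimate of the echo path
    buf = [0.0] * taps    # most recent far-end samples, newest first
    residual, echo_est = [], []
    for d, x in zip(mic, far_end):
        buf = [x] + buf[:-1]
        y = sum(wi * xi for wi, xi in zip(w, buf))   # echo estimate
        e = d - y                                     # LAEC output
        norm = sum(xi * xi for xi in buf) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, buf)]
        echo_est.append(y)
        residual.append(e)
    return residual, echo_est, w


def simulate(n=4000, seed=0):
    """Made-up test signals: white far-end noise through a 3-tap echo path."""
    rng = random.Random(seed)
    far_end = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    echo_path = [0.6, -0.3, 0.1]   # illustrative values, not from the source
    mic = [
        sum(h * far_end[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
        for i in range(n)
    ]
    return mic, far_end, echo_path


mic, far_end, echo_path = simulate()
residual, echo_est, weights = nlms_echo_canceller(mic, far_end)
# Inputs the abstract lists for the post-filter stage.
post_filter_inputs = (residual, mic, echo_est)
```

In this echo-only simulation the adaptive weights should approach the simulated echo path; with near-end speech or noise present, a residual would remain for the post-filter to handle.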

Poster Real-Time Perceptually Motivated Neural Network for Deep Echo Suppression ICASSP 2023 landscape.pdf


Categories:: Speech Enhancement (SPE-ENHA)

129 Views

Optimize for my Voice with Speaker Identification


The proposed system enhances speech in video-conferencing applications. We aim to improve speech quality and communication clarity in various daily-life scenarios. Our demo will appeal to the ICASSP audience because it is related to the 5th DNS Challenge. The demo aims to enhance the audio signal so as to preserve the primary talker while suppressing neighboring talkers, noise, and reverberation. Beyond these challenges, the system automatically controls the level of the primary talker and doesn’t boost return echoes or noise misdetected as speech.

ICASSP2023_final6.pdf

Demo poster ICASSP 2023 (156)

Categories:: Speech Enhancement (SPE-ENHA)

102 Views

LA-VocE: Low-SNR Audio-visual Speech Enhancement using Neural Vocoders


Audio-visual speech enhancement aims to extract clean speech from a noisy environment by leveraging not only the audio itself but also the target speaker's lip movements. This approach has been shown to yield improvements over audio-only speech enhancement, particularly for the removal of interfering speech. Despite recent advances in speech synthesis, most audio-visual approaches continue to use spectral mapping/masking to reproduce the clean audio, often resulting in visual backbones added to existing speech enhancement architectures.

paper.pdf

Link to paper (149)

poster.pdf

Link to poster (138)

slides.pdf

Link to slides (171)

Categories:: Speech Enhancement (SPE-ENHA)

37 Views

Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNET-Se Submission to the L3DAS22 Challenge

This paper describes our submission to the L3DAS22 Challenge Task 1, which consists of speech enhancement with 3D Ambisonic microphones. The core of our approach combines Deep Neural Network (DNN) driven complex spectral mapping with linear beamformers such as the multi-frame multi-channel Wiener filter. Our proposed system has two DNNs and a linear beamformer in between. Both DNNs are trained to perform complex spectral mapping, using a combination of waveform and magnitude spectrum losses.

iNeuBe_ Towards Low-distortion Multi-channel Speech Enhancement.pdf

Presentation Slides (181)

Categories:: Speech Enhancement (SPE-ENHA)

24 Views

Conditional Diffusion Probabilistic Model for Speech Enhancement


Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes.

Conditional Diffusion Probabilistic Model for Speech Enhancement.pdf

Presentation Slides (530)

Categories:: Speech Enhancement (SPE-ENHA)

48 Views

Speech enhancement with neural homomorphic synthesis


Most deep learning-based speech enhancement methods operate directly on time-frequency representations or learned features without making use of a model of speech production. This work proposes a new speech enhancement method based on neural homomorphic synthesis. The speech signal is first decomposed into excitation and vocal tract components using complex cepstrum analysis. Then, two complex-valued neural networks are applied to estimate the target complex spectrum of the decomposed components. Finally, the time-domain speech signal is synthesized from the estimated excitation and vocal tract.
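The decomposition step can be illustrated with a simplified sketch. Note the substitution: it uses the real cepstrum (log-magnitude only) with a fixed lifter cutoff rather than the complex cepstrum analysis named in the abstract, and it omits the neural networks and the synthesis step. The helper names, cutoff, and synthetic test frame are assumptions chosen for illustration.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for a single short frame."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(
            x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n)
        )
        out.append(acc / n if inverse else acc)
    return out


def cepstral_split(frame, cutoff=20, floor=1e-8):
    """Split a frame's log-magnitude spectrum by cepstral liftering.

    Low quefrencies approximate the vocal tract envelope; the remainder
    approximates the excitation. Real-cepstrum simplification (assumed).
    """
    n = len(frame)
    log_mag = [math.log(abs(v) + floor) for v in dft(frame)]
    cep = [v.real for v in dft(log_mag, inverse=True)]
    # Keep the symmetric low-quefrency region for the envelope.
    low = [c if (q < cutoff or q > n - cutoff) else 0.0 for q, c in enumerate(cep)]
    high = [c - l for c, l in zip(cep, low)]
    envelope = [v.real for v in dft(low)]
    excitation = [v.real for v in dft(high)]
    return log_mag, envelope, excitation


def synthetic_voiced_frame(n=128, period=32, decay=0.9):
    """Made-up voiced frame: a pulse train through a one-pole decay."""
    return [
        sum(decay ** (t - p) for p in range(0, n, period) if t >= p)
        for t in range(n)
    ]


log_mag, envelope, excitation = cepstral_split(synthetic_voiced_frame())
```

Because the two liftered parts partition the cepstrum, the envelope and excitation log-spectra sum back to the original log-magnitude spectrum, which is the homomorphic property the method relies on.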

Speech enhancement with neural homomorphic synthesis.pdf


Categories:: Speech Enhancement (SPE-ENHA)

45 Views

HARMONICITY PLAYS A CRITICAL ROLE IN DNN BASED VERSUS IN BIOLOGICALLY-INSPIRED MONAURAL SPEECH SEGREGATION SYSTEMS

Recent advancements in deep learning have led to drastic improvements in speech segregation models. Despite their success and growing applicability, few efforts have been made to analyze the underlying principles that these networks learn in order to perform segregation. Here we analyze the role of harmonicity in two state-of-the-art deep neural network (DNN)-based models: Conv-TasNet and DPT-Net. We evaluate their performance on mixtures of natural speech versus slightly manipulated inharmonic speech, in which the harmonics are slightly jittered in frequency.
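The idea of frequency-jittered harmonics can be illustrated on a synthetic complex tone. This is only a sketch under stated assumptions: the study manipulates natural speech, whereas this example synthesizes sinusoids, and the jitter range, uniform distribution, and function name are hypothetical rather than taken from the paper.

```python
import math
import random


def complex_tone(f0, n_harmonics, jitter=0.0, sr=16000, n_samples=800, seed=0):
    """Sum of sinusoids at k*f0, each shifted by up to +/- jitter*f0.

    jitter=0.0 gives a harmonic tone; jitter>0 gives an inharmonic one.
    The uniform jitter model is an assumption for illustration.
    """
    rng = random.Random(seed)
    freqs = [
        k * f0 + rng.uniform(-jitter, jitter) * f0
        for k in range(1, n_harmonics + 1)
    ]
    samples = [
        sum(math.sin(2.0 * math.pi * f * t / sr) for f in freqs) / n_harmonics
        for t in range(n_samples)
    ]
    return samples, freqs


harmonic, harmonic_freqs = complex_tone(200.0, 5)
inharmonic, jittered_freqs = complex_tone(200.0, 5, jitter=0.3, seed=1)
```

With jitter set to zero the component frequencies are exact integer multiples of f0; with a nonzero jitter they no longer share a common fundamental, which is the property the evaluation probes.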

ICASSP_Harmonicity.pdf

Slide Deck (234)

Parikh_poster.pdf

Poster (222)

Parikh_CR.pdf

Manuscript (222)

Categories:: Source Separation and Signal Enhancement
Source separation (MLR-SSEP)
Speech Enhancement (SPE-ENHA)

9 Views

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

Speech enhancement has recently achieved great success with various deep learning methods. However, most conventional speech enhancement systems are trained with supervised methods that pose two significant challenges. First, a majority of training datasets for speech enhancement systems are synthetic. When mixing clean speech and noisy corpora to create the synthetic datasets, domain mismatches occur between synthetic and real-world recordings of noisy speech or audio. Second, there is a trade-off between increasing speech

Unsupervised_Speech_Enhancement_with_Speech_Recognition_Embedding_and_Disentanglement_Losses.pdf


Categories:: Speech Enhancement (SPE-ENHA)

11 Views
