ICASSP 2023

IEEE ICASSP 2023, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and it will provide an opportunity to network with like-minded professionals from around the world.

DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions.
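As a rough illustration of what complex spectral masking means, the sketch below multiplies one channel of a multichannel STFT by a complex-valued mask. It is not the DeFT-AN network: the mask here is an oracle ratio computed from a known clean signal, and the function name `apply_complex_mask`, the `(channel, freq, time)` layout, and the choice of channel 0 as reference are all assumptions made for the example.

```python
import numpy as np


def apply_complex_mask(noisy_stft, mask, ref_channel=0):
    """Apply a complex (freq, time) mask to one channel of a (channel, freq, time) STFT."""
    return noisy_stft[ref_channel] * mask


rng = np.random.default_rng(0)
n_channels, n_freq, n_frames = 2, 5, 4


def complex_noise(shape):
    """Draw random complex values to stand in for STFT coefficients."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)


# Hypothetical clean STFT and a noisy multichannel observation of it.
clean = complex_noise((n_freq, n_frames))
noisy = clean[None] + 0.5 * complex_noise((n_channels, n_freq, n_frames))

# Oracle complex ratio mask for the reference channel (assumption: a trained
# network would predict this mask instead of computing it from the clean signal).
oracle_mask = clean / noisy[0]

enhanced = apply_complex_mask(noisy, oracle_mask)
```

Because the mask is complex, it can correct both the magnitude and the phase of each time-frequency bin, which a real-valued magnitude mask cannot do.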

audio clips.zip

audio clips.zip (178)

Categories: Source Separation and Signal Enhancement

32 Views

Variational Message Passing-based Respiratory Motion Estimation and Detection Using Radar Signals

We present a variational message passing (VMP)-based approach to detect the presence of a person based on their respiratory chest motion using multistatic ultra-wideband (UWB) radar. In the process, the respiratory motion is estimated for contact-free vital sign monitoring. The received signal is modeled as a backscatter channel and the respiratory motion and propagation channels are estimated using VMP. We use the evidence lower bound (ELBO) to approximate the model evidence for the detection.

ICASSP_poster.pdf

ICASSP2023_poster_moederl (248)

ICASSP2023_presentation.pdf

ICASSP2023_presentation_moederl (234)

Categories: Sensor Array Processing

42 Views

Transductive Matrix Completion with Calibration for Multi-Task Learning

Multi-task learning has attracted much attention due to growing multi-purpose research with multiple related data sources. Moreover, transduction with matrix completion is a useful method in multi-label learning. In this paper, we propose a transductive matrix completion algorithm that incorporates a calibration constraint for the features under the multi-task learning framework. The proposed algorithm recovers the incomplete feature matrix and target matrix simultaneously, and the calibration information improves the completion results.

Matrix_Calibration_Arxiv.pdf

Arxiv version with proof (218)

Transductive_Matrix_Completion_with_Calibration_for_Multi-Task_Learning.pdf

Main paper published in ICASSP 2023 (181)

Poster_Matrix_Completion_Calibration.pdf

Poster (169)

Slides.pdf

Slides (510)

Categories: Other applications of machine learning (MLR-APPL)

23 Views

Bilateral Coarse-To-Fine Network For Point Cloud Completion

Point cloud completion aims to accurately estimate complete point clouds from partial observations. Existing methods often directly infer the missing points from the partial shape, but they suffer from limited structural information. To address this, we propose the Bilateral Coarse-to-Fine Network (BCFNet), which leverages 2D images as guidance to compensate for structural information loss. Our method introduces a multi-level codeword skip-connection to estimate structural details.

SIG-Port.zip

Paper, presentation slides, poster, and video with subtitle (193)

Categories: Neural network learning (MLR-NNLR)

10 Views

MEET: A Monte Carlo Exploration-Exploitation Trade-off for Buffer Sampling

Data selection is essential for any data-based optimization technique, such as Reinforcement Learning. State-of-the-art sampling strategies for the experience replay buffer improve the performance of the Reinforcement Learning agent. However, they do not incorporate uncertainty in the Q-Value estimation. Consequently, they cannot adapt the sampling strategies, including exploration and exploitation of transitions, to the complexity of the task.

MEET_Poster.pdf

Poster (195)

icassp_2023 (7).pdf

Paper (194)

Categories: Neural network learning (MLR-NNLR)

17 Views

DYNAMIC ALIGNMENT MASK CTC: IMPROVED MASK CTC WITH ALIGNED CROSS ENTROPY

5496.pdf

5496.pdf (215)

Categories: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

19 Views

QI-TTS: QUESTIONING INTONATION CONTROL FOR EMOTIONAL SPEECH SYNTHESIS

5605.pdf

Poster (185)

Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT)

28 Views

Evaluating Speech–Phoneme Alignment and Its Impact on Neural Text-To-Speech Synthesis

In recent years, the quality of text-to-speech (TTS) synthesis has vastly improved due to deep-learning techniques, with parallel architectures in particular providing excellent synthesis quality with fast inference. Training these models usually requires speech recordings, corresponding phoneme-level transcripts, and the temporal alignment of each phoneme to the utterances. Since manually creating such fine-grained alignments requires expert knowledge and is time-consuming, it is common practice to estimate them using automatic speech–phoneme alignment methods.

2023_SpeechPhonemeAlignment_ICASSP.pdf

2023_SpeechPhonemeAlignment_ICASSP.pdf (218)

Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT)

29 Views

FedEEG: Federated EEG Decoding via Inter-Subject Structure Matching

With sufficient centralized training data coming from multiple subjects, deep learning methods have achieved powerful EEG decoding performance. However, sending each individual’s EEG data directly to a centralized server might cause privacy leakage. To overcome this issue, we present an inter-subject structure matching-based federated EEG decoding (FedEEG) framework. First, we introduce a center loss to each client (subject), which can learn multiple virtual class centers by averaging the corresponding class-specific EEG features.
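The idea of class centers obtained by averaging class-specific features can be sketched as follows. This is a minimal illustration for a single client, assuming the commonly used squared-distance form of the center loss; the function names `class_centers` and `center_loss`, the 0.5 scaling, and the toy feature values are assumptions for the example and are not taken from the FedEEG paper.

```python
import numpy as np


def class_centers(features, labels, n_classes):
    """Average the feature vectors of each class to obtain one center per class."""
    centers = np.zeros((n_classes, features.shape[1]))
    for c in range(n_classes):
        members = features[labels == c]
        if len(members) > 0:  # leave the center at zero if the class is absent
            centers[c] = members.mean(axis=0)
    return centers


def center_loss(features, labels, centers):
    """Half the mean squared distance between each feature and its class center."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))


# Toy features for one client (subject): four trials, two classes (hypothetical values).
features = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0], [1.0, 3.0]])
labels = np.array([0, 0, 1, 1])

centers = class_centers(features, labels, n_classes=2)
loss = center_loss(features, labels, centers)
```

Minimizing such a loss pulls features of the same class toward a shared center, and the centers themselves are compact summaries that do not expose the raw EEG recordings.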