IEEE ICASSP 2024 - The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. ICASSP 2024 will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and it will provide an excellent opportunity to network with like-minded professionals from around the world.

Crafting an effective Automatic Speech Recognition (ASR) solution for dialects demands innovative approaches that not only address data scarcity but also navigate the intricacies of linguistic diversity. In this paper, we address this ASR challenge, focusing on the Tunisian dialect. First, textual and audio data are collected and, in some cases, annotated.

The goal of this work is Active Speaker Detection (ASD), the task of determining whether a visible person is speaking in a series of video frames.
Previous works have approached the task by exploring network architectures, while learning effective representations has received less attention.
In this work, we propose TalkNCE, a novel talk-aware contrastive loss. The loss is applied only to those parts of a segment in which the person on screen is actually speaking.
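
A minimal sketch of what such a talk-aware contrastive objective could look like, assuming per-frame audio and visual embeddings and a binary speaking mask; the function name `talk_aware_nce`, the shapes, and the symmetric InfoNCE form are illustrative assumptions, not the authors' reference implementation:

```python
import torch
import torch.nn.functional as F

def talk_aware_nce(audio_emb, visual_emb, speaking_mask, temperature=0.07):
    """audio_emb, visual_emb: (T, D) per-frame embeddings; speaking_mask: (T,) bool."""
    # Restrict the loss to frames where the on-screen person is actually speaking.
    a = F.normalize(audio_emb[speaking_mask], dim=-1)
    v = F.normalize(visual_emb[speaking_mask], dim=-1)
    logits = a @ v.t() / temperature              # (T', T') cross-modal similarities
    targets = torch.arange(a.size(0))             # time-aligned pairs are positives
    # Symmetric InfoNCE over audio-to-visual and visual-to-audio directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example: 50 frames of 128-dim embeddings, of which the first 30 are speech.
T, D = 50, 128
mask = torch.zeros(T, dtype=torch.bool)
mask[:30] = True
loss = talk_aware_nce(torch.randn(T, D), torch.randn(T, D), mask)
```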

Photoacoustic tomography (PAT) is a rapidly evolving medical imaging modality that combines optical absorption contrast with the imaging depth of ultrasound. One challenge in PAT is image reconstruction from inadequate acoustic signals, caused by limited sensor coverage or a sparse transducer array. Such cases call for solving an ill-posed inverse reconstruction problem. In this work, we use score-based diffusion models to solve the inverse problem of reconstructing an image from limited PAT measurements.
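
A hedged sketch of the general recipe for this kind of score-guided reconstruction, in the spirit of annealed Langevin posterior sampling: alternate a step along the learned prior score with a data-consistency pull toward the measurements. The linear forward operator `A` and the pretrained `score_net` are stand-ins here, not the paper's components:

```python
import math
import torch

def reconstruct(y, A, score_net, steps=200, step_size=1e-4, lam=1.0,
                sigma_max=1.0, sigma_min=0.01):
    # Geometric noise schedule from sigma_max down to sigma_min.
    sigmas = torch.exp(torch.linspace(math.log(sigma_max), math.log(sigma_min), steps))
    x = sigma_max * torch.randn(A.shape[1])          # image as a flat vector
    for sigma in sigmas:
        eps = step_size * (sigma / sigma_min) ** 2   # annealed step size
        prior = score_net(x, sigma)                  # learned score of the image prior
        data = A.t() @ (y - A @ x) / sigma ** 2      # pull toward the measurements
        x = x + eps * (prior + lam * data) + torch.sqrt(2 * eps) * torch.randn_like(x)
    return x

# Toy demo: a 64-pixel "image" seen through 16 sensors; a Gaussian prior stands
# in for the trained score network.
A = torch.randn(16, 64) / 8
x_true = torch.randn(64)
y = A @ x_true
x_hat = reconstruct(y, A, score_net=lambda x, s: -x / (1 + s ** 2))
```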

Diffusion models have shown promising results in cross-modal generation tasks, including text-to-image and text-to-audio generation. However, generating music, as a special type of audio, presents unique challenges due to the limited availability of music data and sensitive issues surrounding copyright and plagiarism. In this paper, to tackle these challenges, we first construct a state-of-the-art text-to-music model, MusicLDM, that adapts the Stable Diffusion and AudioLDM architectures to the music domain.

In this work, we propose to utilize a variational autoencoder (VAE) for channel estimation (CE) in underdetermined (UD) systems. The method builds on a recently proposed concept in which a VAE is trained on channel state information (CSI) data and used to parameterize an approximation to the mean squared error (MSE)-optimal estimator. The contributions in this work extend the existing framework from fully determined (FD) to UD systems, which are of high practical relevance.
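
One way to picture the estimation step: if the trained VAE supplies a conditional Gaussian prior (mean `mu`, covariance `C`) for the channel, the estimator can apply the LMMSE formula under that prior to the observation y = A h + n. The sketch below assumes exactly that setup, with illustrative names and shapes rather than the paper's implementation; `A` is wide (m < n) in the underdetermined case:

```python
import torch

def vae_lmmse_estimate(y, A, mu, C, noise_var):
    """y: (m,), A: (m, n) with m < n, mu: (n,), C: (n, n) PSD, noise_var: float."""
    m = A.shape[0]
    S = A @ C @ A.t() + noise_var * torch.eye(m)        # innovation covariance
    gain = C @ A.t() @ torch.linalg.solve(S, y - A @ mu)
    return mu + gain                                    # approximates E[h | y]

# Toy underdetermined example: n = 8 channel taps observed through m = 4 pilots.
n, m = 8, 4
A = torch.randn(m, n)
L = torch.randn(n, n)
C = L @ L.t() / n + 0.1 * torch.eye(n)                  # stand-in VAE covariance
mu = torch.zeros(n)                                     # stand-in VAE mean
h = torch.distributions.MultivariateNormal(mu, C).sample()
y = A @ h + 0.05 * torch.randn(m)
h_hat = vae_lmmse_estimate(y, A, mu, C, noise_var=0.05 ** 2)
```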

Audio source separation aims to extract individual sound sources from an audio mixture. Recent studies on source separation focus primarily on signal-level fidelity, typically measured by the source-to-distortion ratio (SDR). However, scant attention has been given to the perceptual quality of the separated tracks. In this paper, we propose MDX-GAN, an efficient and high-fidelity audio source separator based on MDX-Net that handles multiple sound classes. We leverage different training objectives to enhance the perceptual quality of audio source separation.
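
The abstract does not spell out the objectives, but a common way to trade pure signal-level fidelity for perceptual realism is to pair a reconstruction loss with an adversarial term, which the GAN in the name suggests. The sketch below shows one such combination (least-squares GAN plus L1) with tiny stand-in networks; nothing here is the MDX-Net architecture or the paper's exact loss mix:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

sep = nn.Conv1d(1, 1, kernel_size=15, padding=7)              # stand-in separator
disc = nn.Sequential(nn.Conv1d(1, 8, 15, stride=4), nn.LeakyReLU(),
                     nn.Conv1d(8, 1, 15, stride=4))           # stand-in discriminator
mse = nn.MSELoss()

mixture = torch.randn(4, 1, 16384)                            # (batch, channel, samples)
target = torch.randn(4, 1, 16384)                             # ground-truth source
est = sep(mixture)                                            # separated estimate

# Discriminator: real sources -> 1, separated estimates -> 0 (least-squares GAN).
d_real, d_fake = disc(target), disc(est.detach())
d_loss = mse(d_real, torch.ones_like(d_real)) + mse(d_fake, torch.zeros_like(d_fake))

# Separator: waveform reconstruction plus an adversarial "realism" term.
d_gen = disc(est)
g_loss = F.l1_loss(est, target) + 0.1 * mse(d_gen, torch.ones_like(d_gen))
```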

Diffusion models are a new class of generative models that have recently been applied successfully to speech enhancement. Previous works have demonstrated their superior performance in mismatched conditions compared to state-of-the-art discriminative models. However, this was investigated with a single database for training and another for testing, which makes the results highly dependent on the particular databases chosen. Moreover, recent developments from the image generation literature remain largely unexplored for speech enhancement.

Speech enhancement seeks to extract clean speech from noisy signals. Traditional deep learning methods face two challenges: efficiently exploiting the information in long speech sequences, and high computational cost. To address these, we introduce the Spiking Structured State Space Model (Spiking-S4). This approach merges the energy efficiency of Spiking Neural Networks (SNNs) with the long-range sequence modeling capabilities of Structured State Space Models (S4).
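
A hedged sketch of the core combination: a diagonal linear state-space recurrence provides S4-style long-range memory, and its output drives leaky integrate-and-fire units that emit binary spikes. The parameterization, the explicit time loop, and the reset rule are simplified stand-ins, not the paper's model:

```python
import torch
import torch.nn as nn

class SpikingSSMLayer(nn.Module):
    def __init__(self, dim, state_size=16, beta=0.9, threshold=1.0):
        super().__init__()
        # Diagonal state matrix kept stable by squashing decays into (0, 1).
        self.log_a = nn.Parameter(torch.rand(dim, state_size).log())
        self.B = nn.Parameter(torch.randn(dim, state_size) / state_size ** 0.5)
        self.C = nn.Parameter(torch.randn(dim, state_size) / state_size ** 0.5)
        self.beta, self.threshold = beta, threshold

    def forward(self, u):                                 # u: (batch, time, dim)
        a = torch.sigmoid(self.log_a)                     # per-channel decays
        s = torch.zeros(u.shape[0], u.shape[2], a.shape[1])   # SSM state
        v = torch.zeros(u.shape[0], u.shape[2])               # membrane potential
        spikes = []
        for t in range(u.shape[1]):
            s = a * s + self.B * u[:, t, :, None]         # linear SSM recurrence
            y = (self.C * s).sum(-1)                      # SSM readout
            v = self.beta * v + y                         # leaky integration
            spk = (v >= self.threshold).float()           # fire (training would use
            v = v - spk * self.threshold                  # a surrogate gradient here)
            spikes.append(spk)
        return torch.stack(spikes, dim=1)                 # binary spikes, (B, T, dim)

out = SpikingSSMLayer(dim=8)(torch.randn(2, 100, 8))
```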

Drug discovery faces challenges due to the vast chemical space and complex drug-target interactions. This paper proposes TD-GPT, a novel deep learning framework for targeted drug molecule generation. TD-GPT comprises a linear Transformer for drug-target affinity prediction, an affinity-enhanced protein encoder operating on sequences, and a target-specific attention module in the molecular Transformer decoder. Experiments demonstrate TD-GPT’s efficiency in generating valid, novel molecules with high affinity and specificity for desired targets without target fine-tuning.
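
A minimal, hypothetical sketch of how a decoder can be conditioned on a protein target via cross-attention, which is one plausible reading of the "target-specific attention module": causal self-attention over the molecule tokens generated so far, followed by attention into the protein encoder's embeddings. Module names and sizes are illustrative, not the TD-GPT release:

```python
import torch
import torch.nn as nn

class TargetConditionedDecoderBlock(nn.Module):
    def __init__(self, d_model=256, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.n1, self.n2, self.n3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, tokens, protein):
        # Causal self-attention over the molecule tokens generated so far.
        T = tokens.size(1)
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        h = self.n1(tokens)
        x = tokens + self.self_attn(h, h, h, attn_mask=mask)[0]
        # Target-specific step: attend into the protein encoder's embeddings.
        x = x + self.cross_attn(self.n2(x), protein, protein)[0]
        return x + self.ff(self.n3(x))

block = TargetConditionedDecoderBlock()
out = block(torch.randn(2, 20, 256), torch.randn(2, 300, 256))  # tokens, protein
```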
