IEEE ICASSP 2024

IEEE ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit the website.

Noise-Disentangled Graph Contrastive Learning via Low-Rank and Sparse Subspace Decomposition

Graph contrastive learning aims to learn a representative model by maximizing the agreement between different views of the same graph. Existing studies usually allow multifarious noise in data augmentation, and suffer from trivial and inconsistent generation of graph views. Moreover, they mostly impose contrastive constraints on pairwise representations, limiting the structural correlations among multiple nodes. Both problems may hinder graph contrastive learning, leading to suboptimal node representations.

zhanggehang-icassp-oral-v2.pptx

zhanggehang-icassp-oral-v2.pptx (200)

Categories:: Machine Learning for Signal Processing

34 Views

CROSS-ATTENTION-GUIDED WAVENET FOR MEL SPECTROGRAM RECONSTRUCTION IN THE ICASSP 2024 AUDITORY EEG CHALLENGE slide

upload slide

icasspppt_new.pdf

icasspppt_new.pdf (363)

Categories:: Bio Imaging and Signal Processing

84 Views

Multi-Agent Sparse Interaction Modeling is An Anomaly Detection Problem

Read more about Multi-Agent Sparse Interaction Modeling is An Anomaly Detection Problem
Log in to post comments

Most real-world multi-agent tasks exhibit the characteristic of sparse interaction, wherein agents interact with each other in a limited number of crucial states while largely acting independently. Effectively modeling the sparse interaction and leveraging the learned interaction structure to instruct agents' learning processes can enhance the efficiency of multi-agent reinforcement learning algorithms. However, it remains unclear how to identify these specific interactive states solely through trials and errors within current multi-agent tasks.

Paper#3829_poster.pdf

The poster of paper #3829 (300)

Categories:: Sequential learning; sequential decision methods (MLR-SLER)

71 Views

State-Augmented Information Routing in Communication Systems with Graph Neural Networks

We consider the problem of routing network packets in a large-scale communication system where the nodes have access to only local information. We formulate this problem as a constrained learning problem, which can be solved using a distributed optimization algorithm. We approach this distributed optimization using a novel state-augmentation (SA) strategy to maximize the aggregate information packets at different source nodes, leveraging dual variables corresponding to flow constraint violations.

ICASSP 2024_3.pdf

ICASSP 2024_3.pdf (318)

Categories:: Communication and Sensing aspects of Sensor Networks, Wireless and Ad-Hoc Networks
Neural network learning (MLR-NNLR)

25 Views

AUDIO-VISUAL SPEECH RECOGNITION IN-THE-WILD: MULTI-ANGLE VEHICLE CABIN CORPUS AND ATTENTION-BASED METHOD

Audio-Visual Speech Recognition In-The-Wild: Multi-Angle Vehicle Cabin Corpus And Attention-Based Method

ICASSP_2024_v3.pptx

ICASSP_2024_v3.pptx (173)

Categories:: Other

37 Views

Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks

Shadow removal is a task aimed at erasing regional shadows present in images and reinstating visually pleasing natural scenes with consistent illumination. While recent deep learning techniques have demonstrated impressive performance in image shadow removal, their robustness against adversarial attacks remains largely unexplored. Furthermore, many existing attack frameworks typically allocate a uniform budget for perturbations across the entire input image, which may not be suitable for attacking shadow images.

ICASSP2024_slides.pptx

ICASSP2024_slides.pptx (217)

Categories:: Image/Video Processing

23 Views

MLSP-L13.4: Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision

Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup.

ICASSP_L13_4.pptx

ICASSP_L13_4.pptx (343)

Categories:: Multimedia Signal Processing

24 Views

Are Soft prompts good zero-shot learners for speech recognition?

Read more about Are Soft prompts good zero-shot learners for speech recognition?
Log in to post comments

Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while also maintaining competitive performance. However, not many people understand how and why this is so. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR).

ICASSP2024_dianwen_oral_prompts.pptx

ICASSP2024_dianwen_oral_prompts.pptx (212)

Categories:: Resource constrained speech recognition (SPE-RCSR)
Robust Speech Recognition (SPE-ROBU)

29 Views

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies, with a deficiency in effectively modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that provides the capabilities to model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework.

ICASSP2024-MossFormer2-Poster-zsk.pdf

ICASSP2024-MossFormer2-Poster-zsk.pdf (352)

Categories:: Speech Enhancement (SPE-ENHA)
Source Separation and Signal Enhancement

66 Views

PART REPRESENTATION LEARNING WITH TEACHER-STUDENT DECODER FOR OCCLUDED PERSON RE-IDENTIFICATION

Occluded person re-identification (ReID) is a very challenging task due to the occlusion disturbance and incomplete target information. Leveraging external cues such as human pose or parsing to locate and align part features has been proven to be very effective in occluded person ReID. Meanwhile, recent Transformer structures have a strong ability of long-range modeling. Considering the above facts, we propose a Teacher-Student Decoder (TSD) framework for occluded person ReID, which utilizes the Transformer decoder with the help of human parsing.