![](https://sigport.org/sites/default/files/styles/medium/public/icassp24_logo.jpg?itok=fMlObe3v)
IEEE ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit the website.
- Read more about Multi-Agent Sparse Interaction Modeling is An Anomaly Detection Problem
- Log in to post comments
Most real-world multi-agent tasks exhibit the characteristic of sparse interaction, wherein agents interact with each other in a limited number of crucial states while largely acting independently. Effectively modeling the sparse interaction and leveraging the learned interaction structure to instruct agents' learning processes can enhance the efficiency of multi-agent reinforcement learning algorithms. However, it remains unclear how to identify these specific interactive states solely through trials and errors within current multi-agent tasks.
- Categories:
![](https://sigport.org/sites/default/files/styles/list/public/icassp_fp.jpg?itok=N3a50U6B)
- Read more about State-Augmented Information Routing in Communication Systems with Graph Neural Networks
- Log in to post comments
We consider the problem of routing network packets in a large-scale communication system where the nodes have access to only local information. We formulate this problem as a constrained learning problem, which can be solved using a distributed optimization algorithm. We approach this distributed optimization using a novel state-augmentation (SA) strategy to maximize the aggregate information packets at different source nodes, leveraging dual variables corresponding to flow constraint violations.
- Categories:
![](https://sigport.org/sites/default/files/styles/home/public/icassp24_logo_0.jpg?itok=OGpw2wC4)
- Read more about AUDIO-VISUAL SPEECH RECOGNITION IN-THE-WILD: MULTI-ANGLE VEHICLE CABIN CORPUS AND ATTENTION-BASED METHOD
- Log in to post comments
Audio-Visual Speech Recognition In-The-Wild: Multi-Angle Vehicle Cabin Corpus And Attention-Based Method
- Categories:
![](https://sigport.org/sites/default/files/styles/home/public/icassp24_logo_0.jpg?itok=OGpw2wC4)
- Read more about Benchmarking Adversarial Robustness of Image Shadow Removal with Shadow-adaptive Attacks
- Log in to post comments
Shadow removal is a task aimed at erasing regional shadows present in images and reinstating visually pleasing natural scenes with consistent illumination. While recent deep learning techniques have demonstrated impressive performance in image shadow removal, their robustness against adversarial attacks remains largely unexplored. Furthermore, many existing attack frameworks typically allocate a uniform budget for perturbations across the entire input image, which may not be suitable for attacking shadow images.
- Categories:
![](https://sigport.org/sites/default/files/styles/home/public/icassp24_logo_0.jpg?itok=OGpw2wC4)
- Read more about MLSP-L13.4: Elevating Skeleton-Based Action Recognition with Efficient Multi-Modality Self-Supervision
- Log in to post comments
Self-supervised representation learning for human action recognition has developed rapidly in recent years. Most of the existing works are based on skeleton data while using a multi-modality setup.
- Categories:
![](https://sigport.org/sites/default/files/styles/home/public/icassp24_logo_0.jpg?itok=OGpw2wC4)
- Read more about Are Soft prompts good zero-shot learners for speech recognition?
- Log in to post comments
Large self-supervised pre-trained speech models require computationally expensive fine-tuning for downstream tasks. Soft prompt tuning offers a simple parameter-efficient alternative by utilizing minimal soft prompt guidance, enhancing portability while also maintaining competitive performance. However, not many people understand how and why this is so. In this study, we aim to deepen our understanding of this emerging method by investigating the role of soft prompts in automatic speech recognition (ASR).
- Categories:
![](https://sigport.org/sites/default/files/styles/home/public/icassp24_logo_0.jpg?itok=OGpw2wC4)
- Read more about MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
- 1 comment
- Log in to post comments
Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies, with a deficiency in effectively modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that provides the capabilities to model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework.
- Categories:
![](https://sigport.org/sites/default/files/styles/home/public/icassp24_logo_0.jpg?itok=OGpw2wC4)
- Read more about PART REPRESENTATION LEARNING WITH TEACHER-STUDENT DECODER FOR OCCLUDED PERSON RE-IDENTIFICATION
- Log in to post comments
Occluded person re-identification (ReID) is a very challenging task due to the occlusion disturbance and incomplete target information. Leveraging external cues such as human pose or parsing to locate and align part features has been proven to be very effective in occluded person ReID. Meanwhile, recent Transformer structures have a strong ability of long-range modeling. Considering the above facts, we propose a Teacher-Student Decoder (TSD) framework for occluded person ReID, which utilizes the Transformer decoder with the help of human parsing.
- Categories:
![](https://sigport.org/sites/default/files/styles/home/public/icassp24_logo_0.jpg?itok=OGpw2wC4)
- Read more about Poster for the paper "Buffered Gaussian Modeling For Vectorized HD Map Construction"
- Log in to post comments
Vectorized high-definition (HD) map construction is an important and challenging task for autonomous driving. End-to-end models have been developed recently to enable online map construction. Existing works have difficulty in generating complex geometric shapes and lack comprehensive evaluation metrics. To tackle these challenges, we introduce buffered IoU as a novel metric for vectorized map construction, which is clearly defined and applicable to real-world situations. Inspired by methods of rotated object detection, we further propose a novel technique called Buffered Gaussian Modeling.
- Categories:
![](https://sigport.org/sites/default/files/styles/home/public/icassp24_logo_0.jpg?itok=OGpw2wC4)
- Read more about Lightning Talk- Situation-Aware Tranmit Beamforming for Automotive radar
- Log in to post comments
Millimeter-wave radar is a common sensor modality used in automotive driving for target detection and perception. These radars can benefit from side information on the environment being sensed, such as lane topologies or data from other sensors. Existing radars do not leverage this information to adapt waveforms or perform prior-aware inference. In this paper, we model the side information as an occupancy map and design transmit beamformers that are customized to the map. Our method maximizes the probability of detection in regions with a higher uncertainty on the presence of a target.
- Categories: