IEEE ICASSP 2024, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The conference features world-class presentations by internationally renowned speakers, cutting-edge session topics, and opportunities to network with like-minded professionals from around the world.
Poster: Buffered Gaussian Modeling For Vectorized HD Map Construction
Vectorized high-definition (HD) map construction is an important and challenging task for autonomous driving. End-to-end models have been developed recently to enable online map construction. Existing works have difficulty in generating complex geometric shapes and lack comprehensive evaluation metrics. To tackle these challenges, we introduce buffered IoU as a novel metric for vectorized map construction, which is clearly defined and applicable to real-world situations. Inspired by methods of rotated object detection, we further propose a novel technique called Buffered Gaussian Modeling.
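As a rough illustration of how a buffered IoU between vectorized map elements can be computed, the sketch below dilates two polylines and takes the IoU of the resulting regions. The use of shapely and the buffer radius are assumptions for illustration, not details from the paper.

```python
# Hedged sketch: one plausible reading of a "buffered IoU" between polylines.
# Each predicted / ground-truth polyline is dilated by a buffer radius and the
# IoU of the resulting regions is computed. Not the authors' implementation.
from shapely.geometry import LineString

def buffered_iou(pred_pts, gt_pts, buffer_m=1.0):
    """IoU between buffered polylines; buffer_m is an assumed radius in meters."""
    pred_region = LineString(pred_pts).buffer(buffer_m)
    gt_region = LineString(gt_pts).buffer(buffer_m)
    inter = pred_region.intersection(gt_region).area
    union = pred_region.union(gt_region).area
    return inter / union if union > 0 else 0.0

# Example: a predicted lane divider vs. its ground truth.
print(buffered_iou([(0, 0), (10, 0.5)], [(0, 0), (10, 0)]))
```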
Lightning Talk: Situation-Aware Transmit Beamforming for Automotive Radar
Millimeter-wave radar is a common sensor modality used in automotive driving for target detection and perception. These radars can benefit from side information on the environment being sensed, such as lane topologies or data from other sensors. Existing radars do not leverage this information to adapt waveforms or perform prior-aware inference. In this paper, we model the side information as an occupancy map and design transmit beamformers that are customized to the map. Our method maximizes the probability of detection in regions with a higher uncertainty on the presence of a target.
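A minimal sketch of the general idea of steering transmit energy toward angular sectors with high target uncertainty is given below. The array model, the toy uncertainty map, and the eigenvector-based weight design are generic assumptions, not the beamformer design proposed in the paper.

```python
# Hedged sketch (not the paper's algorithm): shape a transmit beam toward
# angular sectors with high target uncertainty derived from an occupancy map.
import numpy as np

n_ant = 8                          # uniform linear array, half-wavelength spacing
angles = np.deg2rad(np.linspace(-60, 60, 121))
uncertainty = np.exp(-((np.rad2deg(angles) - 20) / 10) ** 2)  # toy occupancy prior

# Steering matrix: a[k, theta] = exp(j * pi * k * sin(theta))
A = np.exp(1j * np.pi * np.outer(np.arange(n_ant), np.sin(angles)))

# Maximize the uncertainty-weighted average gain w^H R w subject to ||w|| = 1,
# where R = sum_theta u(theta) a(theta) a(theta)^H  ->  principal eigenvector.
R = (A * uncertainty) @ A.conj().T
eigvals, eigvecs = np.linalg.eigh(R)
w = eigvecs[:, -1]                 # transmit beamformer weights

beampattern = np.abs(w.conj() @ A) ** 2
print("peak response at %.1f deg" % np.rad2deg(angles[np.argmax(beampattern)]))
```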
Poster: Synchformer: Efficient Synchronization from Sparse Cues
Our objective is audio-visual synchronization with a focus on ‘in-the-wild’ videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling through multi-modal segment-level contrastive pre-training. This approach achieves state-of-the-art performance in both dense and sparse settings.
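To make the segment-level contrastive pre-training idea concrete, here is a minimal sketch of a symmetric InfoNCE loss between aligned audio and visual segment embeddings. The embedding dimension and temperature are placeholders, not values from the paper.

```python
# Hedged sketch: a generic segment-level audio-visual contrastive (InfoNCE)
# objective of the kind the abstract describes; not the Synchformer code.
import torch
import torch.nn.functional as F

def segment_contrastive_loss(audio_emb, visual_emb, temperature=0.07):
    """audio_emb, visual_emb: (num_segments, dim) embeddings of aligned segments."""
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(visual_emb, dim=-1)
    logits = a @ v.t() / temperature          # similarity of every audio/visual pair
    targets = torch.arange(a.size(0))         # matching segments lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = segment_contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))
print(loss.item())
```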
Investigating the Clusters Discovered by Pre-Trained AV-HuBERT
Self-supervised models, such as HuBERT and its audio-visual version AV-HuBERT, have demonstrated excellent performance on various tasks. The main factor for their success is the pre-training procedure, which requires only raw data without human transcription. During the self-supervised pre-training phase, HuBERT is trained to discover latent clusters in the training data, but these clusters are discarded, and only the last hidden layer is used by the conventional finetuning step.
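For context, the cluster-discovery step in HuBERT-style pre-training amounts to running k-means on frame-level features and using the resulting assignments as pseudo-labels. The sketch below illustrates that step on toy data; the feature source and number of clusters are placeholders, not the paper's setup.

```python
# Hedged sketch of k-means pseudo-label discovery on frame-level features,
# as used in HuBERT-style pre-training; toy data, not the paper's pipeline.
import numpy as np
from sklearn.cluster import KMeans

frame_features = np.random.randn(10_000, 39)     # e.g. MFCC-like frames (toy data)
kmeans = KMeans(n_clusters=100, n_init=10, random_state=0).fit(frame_features)

pseudo_labels = kmeans.labels_                   # discrete targets for masked prediction
print(pseudo_labels[:10])
```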
On the Choice of the Optimal Temporal Support for Audio Classification with Pre-Trained Embeddings
Current state-of-the-art audio analysis systems rely on pre-trained embedding models, often used off-the-shelf as (frozen) feature extractors. Choosing the best one for a set of tasks is the subject of many recent publications. However, one aspect often overlooked in these works is the influence of the duration of audio input considered to extract an embedding, which we refer to as Temporal Support (TS). In this work, we study the influence of the TS for well-established or emerging pre-trained embeddings, chosen to represent different types of architectures and learning paradigms.
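The sketch below shows what sweeping the temporal support looks like in practice: a frozen embedding is computed over windows of different lengths and pooled into a clip-level representation. The placeholder embedding function and window lengths are assumptions for illustration only.

```python
# Hedged sketch: varying the temporal support (window length) used to pool a
# frozen embedding; the embedding function is a stand-in, not a specific model.
import numpy as np

def embed(frames):
    """Placeholder for a frozen pre-trained embedding over a chunk of frames."""
    return frames.mean(axis=0)

def clip_embedding(frames, ts_frames):
    """Split a clip into windows of ts_frames frames and average their embeddings."""
    chunks = [frames[i:i + ts_frames] for i in range(0, len(frames), ts_frames)]
    return np.mean([embed(c) for c in chunks], axis=0)

clip = np.random.randn(1000, 64)                 # toy clip: 1000 frames, 64 features
for ts in (50, 100, 250, 500, 1000):             # candidate temporal supports
    emb = clip_embedding(clip, ts)
    print(ts, emb.shape)
```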
Unlocking Deep Learning: A BP-Free Approach for Parallel Block-Wise Training of Neural Networks
Backpropagation (BP) has been a successful optimization technique for deep learning models. However, its limitations, such as backward- and update-locking, and its biological implausibility, hinder the concurrent updating of layers and do not mimic the local learning processes observed in the human brain. To address these issues, recent research has suggested using local error signals to asynchronously train network blocks. However, this approach often involves extensive trial-and-error iterations to determine the best configuration for local training.
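A minimal sketch of block-wise training with local error signals follows: each block has its own auxiliary classifier and loss, and detaching the input to each block prevents gradients from crossing block boundaries. The block sizes and auxiliary heads are illustrative assumptions, not the configuration proposed in the paper.

```python
# Hedged sketch: local losses per block, with detach() stopping gradient flow
# between blocks so each block can be updated independently.
import torch
import torch.nn as nn
import torch.nn.functional as F

blocks = nn.ModuleList([nn.Sequential(nn.Linear(32, 32), nn.ReLU()) for _ in range(3)])
heads = nn.ModuleList([nn.Linear(32, 10) for _ in range(3)])        # local classifiers
opts = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()), lr=0.1)
        for b, h in zip(blocks, heads)]

x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
h = x
for block, head, opt in zip(blocks, heads, opts):
    h = block(h.detach())                        # no gradient flows into earlier blocks
    loss = F.cross_entropy(head(h), y)
    opt.zero_grad()
    loss.backward()                              # updates only this block and its head
    opt.step()
```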
MultiWay-Adapter: Adapting Multimodal Large Language Models for Scalable Image-Text Retrieval
As Multimodal Large Language Models (MLLMs) grow in size, adapting them to specialized tasks becomes increasingly challenging due to high computational and memory demands. While efficient adaptation methods exist, in practice they suffer from shallow inter-modal alignment, which severely hurts model effectiveness. To tackle these challenges, we introduce the MultiWay-Adapter (MWA), which deepens inter-modal alignment, enabling high transferability with minimal tuning effort.
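For readers unfamiliar with adapter-based tuning, the sketch below shows a generic bottleneck adapter of the kind such methods insert into frozen transformer layers. The bottleneck size and placement are assumptions and this is not the MultiWay-Adapter's actual design.

```python
# Hedged sketch: a generic bottleneck adapter with a residual connection,
# illustrating lightweight adaptation; not the MWA architecture itself.
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    def __init__(self, dim=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.GELU()

    def forward(self, x):
        # Residual connection keeps the frozen backbone's features intact.
        return x + self.up(self.act(self.down(x)))

adapter = BottleneckAdapter()
tokens = torch.randn(2, 197, 768)                # e.g. a batch of image-token sequences
print(adapter(tokens).shape)
```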
PENDANTSS: PEnalized Norm-Ratios Disentangling Additive Noise, Trend and Sparse Spikes
Denoising, detrending, and deconvolution are common restoration tasks that are traditionally handled separately; coupling them leads to complex, ill-posed inverse problems. We propose PENDANTSS for joint trend removal and blind deconvolution of sparse peak-like signals. It blends a parsimonious prior with the hypothesis that the smooth trend and the noise can, to some extent, be separated by low-pass filtering.
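The two ingredients named above can be illustrated with a short sketch: a crude low-pass trend estimate and a sparsity-promoting l1/l2 norm ratio evaluated on the residual spikes. This is an illustration under assumed toy data, not the PENDANTSS algorithm.

```python
# Hedged sketch: low-pass trend separation plus an l1/l2 norm-ratio parsimony
# measure on the spike component; toy example, not the paper's solver.
import numpy as np

def norm_ratio(x, eps=1e-12):
    """l1/l2 ratio: small for sparse signals, larger for diffuse ones."""
    return np.abs(x).sum() / (np.linalg.norm(x) + eps)

def lowpass_trend(y, width=51):
    """Moving-average low-pass filter as a crude trend estimate."""
    kernel = np.ones(width) / width
    return np.convolve(y, kernel, mode="same")

t = np.linspace(0, 1, 500)
spikes = np.zeros_like(t)
spikes[[60, 220, 400]] = [3.0, 2.0, 4.0]
y = 0.5 * np.sin(2 * np.pi * t) + spikes + 0.05 * np.random.randn(t.size)

trend = lowpass_trend(y)
residual = y - trend                             # mostly spikes plus noise
print("norm ratio of residual:", norm_ratio(residual))
```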
Learning from Taxonomy: Multi-Label Few-Shot Classification for Everyday Sound Recognition
Humans categorise and structure perceived acoustic signals into hierarchies of auditory objects. The semantics of these objects are thus informative in sound classification, especially in few-shot scenarios. However, existing works have only represented audio semantics as binary labels (e.g., whether a recording contains "dog barking" or not), and thus failed to learn a more generic semantic relationship among labels. In this work, we introduce an ontology-aware framework to train multi-label few-shot audio networks with both relative and absolute relationships in an audio taxonomy.
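One simple way to see how a taxonomy can enrich binary labels is to derive soft, taxonomy-aware targets in which semantically related classes share credit. The toy taxonomy and smoothing rule below are illustrative assumptions, not the framework proposed in the paper.

```python
# Hedged sketch: turning hard multi-labels into taxonomy-aware soft targets,
# where sibling classes under the same parent receive partial credit.
import numpy as np

# Toy taxonomy: class -> parent category (illustrative only).
parent = {"dog_bark": "animal", "cat_meow": "animal",
          "car_horn": "vehicle", "siren": "vehicle"}
classes = list(parent)

def label_similarity(a, b):
    if a == b:
        return 1.0
    return 0.5 if parent[a] == parent[b] else 0.0   # siblings get partial credit

S = np.array([[label_similarity(a, b) for b in classes] for a in classes])

binary = np.array([1, 0, 0, 0], dtype=float)        # clip labelled only "dog_bark"
soft = np.clip(binary @ S, 0, 1)                    # related classes get soft mass
print(dict(zip(classes, soft)))
```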
Learned ISTA with Error-Based Thresholding for Adaptive Sparse Coding
Drawing on theoretical insights, we advocate an error-based thresholding (EBT) mechanism for learned ISTA (LISTA), which utilizes a function of the layer-wise reconstruction error to suggest a specific threshold for each observation in the shrinkage function of each layer. We show that the proposed EBT mechanism well disentangles the learnable parameters in the shrinkage functions from the reconstruction errors, endowing the obtained models with improved adaptivity to possible data variations.
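A minimal sketch of a LISTA-style layer whose soft-threshold scales with the layer-wise reconstruction error is given below, in the spirit of the described EBT mechanism. The exact parameterization (a learnable scale times the residual norm) is an assumption, not the paper's formulation.

```python
# Hedged sketch: one LISTA-style layer with an error-based, per-sample threshold;
# illustrative only, not the authors' implementation.
import torch
import torch.nn as nn

def soft_threshold(x, tau):
    return torch.sign(x) * torch.clamp(torch.abs(x) - tau, min=0.0)

class EBTLayer(nn.Module):
    def __init__(self, m, n):
        super().__init__()
        self.W = nn.Linear(m, n, bias=False)          # maps residual back to code space
        self.alpha = nn.Parameter(torch.tensor(0.1))  # learnable threshold scale

    def forward(self, x, y, A):
        residual = y - x @ A.t()                      # layer-wise reconstruction error
        tau = self.alpha * residual.norm(dim=-1, keepdim=True)  # per-sample threshold
        return soft_threshold(x + self.W(residual), tau)

m, n = 20, 50
A = torch.randn(m, n)                                 # dictionary (y ~ A x)
layer = EBTLayer(m, n)
y = torch.randn(4, m)
x = torch.zeros(4, n)
x = layer(x, y, A)
print(x.shape)
```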