IEEE ICASSP 2024

IEEE ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit the website.

Zero Resource Code-switched Speech Benchmark Using Speech Utterance Pairs For Multiple Spoken Languages

We introduce a new zero resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech encoders, including Wav2vec 2.0, HuBERT, XLSR, etc. We examine the impact of pre-training languages and model size on benchmark performance.

2310.03018.pdf

https://arxiv.org/abs/2310.03018 (257)

Categories:: Audio Processing Systems
Multilingual Recognition and Identification (SPE-MULT)
Language Modeling, for Speech and SLP (SLP-LANG)

29 Views

Hyperspectral Image Reconstruction using Hierarchical Neural Architecture Search from a Snapshot Image

Hyperspectral imaging is a promising imaging modality, and has attracted increasing research attention by compressive sensing such as coded aperture snapshot spectral imaging (CASSI), for simultaneously capturing abundant information in spatial, spectral and temporal domains. Hyperspectral image (HSI) reconstruction in the CASSI aims to retrieve the original 3D signal upon the 2D compressed snapshot.

ID6267.pdf

ID6267.pdf (172)

Categories:: Machine Learning for Signal Processing

31 Views

Federated Dataset Dictionary Learning For Multi-Source Domain Adaptation

Read more about Federated Dataset Dictionary Learning For Multi-Source Domain Adaptation
Log in to post comments

In this article, we propose an approach for federated domain adaptation, a setting where distributional shift exists among clients and some have unlabeled data. The proposed framework, FedDaDiL, tackles the resulting challenge through dictionary learning of empirical distributions. In our setting, clients' distributions represent particular domains, and FedDaDiL collectively trains a federated dictionary of empirical distributions. In particular, we build upon the Dataset Dictionary Learning framework by designing collaborative communication protocols and aggregation operations.

poster_federated.pdf

poster_federated.pdf (198)

Categories:: Other applications of machine learning (MLR-APPL)

8 Views

Deep Versatile Hyperspectral Reconstruction Model from a Snapshot Measurement with Arbitrary Masks

Recently, coded aperture snapshot spectral imaging (CASSI) has been actively researched to capture three-dimensional (3D) hyperspectral (HS) images for dynamic scenes, where the optical systems detect a 2D snapshot measurement while a computational algorithm performs the inverse problem for recovering the latent HS cubic data. Benefiting from the powerful modeling capability of the deep convolution neural networks (DCNN), the reconstruction performance of the HS images has been significantly improved.

ID1717.pdf

ID1717.pdf (218)

Categories:: Machine Learning for Signal Processing

17 Views

Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning

In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD.

poster_distillation.pdf

poster_distillation.pdf (204)

Categories:: Other applications of machine learning (MLR-APPL)

25 Views

Multi-Level Graph Learning For Audio Event Classification And Human-Perceived Annoyance Rating Prediction

WHO's report on environmental noise estimates that 22 M people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-related monitoring, this paper proposes a graph-based model to identify AEs in a soundscape, and explore relations between diverse AEs and human-perceived annoyance rating (AR).

4320_hou_poster.pdf

4320_hou_poster.pdf (296)

Categories:: Content-Based Audio Processing
Multimedia Signal Processing

23 Views

A SALIENCY ENHANCED FEATURE FUSION BASED MULTISCALE RGB-D SALIENT OBJECT DETECTION NETWORK

Multiscale convolutional neural network (CNN) has demonstrated remarkable capabilities in solving various vision problems. However, fusing features of different scales always results in large model sizes, impeding the application of mul-
tiscale CNNs in RGB-D saliency detection. In this paper, we propose a customized feature fusion module, called Saliency Enhanced Feature Fusion (SEFF), for RGB-D saliency detection. SEFF utilizes saliency maps of the neighboring scales

final_poster.pdf

final_poster.pdf (177)

Categories:: Signal Processing Theory and Methods

26 Views

Image Mixing and Gradient Smoothing to Enhance the SAR Image Attack

Read more about Image Mixing and Gradient Smoothing to Enhance the SAR Image Attack
Log in to post comments

Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples, which are crafted by adding imperceptible perturbations to clean examples. With the wide applications of DNNs to Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR), the vulnerability of SAR deep recognition models has attracted increasing attention.

ICASSP2024_Poster__IMGS.pdf

ICASSP2024_Poster__IMGS.pdf (186)

Categories:: Other

11 Views

RIS LOCALIZATION AND SPATIALLY WIDEBAND FILTERING EFFECTS

Read more about RIS LOCALIZATION AND SPATIALLY WIDEBAND FILTERING EFFECTS
Log in to post comments

Reconfigurable intelligent surfaces (RISs) have been considered recently for target localization. While existing literature typically uses fixed RISs in the environment, mounting RISs on targets is a novel approach that can improve target visibility and positioning. This study derives the Cramér-Rao bound (CRB) for pose estimation (i.e., RIS position and orientation) under a generic wideband and near-field model. The theoretical findings show that a pose-dependent filtering phenomenon occurs, impacting the CRB, which is neglected under narrowband approximation.

RIS_LOCALIZATION_ICASSP.pptx

RIS_LOCALIZATION_ICASSP.pptx (285)

RIS_Localization_and_Spatially_Wideband_Filtering_Effects.pdf

RIS_Localization_and_Spatially_Wideband_Filtering_Effects.pdf (285)

Categories:: Signal Processing for Communications and Networking

10 Views

DIFFUSION-BASED SPEECH ENHANCEMENT WITH JOINT GENERATIVE AND PREDICTIVE DECODERS

Read more about DIFFUSION-BASED SPEECH ENHANCEMENT WITH JOINT GENERATIVE AND PREDICTIVE DECODERS
Log in to post comments

Diffusion-based generative speech enhancement (SE) has recently received attention, but reverse diffusion remains time-consuming. One solution is to initialize the reverse diffusion process with enhanced features estimated by a predictive SE system. However, the pipeline structure currently does not consider for a combined use of generative and predictive decoders. The predictive decoder allows us to use the further complementarity between predictive and diffusion-based generative SE.

ICASSP24-Diffusion.pptx

ICASSP24-Diffusion.pptx (151)

Categories:: Source Separation and Signal Enhancement

15 Views

IEEE ICASSP 2024

Pages