IEEE ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world.

We introduce a new zero-resource code-switched speech benchmark designed to directly assess the code-switching capabilities of self-supervised speech encoders. We showcase a baseline system of language modeling on discrete units to demonstrate how the code-switching abilities of speech encoders can be assessed in a zero-resource manner. Our experiments encompass a variety of well-known speech encoders, including Wav2vec 2.0, HuBERT, and XLSR. We examine the impact of pre-training languages and model size on benchmark performance.

Hyperspectral imaging is a promising imaging modality that has attracted increasing research attention, particularly through compressive sensing approaches such as coded aperture snapshot spectral imaging (CASSI), which simultaneously capture abundant information in the spatial, spectral and temporal domains. Hyperspectral image (HSI) reconstruction in CASSI aims to retrieve the original 3D signal from the 2D compressed snapshot.

In this article, we propose an approach for federated domain adaptation, a setting where distributional shift exists among clients and some have unlabeled data. The proposed framework, FedDaDiL, tackles the resulting challenge through dictionary learning of empirical distributions. In our setting, clients' distributions represent particular domains, and FedDaDiL collectively trains a federated dictionary of empirical distributions. In particular, we build upon the Dataset Dictionary Learning framework by designing collaborative communication protocols and aggregation operations.

Recently, coded aperture snapshot spectral imaging (CASSI) has been actively researched as a way to capture three-dimensional (3D) hyperspectral (HS) images of dynamic scenes: the optical system detects a 2D snapshot measurement, while a computational algorithm solves the inverse problem of recovering the latent HS data cube. Benefiting from the powerful modeling capability of deep convolutional neural networks (DCNNs), the reconstruction performance for HS images has improved significantly.

In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD.

WHO's report on environmental noise estimates that 22 million people suffer from chronic annoyance related to noise caused by audio events (AEs) from various sources. Annoyance may lead to health issues and adverse effects on metabolic and cognitive systems. In cities, monitoring noise levels does not provide insights into noticeable AEs, let alone their relations to annoyance. To create annoyance-related monitoring, this paper proposes a graph-based model to identify AEs in a soundscape and to explore relations between diverse AEs and human-perceived annoyance rating (AR).

Multiscale convolutional neural networks (CNNs) have demonstrated remarkable capabilities in solving various vision problems. However, fusing features of different scales always results in large model sizes, impeding the application of multiscale CNNs in RGB-D saliency detection. In this paper, we propose a customized feature fusion module, called Saliency Enhanced Feature Fusion (SEFF), for RGB-D saliency detection. SEFF utilizes saliency maps of the neighboring scales

Deep Neural Networks (DNNs) are known to be vulnerable to adversarial examples, which are crafted by adding imperceptible perturbations to clean examples. With the wide application of DNNs to Synthetic Aperture Radar (SAR) Automatic Target Recognition (ATR), the vulnerability of deep SAR recognition models has attracted increasing attention.

Reconfigurable intelligent surfaces (RISs) have been considered recently for target localization. While existing literature typically uses fixed RISs in the environment, mounting RISs on targets is a novel approach that can improve target visibility and positioning. This study derives the Cramér-Rao bound (CRB) for pose estimation (i.e., RIS position and orientation) under a generic wideband and near-field model. The theoretical findings show that a pose-dependent filtering phenomenon occurs, impacting the CRB, which is neglected under narrowband approximation.

Diffusion-based generative speech enhancement (SE) has recently received attention, but reverse diffusion remains time-consuming. One solution is to initialize the reverse diffusion process with enhanced features estimated by a predictive SE system. However, the pipeline structure currently does not consider the combined use of generative and predictive decoders. The predictive decoder allows us to further exploit the complementarity between predictive and diffusion-based generative SE.
