Sorry, you need to enable JavaScript to visit this website.

IEEE ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit the website.

In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques with the aim of preserving and accurately reconstructing crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio, leveraging single-channel neural sub-band codec and SpatialCodec.

Categories:
64 Views

This paper proposes an efficient optimizer called AdaPlus which integrates Nesterov momentum and precise stepsize adjustment on AdamW basis. AdaPlus combines the advantages of AdamW, Nadam, and AdaBelief and, in particular, does not introduce any extra hyper-parameters. We perform extensive experimental evaluations on three machine learning tasks to validate the effectiveness of AdaPlus.

Categories:
35 Views

Advances in passive acoustic monitoring and machine learning have led to the procurement of vast datasets for computational bioacoustic research. Nevertheless, data scarcity is still an issue for rare and underrepresented species. This

Categories:
26 Views

We propose a sampling algorithm to perform system identification from a set of input-output graph signal pairs. The dynamics of the systems we study are given by a partially known adjacency matrix and a generic parametric graph filter of unknown parameters. The methodology we employ is built upon the principles of annealed Langevin diffusion. This enables us to draw samples from the posterior distribution instead of following the classical approach of point estimation using maximum likelihood.

Categories:
25 Views

The adversarial attack literature contains numerous algorithms for crafting perturbations which manipulate neural network predictions. Many of these adversarial attacks optimize inputs with the same constraints and have similar downstream impact on the models they attack. In this work, we first show how to reconstruct an adversarial perturbation, namely the difference between an adversarial example and the original natural image, from an adversarial example. Then, we classify reconstructed adversarial perturbations based on the algorithm that generated them.

Categories:
36 Views

In this paper, we propose a robust algorithm for designing unstructured Grassmannian constellations for noncoherent MIMO communications that accounts for the effect of hardware impairments (HWIs) such as I/Q imbalance (IQI) and carrier frequency offset (CFO). The algorithm uses the minimum diversity product as a cost function to ensure full-diversity constellations. The constellation points in the Grassmannian are optimized to be robust against any value of the HWIs belonging to a given uncertainty set, the values of which are determined by the characteristics of the hardware used.

Categories:
22 Views

Distributional Reinforcement Learning (RL) estimates return distribution mainly by learning quantile values via minimizing the quantile Huber loss function, entailing a threshold parameter often selected heuristically or via hyperparameter search, which may not generalize well and can be suboptimal. This paper introduces a generalized quantile Huber loss function derived from Wasserstein distance (WD) calculation between Gaussian distributions, capturing noise in predicted (current) and target (Bellmanupdated) quantile values.

Categories:
31 Views

The last standard Versatile Video Codec (VVC) aims to improve the compression efficiency by saving around 50% of bitrate at the same quality compared to its predecessor High Efficiency Video Codec (HEVC). However, this comes with higher encoding complexity mainly due to a much larger number of block splits to be tested on the encoder side.

Categories:
36 Views

Text prompts are crucial for generalizing pre-trained open-set object detection models to new categories. However, current methods for text prompts are limited as they require manual feedback when generalizing to new categories, which restricts their ability to model complex scenes, often leading to incorrect detection results. To address this limitation, we propose a novel visual prompt method that learns new category knowledge from a few labeled images, which generalizes the pre-trained detection model to the new category.

Categories:
24 Views

Systolic blood pressure (BP) variability at daytime and nighttime, also known as BP dip, has shown its clinical value in diagnosis and treatment of cardiovascular diseases (CVDs). A model agnostic meta learning (MAML)-based 24-hour personalized approach is proposed in this paper for BP estimation using wrist photoplethysmography (PPG) signals from smart watch in free-living context.

Categories:
42 Views

Pages