
IEEE ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

A method for synthesizing the desired sound field while suppressing the exterior radiation power with directional weighting is proposed. The exterior radiation from the loudspeakers in sound field synthesis systems can be problematic in practical situations. Although several methods to suppress the exterior radiation have been proposed, suppression in all outward directions is generally difficult, especially when the number of loudspeakers is not sufficiently large.


A major concern with deep learning models is the large amount of data required to build and train them, much of which is sensitive and personally identifiable information that is vulnerable to access by third parties. Using the quantum internet, which would enable fast and completely secure online communications, has previously been proposed to address this issue. Previous work has yielded a hybrid quantum-classical transfer learning scheme for classical data and communication with a hub-spoke topology.


Generalized Labeled Multi-Bernoulli (GLMB) densities arise in a host of multi-object system applications, playing a role analogous to that of Gaussians in single-object filtering. However, computing the GLMB filtering density requires solving NP-hard problems. To alleviate this computational bottleneck, we develop a linear-complexity Gibbs sampling framework for GLMB density computation.


Traditional frame-based cameras inevitably suffer from non-uniform blur in real-world scenarios. Event cameras, which record intensity changes with high temporal resolution, provide an effective solution for image deblurring. In this paper, we formulate event-based image deblurring as an image generation problem by designing diffusion priors for the image and residual. Specifically, we propose an alternative diffusion sampling framework to jointly estimate clear and residual images to ensure the quality of the final result.


Underwater monitoring and exploration have improved significantly due to the wide adoption of the Internet of Underwater Things (IoUT). However, IoUT implementation is limited by batteries that require frequent replacement, which is costly and infeasible in the hostile aquatic environment. Therefore, it is crucial to implement an energy-efficient solution that maximizes the lifetime of IoUT devices and hence reduces the overall cost of the system.


This paper introduces our system submission for the Cadenza ICASSP 2024 Grand Challenge, which presents the problem of remixing and enhancing music for hearing aid users. Our system placed first in the challenge, achieving the best average Hearing-Aid Audio Quality Index (HAAQI) score on the evaluation data set. We describe the system, which uses an ensemble of deep learning music source separators that are fine-tuned on the challenge data.


In many practical parameter estimation problems, the observation model is periodic with respect to the unknown parameters. In these cases, the appropriate estimation criterion is periodic in the parameter space, and cyclic performance bounds should be used. However, existing cyclic performance bounds do not account for the common scenario of model misspecification. The misspecified Cramér-Rao bound (MCRB) provides a lower bound on the mean-squared-error (MSE) for estimation problems under model misspecification. However, the MCRB does not provide a valid bound for periodic problems.


We present a novel Speech Augmented Language Model (SALM) with multitask and in-context learning capabilities. SALM comprises a frozen text LLM, an audio encoder, a modality adapter module, and LoRA layers to accommodate speech input and associated task instructions. The unified SALM not only achieves performance on par with task-specific Conformer baselines for Automatic Speech Recognition (ASR) and Speech Translation (AST), but also exhibits zero-shot in-context learning capabilities, demonstrated through a keyword-boosting task for ASR and AST.



Time-modulated arrays (TMA) transmitting orthogonal frequency division multiplexing (OFDM) waveforms achieve physical layer security by allowing the signal to reach the legitimate destination undistorted, while making the signal appear scrambled in all other directions. In this paper, we examine how secure the TMA OFDM system is, and show that it is possible for the eavesdropper to defy the scrambling.

