Sorry, you need to enable JavaScript to visit this website.

IEEE ICASSP 2024 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit the website.

With neuroimaging data scientists have gained substantial information on the neuronal underpinning of intelligence. Yet how to integrate multimodal neuronal features effectively in relation to intelligence remains elusive. In this paper, we have developed a reference Canonical Correlation Analysis (RCCA) model that extracts latent, correlated multimodal features while enhancing correlation to a reference of interest.

Categories:
17 Views

In this study, we develop a holistic framework for space-time adaptive processing (STAP) in connected and automated vehicle (CAV) radar systems. We investigate a CAV system consisting of multiple vehicles that transmit frequency-modulated continuous-waveforms (FMCW), thereby functioning as a multistatic radar. Direct application of STAP in a network of radar systems such as in a CAV may lead to excess interference. We exploit time division multiplexing (TDM) to perform transmitter scheduling over FMCW pulses to achieve high detection performance.

Categories:
15 Views

Gearbox condition monitoring and quality surveillance are crucial techniques to ensure safe and cost-efficient machine operations. In condition monitoring, the interpretation of the different vibration spectrum elements is still an open question, many works show that some predefined vibration models are improper to explain the spectrum contents. In this paper, we investigate a method to identify the mixture model that describes a single-stage planetary gearbox vibration to properly interpret the vibration spectrum.

Categories:
9 Views

The ability to detect manipulated visual content is becoming increasingly important in many application fields, given the rapid advances in image synthesis methods. Of particular concern is the possibility of modifying the content of medical images, altering the resulting diagnoses. Despite its relevance, this issue has received limited attention from the research community. One reason is the lack of large and curated datasets to use for development and benchmarking purposes.

Categories:
6 Views

Online adaptation to distribution shifts in satellite image segmentation stands as a crucial yet underexplored problem. In this paper, we address source-free and online domain adaptation, i.e., test-time adaptation (TTA), for satellite images, with the focus on mitigating distribution shifts caused by various forms of image degradation. Towards achieving this goal, we propose a novel TTA approach involving two effective strategies. First, we progressively estimate the global Batch Normalization (BN) statistics of the target distribution with incoming data stream.

Categories:
12 Views

Sound event localization and detection (SELD) is an important task in machine listening.
Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels.
SELD data is simulated by convolving spatialy-localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape.
However, RIRs require manual collection in specific rooms.
We present SpatialScaper, a library for SELD data simulation and augmentation.

Categories:
10 Views

Emotions lie on a continuum, but current models treat emotions
as a finite valued discrete variable. This representation does not
capture the diversity in the expression of emotion. To better rep-
resent emotions we propose the use of natural language descrip-
tions (or prompts). In this work, we address the challenge of au-
tomatically generating these prompts and training a model to better
learn emotion representations from audio and prompt pairs. We use
acoustic properties that are correlated to emotion like pitch, intensity,

Categories:
22 Views

Virtual Bass Enhancement (VBE) refers to a class of digital signal processing algorithms that aim at enhancing the perception of low frequencies in audio applications. Such algorithms typically exploit well-known psychoacoustic effects and are particularly valuable for improving the performance of small-size transducers often found in consumer electronics. Though both time- and frequency-domain techniques have been proposed in the literature, none of them capitalizes on the latest achievements of deep learning as far as music processing is concerned.

Categories:
3 Views

The use of Mutual Information (MI) as a measure to evaluate the efficiency of cryptosystems has an extensive history. However, estimating MI between unknown random variables in a high-dimensional space is challenging. Recent advances in machine learning have enabled progress in estimating MI using neural networks. This work presents a novel application of MI estimation in the field of cryptography. We propose applying this methodology directly to estimate the MI between plaintext and ciphertext in a chosen plaintext attack.

Categories:
15 Views

In this paper, we design a new open-set method to detect deepfakes that does not assume information about the techniques behind the deepfakes generation. Contrary to existing methods, which build upon known telltales left by the deepfake creation process, we assume no prior knowledge about the sample generation, thus presenting a method for blind deepfake detection, a necessary step toward true generalization.

Categories:
9 Views

Pages