IEEE ICASSP 2023 - the IEEE International Conference on Acoustics, Speech and Signal Processing - is the world's largest and most comprehensive technical conference focused on signal processing and its applications. ICASSP 2023 features world-class presentations by internationally renowned speakers and cutting-edge session topics, and provides an excellent opportunity to network with like-minded professionals from around the world.

Given a sound field generated by a sparse distribution of impulse image sources, can the continuous 3D positions and amplitudes of these sources be recovered from discrete, band-limited measurements of the field at a finite set of locations, e.g., a multichannel room impulse response? Borrowing from recent advances in super-resolution imaging, it is shown that this non-linear, non-convex inverse problem can be efficiently relaxed into a convex linear inverse problem over the space of Radon measures in R^3.
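
To illustrate the kind of relaxation involved, a hypothetical sketch of a Beurling-LASSO-style program over Radon measures is given below; the operator, norms, and noise constraint are assumptions for illustration, not the paper's exact formulation.

```latex
% Hypothetical sketch (not taken from the paper): a convex program over the
% space of Radon measures \mathcal{M}(\mathbb{R}^3). \mathcal{A} is the
% assumed linear measurement operator, y the band-limited field samples,
% and \varepsilon a noise bound.
\min_{\mu \in \mathcal{M}(\mathbb{R}^3)} \ \|\mu\|_{\mathrm{TV}}
\quad \text{subject to} \quad \|\mathcal{A}\mu - y\|_2 \le \varepsilon,
\qquad \text{with } \mu^\star = \sum_k a_k\, \delta_{x_k}
\text{ at the true source positions } x_k \text{ and amplitudes } a_k.
```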

Listening to spoken content often requires modifying the speech rate while preserving the timbre and pitch of the speaker. To date, advanced signal processing techniques have been used to address this task, but it remains challenging to maintain high speech quality at all time-scales. Inspired by the success of speech generation using Generative Adversarial Networks (GANs), we propose a novel unsupervised learning algorithm for time-scale modification (TSM) of speech, called ScalerGAN. The model is trained on a set of speech utterances for which no time-scales are provided.
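
For context, a minimal sketch of the conventional signal-processing baseline that learning-based TSM methods such as ScalerGAN aim to improve on: phase-vocoder time stretching as exposed by librosa. This is not the paper's model, and the input file and rates below are placeholders.

```python
# Classical time-scale modification via a phase vocoder (librosa), shown only
# as the conventional baseline; ScalerGAN itself is a GAN trained without
# time-scale supervision and is not reproduced here.
import librosa
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=None)   # placeholder input file

for rate in (0.5, 0.75, 1.25, 2.0):           # <1 slows down, >1 speeds up
    y_tsm = librosa.effects.time_stretch(y, rate=rate)
    sf.write(f"speech_x{rate}.wav", y_tsm, sr)
```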

This paper presents the image source method for simulating the observed signals in the time domain on the boundary of a spherical listening region. A wideband approach is used in which all derivations are carried out in the time domain. The source emits a sequence of spherical wave fronts whose amplitudes can be related to the far-field directional impulse responses of a loudspeaker. Geometric methods are used extensively to model the observed signals, and the spherical harmonic coefficients of the observed signals are also derived.
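
A minimal, first-order sketch of the classical image source idea for a shoebox room is given below; the room size, positions, speed of sound, and 1/r amplitude model are placeholder assumptions, and the paper's wideband, spherical-harmonic treatment is not reproduced.

```python
# First-order image source method for a shoebox room: reflect the source
# across each wall, then compute per-image delays and spherical-spreading
# amplitudes at a receiver.  All numeric values are placeholders.
import numpy as np

c = 343.0                                  # speed of sound (m/s)
room = np.array([5.0, 4.0, 3.0])           # shoebox dimensions (m)
src = np.array([1.0, 2.0, 1.5])            # source position (m)
rcv = np.array([3.5, 1.0, 1.2])            # receiver position (m)

images = [src]                             # direct path
for axis in range(3):                      # reflect across the 6 walls
    for wall in (0.0, room[axis]):
        img = src.copy()
        img[axis] = 2.0 * wall - src[axis]
        images.append(img)

for img in images:
    r = np.linalg.norm(img - rcv)          # path length
    delay = r / c                          # arrival time (s)
    amp = 1.0 / (4.0 * np.pi * r)          # free-field spherical spreading
    print(f"image at {img}, delay {delay * 1e3:.2f} ms, amplitude {amp:.4f}")
```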

Audio Spectrogram Transformer models dominate the field of Audio Tagging, outperforming the previously dominant Convolutional Neural Networks (CNNs). Their superiority is based on their ability to scale up and to exploit large-scale datasets such as AudioSet. However, Transformers are more demanding than CNNs in terms of model size and computational requirements. We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex Transformers.
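
A minimal sketch of offline knowledge distillation for multi-label audio tagging follows; the loss weighting, temperature, and use of precomputed teacher logits are assumptions for illustration, not the paper's exact recipe.

```python
# Offline KD for audio tagging: the student CNN is trained against both the
# ground-truth labels and soft targets derived from precomputed transformer
# teacher logits.  lam and T are placeholder hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, lam=0.5, T=1.0):
    # hard-label term: standard multi-label BCE against the dataset labels
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    # soft-label term: match the (temperature-scaled) teacher probabilities
    soft_targets = torch.sigmoid(teacher_logits / T)
    soft = F.binary_cross_entropy_with_logits(student_logits / T, soft_targets)
    return lam * hard + (1.0 - lam) * soft

# toy usage with random tensors (batch of 8, 527 AudioSet classes)
student_logits = torch.randn(8, 527, requires_grad=True)
teacher_logits = torch.randn(8, 527)        # loaded from disk in offline KD
labels = torch.randint(0, 2, (8, 527)).float()
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```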

Poor coordination of the speech production subsystems due to neurological injury or a neurodegenerative disease leads to dysarthria, a neuro-motor speech disorder. Dysarthric

This study proposes a Wave-U-Net discriminator, which is a single but expressive discriminator that assesses a waveform in a sample-wise manner with the same resolution as the input signal while extracting multilevel features via an encoder and decoder with skip connections. The experimental results demonstrate that a Wave-U-Net discriminator can be used as an alternative to a typical ensemble of discriminators while maintaining speech quality, reducing the model size, and accelerating the training speed.
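
A minimal sketch of what a sample-wise U-Net-style waveform discriminator can look like is given below; the channel counts, depth, and kernel sizes are placeholders, not the paper's configuration.

```python
# A tiny 1D U-Net discriminator: strided-conv encoder, transposed-conv decoder
# with a skip connection, and a final 1x1 conv producing one real/fake score
# per sample at the input resolution.  Sizes are illustrative only.
import torch
import torch.nn as nn

class WaveUNetDiscriminator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv1d(1, ch, kernel_size=4, stride=2, padding=1)
        self.enc2 = nn.Conv1d(ch, 2 * ch, kernel_size=4, stride=2, padding=1)
        self.dec2 = nn.ConvTranspose1d(2 * ch, ch, kernel_size=4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose1d(2 * ch, ch, kernel_size=4, stride=2, padding=1)
        self.out = nn.Conv1d(ch, 1, kernel_size=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):                 # x: (batch, 1, samples), samples % 4 == 0
        e1 = self.act(self.enc1(x))       # (B, ch,  T/2)
        e2 = self.act(self.enc2(e1))      # (B, 2ch, T/4)
        d2 = self.act(self.dec2(e2))      # (B, ch,  T/2)
        d1 = self.act(self.dec1(torch.cat([d2, e1], dim=1)))  # skip connection
        return self.out(d1)               # (B, 1, T): one score per sample

x = torch.randn(2, 1, 16000)              # 1 s of 16 kHz audio, batch of 2
scores = WaveUNetDiscriminator()(x)
print(scores.shape)                       # torch.Size([2, 1, 16000])
```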

Recognizing signs in virtual reality (VR) is challenging; here, we developed an American Sign Language (ASL) recognition system in a VR environment. We collected a dataset of 2,500 ASL numerical digits (0-10) and 500 instances of the ASL sign for TEA from 10 participants using an Oculus Quest 2. Participants produced ASL signs naturally, resulting in significant variability in location, orientation, duration, and motion trajectory. Additionally, the ten signers in this initial study were diverse in age, sex, ASL proficiency, and hearing status, with most being deaf lifelong ASL users.
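
A minimal sketch of one way to handle the variability in duration and trajectory when classifying hand-tracking sequences follows; the feature layout, resampling length, and classifier are placeholder assumptions and do not describe the authors' system.

```python
# Resample variable-length hand-joint trajectories to a fixed number of frames,
# flatten them, and fit a simple classifier.  Shapes and labels are synthetic
# placeholders standing in for Quest 2 hand-tracking recordings.
import numpy as np
from sklearn.svm import SVC

N_FRAMES, N_FEATURES = 32, 63            # e.g. 21 joints x 3 coordinates

def resample(traj, n_frames=N_FRAMES):
    """Linearly resample a (T, n_features) trajectory to n_frames frames."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_frames)
    return np.stack([np.interp(t_new, t_old, traj[:, j])
                     for j in range(traj.shape[1])], axis=1)

# synthetic stand-in data: 100 signs with random lengths, 11 classes (0-10)
rng = np.random.default_rng(0)
trajs = [rng.normal(size=(rng.integers(20, 80), N_FEATURES)) for _ in range(100)]
labels = rng.integers(0, 11, size=100)

X = np.array([resample(t).ravel() for t in trajs])
clf = SVC().fit(X, labels)
print(clf.score(X, labels))               # training accuracy on the toy data
```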

