IEEE ICASSP 2023 - the IEEE International Conference on Acoustics, Speech and Signal Processing - is the world's largest and most comprehensive technical conference focused on signal processing and its applications. ICASSP 2023 features world-class presentations by internationally renowned speakers and cutting-edge session topics, and provides an excellent opportunity to network with like-minded professionals from around the world.

Given a sound field generated by a sparse distribution of impulse image sources, can the continuous 3D positions and amplitudes of these sources be recovered from discrete, band-limited measurements of the field at a finite set of locations, e.g., a multichannel room impulse response? Borrowing from recent advances in super-resolution imaging, it is shown that this non-linear, non-convex inverse problem can be efficiently relaxed into a convex linear inverse problem over the space of Radon measures in R^3.
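
To illustrate the kind of relaxation involved, a hypothetical sketch of a Beurling-LASSO-style program over Radon measures is given below; the operator, norms, and noise constraint are assumptions for illustration, not the paper's exact formulation.

```latex
% Hypothetical sketch (not taken from the paper): a convex program over the
% space of Radon measures \mathcal{M}(\mathbb{R}^3). \mathcal{A} is the
% assumed linear measurement operator, y the band-limited field samples,
% and \varepsilon a noise bound.
\min_{\mu \in \mathcal{M}(\mathbb{R}^3)} \ \|\mu\|_{\mathrm{TV}}
\quad \text{subject to} \quad \|\mathcal{A}\mu - y\|_2 \le \varepsilon,
\qquad \text{with } \mu^\star = \sum_k a_k\, \delta_{x_k}
\text{ at the true source positions } x_k \text{ and amplitudes } a_k.
```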

Listening to spoken content often requires modifying the speech rate while preserving the timbre and pitch of the speaker. To date, advanced signal processing techniques have been used to address this task, but it remains challenging to maintain high speech quality at all time-scales. Inspired by the success of speech generation using Generative Adversarial Networks (GANs), we propose a novel unsupervised learning algorithm for time-scale modification (TSM) of speech, called ScalerGAN. The model is trained on a set of speech utterances for which no time-scales are provided.
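
For context, a minimal sketch of the conventional signal-processing baseline that learning-based TSM methods such as ScalerGAN aim to improve on: phase-vocoder time stretching as exposed by librosa. This is not the paper's model, and the input file and rates below are placeholders.

```python
# Classical time-scale modification via a phase vocoder (librosa), shown only
# as the conventional baseline; ScalerGAN itself is a GAN trained without
# time-scale supervision and is not reproduced here.
import librosa
import soundfile as sf

y, sr = librosa.load("speech.wav", sr=None)   # placeholder input file

for rate in (0.5, 0.75, 1.25, 2.0):           # <1 slows down, >1 speeds up
    y_tsm = librosa.effects.time_stretch(y, rate=rate)
    sf.write(f"speech_x{rate}.wav", y_tsm, sr)
```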

This paper presents the image source method for simulating the observed signals in the time domain on the boundary of a spherical listening region. A wideband approach is used in which all derivations are carried out in the time domain. The source emits a sequence of spherical wave fronts whose amplitudes can be related to the far-field directional impulse responses of a loudspeaker. Geometric methods are used extensively to model the observed signals, and the spherical harmonic coefficients of the observed signals are also derived.
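
A minimal, first-order sketch of the classical image source idea for a shoebox room is given below; the room size, positions, speed of sound, and 1/r amplitude model are placeholder assumptions, and the paper's wideband, spherical-harmonic treatment is not reproduced.

```python
# First-order image source method for a shoebox room: reflect the source
# across each wall, then compute per-image delays and spherical-spreading
# amplitudes at a receiver.  All numeric values are placeholders.
import numpy as np

c = 343.0                                  # speed of sound (m/s)
room = np.array([5.0, 4.0, 3.0])           # shoebox dimensions (m)
src = np.array([1.0, 2.0, 1.5])            # source position (m)
rcv = np.array([3.5, 1.0, 1.2])            # receiver position (m)

images = [src]                             # direct path
for axis in range(3):                      # reflect across the 6 walls
    for wall in (0.0, room[axis]):
        img = src.copy()
        img[axis] = 2.0 * wall - src[axis]
        images.append(img)

for img in images:
    r = np.linalg.norm(img - rcv)          # path length
    delay = r / c                          # arrival time (s)
    amp = 1.0 / (4.0 * np.pi * r)          # free-field spherical spreading
    print(f"image at {img}, delay {delay * 1e3:.2f} ms, amplitude {amp:.4f}")
```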

Audio Spectrogram Transformer models dominate the field of Audio Tagging, outperforming the previously dominant Convolutional Neural Networks (CNNs). Their superiority is based on their ability to scale up and to exploit large-scale datasets such as AudioSet. However, Transformers are more demanding than CNNs in terms of model size and computational requirements. We propose a training procedure for efficient CNNs based on offline Knowledge Distillation (KD) from high-performing yet complex Transformers.
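
A minimal sketch of offline knowledge distillation for multi-label audio tagging follows; the loss weighting, temperature, and use of precomputed teacher logits are assumptions for illustration, not the paper's exact recipe.

```python
# Offline KD for audio tagging: the student CNN is trained against both the
# ground-truth labels and soft targets derived from precomputed transformer
# teacher logits.  lam and T are placeholder hyperparameters.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, lam=0.5, T=1.0):
    # hard-label term: standard multi-label BCE against the dataset labels
    hard = F.binary_cross_entropy_with_logits(student_logits, labels)
    # soft-label term: match the (temperature-scaled) teacher probabilities
    soft_targets = torch.sigmoid(teacher_logits / T)
    soft = F.binary_cross_entropy_with_logits(student_logits / T, soft_targets)
    return lam * hard + (1.0 - lam) * soft

# toy usage with random tensors (batch of 8, 527 AudioSet classes)
student_logits = torch.randn(8, 527, requires_grad=True)
teacher_logits = torch.randn(8, 527)        # loaded from disk in offline KD
labels = torch.randint(0, 2, (8, 527)).float()
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```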

Poor coordination of the speech production subsystems due to neurological injury or a neurodegenerative disease leads to dysarthria, a neuro-motor speech disorder. Dysarthric

This study proposes a Wave-U-Net discriminator, which is a single but expressive discriminator that assesses a waveform in a sample-wise manner with the same resolution as the input signal while extracting multilevel features via an encoder and decoder with skip connections. The experimental results demonstrate that a Wave-U-Net discriminator can be used as an alternative to a typical ensemble of discriminators while maintaining speech quality, reducing the model size, and accelerating the training speed.
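
A minimal sketch of what a sample-wise U-Net-style waveform discriminator can look like is given below; the channel counts, depth, and kernel sizes are placeholders, not the paper's configuration.

```python
# A tiny 1D U-Net discriminator: strided-conv encoder, transposed-conv decoder
# with a skip connection, and a final 1x1 conv producing one real/fake score
# per sample at the input resolution.  Sizes are illustrative only.
import torch
import torch.nn as nn

class WaveUNetDiscriminator(nn.Module):
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Conv1d(1, ch, kernel_size=4, stride=2, padding=1)
        self.enc2 = nn.Conv1d(ch, 2 * ch, kernel_size=4, stride=2, padding=1)
        self.dec2 = nn.ConvTranspose1d(2 * ch, ch, kernel_size=4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose1d(2 * ch, ch, kernel_size=4, stride=2, padding=1)
        self.out = nn.Conv1d(ch, 1, kernel_size=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x):                 # x: (batch, 1, samples), samples % 4 == 0
        e1 = self.act(self.enc1(x))       # (B, ch,  T/2)
        e2 = self.act(self.enc2(e1))      # (B, 2ch, T/4)
        d2 = self.act(self.dec2(e2))      # (B, ch,  T/2)
        d1 = self.act(self.dec1(torch.cat([d2, e1], dim=1)))  # skip connection
        return self.out(d1)               # (B, 1, T): one score per sample

x = torch.randn(2, 1, 16000)              # 1 s of 16 kHz audio, batch of 2
scores = WaveUNetDiscriminator()(x)
print(scores.shape)                       # torch.Size([2, 1, 16000])
```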

Recognizing signs in virtual reality (VR) is challenging; here, we developed an American Sign Language (ASL) recognition system in a VR environment. We collected a dataset of 2,500 ASL numerical digits (0-10) and 500 instances of the ASL sign for TEA from 10 participants using an Oculus Quest 2. Participants produced ASL signs naturally, resulting in significant variability in location, orientation, duration, and motion trajectory. Additionally, the ten signers in this initial study were diverse in age, sex, ASL proficiency, and hearing status, with most being deaf lifelong ASL users.
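
A minimal sketch of one way to handle the variability in duration and trajectory when classifying hand-tracking sequences follows; the feature layout, resampling length, and classifier are placeholder assumptions and do not describe the authors' system.

```python
# Resample variable-length hand-joint trajectories to a fixed number of frames,
# flatten them, and fit a simple classifier.  Shapes and labels are synthetic
# placeholders standing in for Quest 2 hand-tracking recordings.
import numpy as np
from sklearn.svm import SVC

N_FRAMES, N_FEATURES = 32, 63            # e.g. 21 joints x 3 coordinates

def resample(traj, n_frames=N_FRAMES):
    """Linearly resample a (T, n_features) trajectory to n_frames frames."""
    t_old = np.linspace(0.0, 1.0, len(traj))
    t_new = np.linspace(0.0, 1.0, n_frames)
    return np.stack([np.interp(t_new, t_old, traj[:, j])
                     for j in range(traj.shape[1])], axis=1)

# synthetic stand-in data: 100 signs with random lengths, 11 classes (0-10)
rng = np.random.default_rng(0)
trajs = [rng.normal(size=(rng.integers(20, 80), N_FEATURES)) for _ in range(100)]
labels = rng.integers(0, 11, size=100)

X = np.array([resample(t).ravel() for t in trajs])
clf = SVC().fit(X, labels)
print(clf.score(X, labels))               # training accuracy on the toy data
```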

