ICASSP 2023

IEEE ICASSP 2023, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2023 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

AERO: Audio Super Resolution in the Spectral Domain

ICASSP_presentation.pdf (138)

Categories: Audio Processing Systems

55 Views

PASSIVE ACOUSTIC TRACKING OF WHALES IN 3-D

Passive acoustic monitoring (PAM) is a nonintrusive approach to studying behaviors of vocalizing marine organisms underwater that otherwise would remain unexplored. In this paper, we propose a data processing chain that can detect and track multiple whales in 3-D from passively recorded underwater acoustic signals. In particular, time-difference-of-arrival (TDOA) measurements of echolocation clicks are extracted from a volumetric hydrophone array's acoustic data by using a noise-whitening cross-correlation.
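
The TDOA extraction step can be illustrated with a minimal sketch. It assumes the phase transform (PHAT) as the whitening, which is one common form of whitened cross-correlation and may differ from the paper's own choice. The function names `dft` and `whitened_tdoa` and the toy click are hypothetical, and a plain O(n^2) DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (small inputs only)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def whitened_tdoa(sig_a, sig_b, sample_rate, eps=1e-12):
    """Estimate the delay of sig_b relative to sig_a, in seconds.

    Assumption: PHAT whitening, i.e. the cross-power spectrum is
    normalized to unit magnitude before the inverse transform.
    """
    n = len(sig_a) + len(sig_b)  # zero-pad so circular correlation acts as linear
    a = dft(list(sig_a) + [0.0] * (n - len(sig_a)))
    b = dft(list(sig_b) + [0.0] * (n - len(sig_b)))
    cross = [bk * ak.conjugate() for ak, bk in zip(a, b)]
    cross = [c / (abs(c) + eps) for c in cross]  # whitening step
    corr = [v.real for v in dft(cross, inverse=True)]
    peak = max(range(n), key=lambda i: corr[i])
    lag = peak if peak < n // 2 else peak - n  # map index to a signed lag
    return lag / sample_rate


# Toy example: a short synthetic click and a copy delayed by 3 samples.
click = [0.0] * 5 + [1.0, -0.5, 0.25] + [0.0] * 24
delayed = [0.0] * 3 + click[:-3]
delay = whitened_tdoa(click, delayed, sample_rate=1000.0)
```

With several hydrophone pairs, one such delay per pair would feed the later tracking stage; that stage is not sketched here.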

ICASSP23-Jang-Poster-V2.pdf (131)

Categories: Applications of Sensor Array and Multi-channel Signal Processing

65 Views

Exploring Approaches to Multi-Task Automatic Synthesizer Programming

Automatic synthesizer programming is the task of transforming an audio signal generated by a virtual instrument into the parameters of a sound synthesizer that would generate this signal. In the past, this could only be done for one virtual instrument. In this paper, we expand the current literature by exploring approaches to automatic synthesizer programming for multiple virtual instruments. Two different approaches to multi-task automatic synthesizer programming are presented. We find that the joint-decoder approach performs best.

ICASSP_POSTER.pdf (353)

Categories: Audio Analysis and Synthesis

69 Views

Articulation GAN: Unsupervised Modeling of Articulatory Learning

Generative deep neural networks are widely used for speech synthesis, but most existing models directly generate waveforms or spectral outputs. Humans, however, produce speech by controlling articulators, which results in the production of speech sounds through physical properties of sound propagation. We introduce the Articulatory Generator to the Generative Adversarial Network paradigm, a new unsupervised generative model of speech production/synthesis.

Begus Zhou Wu Anumanchipalli 5406 Articulation GAN ICASSP 2023.pdf (109)

Categories: Speech Production (SPE-SPRD)
Speech Synthesis and Generation, including TTS (SPE-SYNT)
Human Spoken Language Acquisition, Development and Learning (SLP-LADL)
Language Modeling, for Speech and SLP (SLP-LANG)
Bioacoustics and Medical Acoustics

45 Views

Multi-dimensional Signal Recovery Using Low-rank Deconvolution

In this work we present Low-rank Deconvolution, a powerful framework for low-level feature-map learning for efficient signal representation with application to signal recovery. Its formulation in multi-linear algebra inherits properties from convolutional sparse coding and low-rank approximation methods, since in this setting signals are decomposed into a set of filters convolved with a set of low-rank tensors. We show its advantages by learning compressed video representations and solving image in-painting problems.
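
In symbols, the decomposition described above might be written as follows. The notation is assumed here for illustration and is not taken from the paper: $\mathcal{X}$ is the signal, $\mathcal{D}_m$ are the filters, $\mathcal{K}_m$ are the low-rank tensors, $*$ is multi-dimensional convolution, and $M$ and $R$ are the number of filters and the rank bound. Which notion of tensor rank is meant is left open.

```latex
\mathcal{X} \approx \sum_{m=1}^{M} \mathcal{D}_m * \mathcal{K}_m,
\qquad \operatorname{rank}(\mathcal{K}_m) \le R \quad \text{for } m = 1, \dots, M
```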

ReixachICASSP2023-Poster-A0.pdf

Poster (125)

ReixachICASSP2023.pdf

Pre-print (104)

Categories: Image/Video Processing

87 Views

MMCosine: Multi-Modal Cosine Loss Towards Balanced Audio-Visual Fine-Grained Learning

Audio-visual learning helps to comprehensively understand the world by fusing practical information from multiple modalities. However, recent studies show that the imbalanced optimization of uni-modal encoders in a joint-learning model is a bottleneck to enhancing the model's performance. We further find that the up-to-date imbalance-mitigating methods fail on some audio-visual fine-grained tasks, which have a higher demand for distinguishable feature distribution.

2502.zip (66)

Categories: Image/Video Processing

14 Views

Biologically-Inspired Continual Learning of Human Motion Sequences

This work proposes a model for continual learning on tasks involving temporal sequences, specifically, human motions. It improves on a recently proposed brain-inspired replay model (BI-R) by building a biologically-inspired conditional temporal variational autoencoder (BI-CTVAE), which instantiates a latent mixture-of-Gaussians for class representation. We investigate a novel continual-learning-to-generate (CL2Gen) scenario where the model generates motion sequences of different classes. The generative accuracy of the model is tested over a set of tasks.

2023_ICASSP_OTTJ_Poster v2.3 vector.pdf

Poster for Biologically-Inspired Continual Learning of Human Motion Sequences paper (126)

Categories: Sequential learning; sequential decision methods (MLR-SLER)

41 Views

Real-time perceptually motivated neural network for echo control and noise reduction

Echo and background noise are the major obstacles in today’s user sound experience for devices like a speakerphone or video bar. We propose real-time perceptually motivated neural network-based echo control and noise reduction. The demonstrated method relies on a linear acoustic echo canceller (LAEC) combined with a neural network as a post-filter which incorporates perceptual mapping in both feature representation and loss function. The proposed method relies on mic and far-end signals for the LAEC stage, while the LAEC output, mic and echo estimate are inputs to the post-filter.
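
The LAEC stage can be illustrated with a minimal sketch that takes the far-end and mic signals and returns the two quantities the abstract feeds to the post-filter: the LAEC output and the echo estimate. It assumes a time-domain normalized LMS (NLMS) filter, a common textbook choice that the abstract does not confirm. The function name, tap count, step size and toy echo path are all hypothetical, and the neural post-filter is not sketched.

```python
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, step=0.5, eps=1e-8):
    """Return (residual, echo_estimate) from an NLMS adaptive FIR filter.

    Assumption: NLMS stands in for the linear echo canceller; the
    method described above may use a different adaptive algorithm.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent far-end samples, newest first
    residual, echo_estimate = [], []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # echo estimate
        e = d - y  # LAEC output: mic minus estimated echo
        norm = sum(h * h for h in history) + eps
        weights = [w + step * e * h / norm for w, h in zip(weights, history)]
        echo_estimate.append(y)
        residual.append(e)
    return residual, echo_estimate


# Toy example: the mic picks up only the far-end signal through a short echo path.
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(600)]
echo_path = [0.6, -0.3, 0.1]  # hypothetical room response
mic = [
    sum(g * far_end[n - k] for k, g in enumerate(echo_path) if n - k >= 0)
    for n in range(len(far_end))
]
residual, echo_estimate = nlms_echo_canceller(far_end, mic)
```

In this noise-free toy case the residual decays toward zero as the filter adapts. With near-end speech or background noise present, the residual would instead carry those components on to the post-filter.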

Poster Real-Time Perceptually Motivated Neural Network for Deep Echo Suppression ICASSP 2023 landscape.pdf (139)

Categories: Speech Enhancement (SPE-ENHA)

96 Views

Deep Fusion of Multi-Object Densities Using Transformer

The fusion of multiple probability densities has important applications in many fields, including, for example, multi-sensor signal processing, robotics, and smart environments. In this paper, we demonstrate that deep learning-based methods can be used to fuse multi-object densities. Given a scenario with several sensors with possibly different fields of view, tracking is performed locally in each sensor by a tracker, which produces random finite set multi-object densities.

ICASSP2023Poster.pdf (99)

Categories: Sensor and Relay Networks

21 Views

Jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning Research

The paper introduces the jazznet Dataset, a dataset of fundamental jazz piano music patterns for developing machine learning (ML) algorithms in music information retrieval (MIR). The dataset contains 162520 labeled piano patterns, including chords, arpeggios, scales, and chord progressions with their inversions, resulting in more than 26k hours of audio and a total size of 95GB.

jazznetPoster.pdf

Poster (100)

Categories: Applications in Music and Audio Processing (MLR-MUSI)

19 Views
