ICASSP 2023

IEEE ICASSP 2023, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2023 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world. Visit the website.

Deep Fusion of Multi-Object Densities Using Transformer

The fusion of multiple probability densities has important applications in many fields, including, for example, multi-sensor signal processing, robotics, and smart environments. In this paper, we demonstrate that deep learning-based methods can be used to fuse multi-object densities. Given a scenario with several sensors with possibly different fields of view, tracking is performed locally in each sensor by a tracker, which produces random finite set multi-object densities.

ICASSP2023Poster.pdf

ICASSP2023Poster.pdf (224)

Categories:: Sensor and Relay Networks

25 Views

Jazznet: A Dataset of Fundamental Piano Patterns for Music Audio Machine Learning Research

The paper introduces the jazznet Dataset, a dataset of fundamental jazz piano music patterns for developing machine learning (ML) algorithms in music information retrieval (MIR). The dataset contains 162520 labeled piano patterns, including chords, arpeggios, scales, and chord progressions with their inversions, resulting in more than 26k hours of audio and a total size of 95GB.

jazznetPoster.pdf

Poster (271)

Categories:: Applications in Music and Audio Processing (MLR-MUSI)

27 Views

The Secret Source: Incorporating Source Features to Improve Acoustic-To-Articulatory Speech Inversion

In this work, we incorporated acoustically derived source features (aperiodicity, periodicity, and pitch) as additional targets to an acoustic-to-articulatory speech inversion (SI) system. We also propose a Temporal Convolution based SI system, which uses auditory spectrograms as the input speech representation, to learn long-range dependencies and complex interactions between the source and vocal tract, to improve the SI task.

The_Secret_Source__Incorporating_Source_Features_to_Improve_Acoustic-To-Articulatory_Speech_Inversion.pdf

The Secret Source (157)

poster_ICASSP23_finalpptx_new.pdf

Poster (189)

Categories:: Speech Production (SPE-SPRD)

28 Views

In-Band Full-Duplex Solutions in the Paradigm of Integrated Sensing and Communication

The paper discusses different aspects in favor of using in-band full-duplex frontends for integrated sensing and communication (ISAC), considered for deployment in future 5G/6G infrastructure. Possible scenarios for practical utilization of the technology are discussed, with additional focus on the self-interference cancellation issue. A possible system implementation at an abstract level is presented for a cellular communication scenario.

7096.pdf

7096.pdf (219)

Categories:: Communication Systems and Applications

68 Views

Large Dimensional Analysis of LS-SVM Transfer Learning (application on PolSAR)

DozPoster-3.pdf

DozPoster-3.pdf (197)

Categories:: Learning theory and algorithms (MLR-LEAR)

13 Views

The R3VIVAL Dataset: Repository of room responses and 360 videos of a variable acoustics lab

This paper presents a dataset of spatial room impulse responses (SRIRs) and 360° stereoscopic video captures of a variable acoustics laboratory. A total of 34 source positions are measured with 8 different acoustic panel configurations, resulting in a total of 272 SRIRs. The source positions are arranged in 30° increments at concentric circles of radius 1.5, 2, and 3 m measured with a directional studio monitor, as well as 4 extra positions at the room corners measured with an omnidirectional source.

Poster.pdf

Poster (287)

ICASSP2023_R3VIVAL_Manuscript.pdf

Paper (229)

Categories:: Room Acoustics and Acoustic System Modeling
Spatial and Multichannel Audio

32 Views

WIFI-BASED ROBUST CHILD PRESENCE DETECTION FOR SMART CARS

In-car child presence detection (CPD) has gained worldwide attention because of the increasing number of deaths reported each year among children left unattended in cars. Existing solutions usually require dedicated sensors and are being surpassed by WiFi-based CPD because the latter can provide broader coverage and can reuse the in-car WiFi devices. However, the existing WiFi-based CPD solutions are not robust: they may suffer from missed detections due to the very weak breathing of a young child, and from high false alarm rates under unfavorable environmental conditions.

RobustCPD_Poster.pdf

RobustCPD_Poster.pdf (231)

Categories:: Applications of Sensor Array and Multi-channel Signal Processing

60 Views

Cochlear Decomposition: A Novel Bio-inspired Multiscale Analysis Framework

Signal multiscale decomposition (SMD) is an effective analysis for the identification of modal information in time-domain signals. So far, various SMD approaches, such as the Multiresolution Wavelet Transform (MWT), the Empirical Mode Decomposition (EMD), and the Variational Mode Decomposition (VMD), have been proposed. However, issues such as mode mixing for signals with closely spaced modes have been identified. To confront such problems, we propose here a novel spatial auditory decomposition framework for

Icassp_poster_final.pdf

Icassp_poster_final.pdf (787)

Categories:: Auditory Modeling and Hearing Aids

64 Views

PAPER - Real-Time Multichannel Speech Separation And Enhancement Using A Beamspace-Domain-Based Lightweight CNN

The problems of speech separation and enhancement concern the extraction of the speech emitted by a target speaker when placed in a scenario where multiple interfering speakers or noise are present, respectively. A plethora of practical applications such as home assistants and teleconferencing require some sort of speech separation and enhancement pre-processing before applying Automatic Speech Recognition (ASR) systems. In recent years, most techniques have focused on the application of deep learning to either time-frequency or time-domain representations of the input audio signals.

Olivieri et al. - REAL-TIME MULTICHANNEL SPEECH SEPARATION AND ENHAN.pdf

PAPER file (305)

Categories:: Source Separation and Signal Enhancement

40 Views

POSTER - Grad-CAM-Inspired Interpretation of Nearfield Acoustic Holography using Physics-Informed Explainable Neural Network

The interpretation and explanation of the decision-making processes of neural networks are becoming a key factor in the deep learning field. Although several approaches have been presented for classification problems, their application to regression models needs to be further investigated. In this manuscript, we propose a Grad-CAM-inspired approach for the visual explanation of neural network architectures for regression problems.

GRAD CAM approch for KHCNN - poster.pdf

POSTER file (207)

Categories:: Audio Analysis and Synthesis

43 Views
