Spatial and Multichannel Audio

Natural Sound Rendering for Headphones: Integration of signal processing techniques (slides)

With the strong growth of assistive and personal listening devices, natural sound rendering over headphones is becoming a necessity for prolonged listening in multimedia and virtual reality applications. The aim of natural sound rendering is to recreate sound scenes with spatial and timbral quality that is as close to natural as possible, so as to achieve a truly immersive listening experience. However, rendering natural sound over headphones poses many challenges. This tutorial article presents signal processing techniques that address these challenges to assist human listening.

SPM15slides_Natural Sound Rendering for Headphones.pdf (112)

Categories: Spatial and Multichannel Audio; Audio and Acoustic Signal Processing

96 Views

ENHANCING LOW-LATENCY SPEAKER DIARIZATION WITH SPATIAL DICTIONARY LEARNING

This study proposes a low-latency online speaker diarization framework.
Specifically, we design a spatial dictionary learning module shared across different frequency bands, enabling spatial feature learning at each frequency bin.
This contributes to reducing the latency constraints of the online diarization system.
Additionally, a magnitude-weighted fusion is devised to integrate spectral features. Consequently, the system can extract discriminative speaker embeddings by simultaneously considering spectral and spatial features.

ENHANCING LOW-LATENCY SPEAKER DIARIZATION WITH SPATIAL DICTIONARY LEARNING.pdf (237)

Categories: Spatial and Multichannel Audio

61 Views

Localizing Acoustic Energy in Sound Field Synthesis by Weighted Exterior Radiation Suppression

A method for synthesizing the desired sound field while suppressing the exterior radiation power with directional weighting is proposed. The exterior radiation from the loudspeakers in sound field synthesis systems can be problematic in practical situations. Although several methods to suppress the exterior radiation have been proposed, suppression in all outward directions is generally difficult, especially when the number of loudspeakers is not sufficiently large.

icassp2024_koyama.pdf (237)

Categories: Spatial and Multichannel Audio

81 Views

A STUDY OF MULTICHANNEL SPATIOTEMPORAL FEATURES AND KNOWLEDGE DISTILLATION ON ROBUST TARGET SPEAKER EXTRACTION

Target speaker extraction (TSE) based on direction of arrival (DOA) has a wide range of applications, e.g., in remote conferencing, hearing aids, and in-car speech interaction. Due to the inherent phase uncertainty, existing TSE methods usually suffer from speaker confusion within specific frequency bands. Imprecise DOA measurements, caused by, e.g., the calibration of the microphone array and ambient noise, can also degrade TSE performance.

wang-poster.pdf (250)

Categories: Spatial and Multichannel Audio; Source Separation and Signal Enhancement

32 Views

Spatial Scaper: A Library to Simulate and Augment Soundscapes for Sound Event Localization and Detection in Realistic Rooms

Sound event localization and detection (SELD) is an important task in machine listening.
Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels.
SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape.
However, RIRs require manual collection in specific rooms.
We present SpatialScaper, a library for SELD data simulation and augmentation.
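The convolution step described in this abstract can be illustrated with a minimal NumPy sketch. It is not SpatialScaper's API: the function names `spatialize_event` and `place_event`, the array shapes, and the additive mixing are all assumptions made for illustration.

```python
import numpy as np

def spatialize_event(event, rirs):
    """Convolve a mono event with a multichannel RIR.

    event: shape (n_samples,), mono waveform (hypothetical input).
    rirs:  shape (n_channels, rir_len), one RIR per microphone channel.
    Returns an array of shape (n_channels, n_samples + rir_len - 1).
    """
    event = np.asarray(event, dtype=float)
    rirs = np.atleast_2d(np.asarray(rirs, dtype=float))
    # One full linear convolution per microphone channel.
    return np.stack([np.convolve(event, h) for h in rirs])

def place_event(soundscape, wet, onset):
    """Add a spatialized event into a copy of the soundscape at sample `onset`.

    Samples that would fall past the end of the soundscape are dropped.
    """
    scene = np.array(soundscape, dtype=float)  # copy, leave the input intact
    if onset < 0 or onset >= scene.shape[1]:
        raise ValueError("onset must lie inside the soundscape")
    end = min(scene.shape[1], onset + wet.shape[1])
    scene[:, onset:end] += wet[:, : end - onset]
    return scene
```

In such a setup the spatial label of the event follows from the position at which the RIR was measured or simulated, and the temporal label follows from `onset` and the event duration.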

ICASSP_2024_Spatial_Scaper_Poster-FINAL.pdf

icassp2024 spatial_scaper poster (232)

Categories: Spatial and Multichannel Audio; Room Acoustics and Acoustic System Modeling

24 Views

SpatialCodec: Neural Spatial Speech Coding

In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques, with the aim of preserving and accurately reconstructing crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio, leveraging a single-channel neural sub-band codec and SpatialCodec.

SpatialCodec_Poster.pptx (252)

Categories: Multi-channel Signal Processing; Spatial and Multichannel Audio; Speech Coding (SPE-CODI)

74 Views

The R3VIVAL Dataset: Repository of room responses and 360 videos of a variable acoustics lab

This paper presents a dataset of spatial room impulse responses (SRIRs) and 360° stereoscopic video captures of a variable acoustics laboratory. A total of 34 source positions are measured with 8 different acoustic panel configurations, resulting in a total of 272 SRIRs. The source positions are arranged in 30° increments at concentric circles of radius 1.5, 2, and 3 m measured with a directional studio monitor, as well as 4 extra positions at the room corners measured with an omnidirectional source.

Poster.pdf

Poster (329)

ICASSP2023_R3VIVAL_Manuscript.pdf

Paper (255)

Categories: Room Acoustics and Acoustic System Modeling; Spatial and Multichannel Audio

34 Views

Classifying Non-Individual Head-Related Transfer Functions with A Computational Auditory Model: Calibration And Metrics

This study explores the use of a multi-feature Bayesian auditory sound localisation model to classify non-individual head-related transfer functions (HRTFs). Based on predicted sound localisation performance, these are grouped into ‘good’ and ‘bad’, and the ‘best’/‘worst’ is selected from each category. Firstly, we present a greedy algorithm for automated individual calibration of the model based on the individual sound localisation data.

2023056019.pdf

ICASSP 2023 paper pre-print (271)

Categories: Spatial and Multichannel Audio; Auditory Modeling and Hearing Aids

24 Views

ROBUST BINAURAL SOUND LOCALISATION WITH TEMPORAL ATTENTION

Despite there being clear evidence for attentional effects in biological spatial hearing, relatively few machine hearing systems exploit attention in binaural sound localisation. This paper addresses this issue by proposing a novel binaural machine hearing system with temporal attention for robust localisation of sound sources in noisy and reverberant conditions. A convolutional neural network is employed to extract noise-robust localisation features, which are similar to interaural phase difference, directly from phase spectra of the left and right ears for each frame.
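For reference, the classical interaural phase difference (IPD) that the learned features are said to resemble can be computed per frame and frequency bin as sketched below. This is not the paper's network: the helper names, the Hann window, the frame length `n_fft`, and the hop size are assumptions chosen for illustration.

```python
import numpy as np

def stft_frames(x, n_fft=512, hop=256):
    """Short-time spectra of a 1-D signal, shape (n_frames, n_fft // 2 + 1).

    Assumes len(x) >= n_fft; trailing samples that do not fill a frame are ignored.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)

def interaural_phase_difference(left, right, n_fft=512, hop=256):
    """IPD in radians, wrapped to (-pi, pi], for each frame and frequency bin."""
    spec_left = stft_frames(left, n_fft, hop)
    spec_right = stft_frames(right, n_fft, hop)
    # angle(L * conj(R)) equals phase(L) - phase(R), wrapped to the principal value.
    return np.angle(spec_left * np.conj(spec_right))
```

A positive value at a given bin means the right-ear signal lags the left-ear signal at that frequency; because the phase is wrapped, the mapping from IPD to delay becomes ambiguous at higher frequencies.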

ICASSP23_Poster__paper4227.pdf

ICASSP23_Poster (351)

ICASSP23_Robust_Binaural_Sound_Localisation_with_Temporal_Attention.pdf

ICASSP23_Paper (293)

ICASSP23_Slides_paper4227.pdf

ICASSP23_Slides (286)

Categories: Room Acoustics and Acoustic System Modeling; Spatial and Multichannel Audio; Auditory Modeling and Hearing Aids

54 Views

Gridless 3D Recovery of Image Sources from Room Impulse Responses

Given a sound field generated by a sparse distribution of impulse image sources, can the continuous 3D positions and amplitudes of these sources be recovered from discrete, band-limited measurements of the field at a finite set of locations, e.g., a multichannel room impulse response? Borrowing from recent advances in super-resolution imaging, it is shown that this non-linear, non-convex inverse problem can be efficiently relaxed into a convex linear inverse problem over the space of Radon measures in R^3.
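One way to read the relaxation, sketched under assumptions that this excerpt does not confirm: free-field propagation at speed c, K image sources at positions r_k with amplitudes a_k, M microphones at positions x_m, a low-pass kernel \kappa, and a sampling rate f_s. The measurements are non-linear in the positions r_k but linear in the measure \psi, which is what makes a convex program over measures possible. The total-variation-regularized least-squares problem in the last line is a common choice in the super-resolution literature and is shown only as an illustration, not as the paper's stated method.

```latex
% Sparse source distribution as a Radon measure on R^3
\psi = \sum_{k=1}^{K} a_k \, \delta_{r_k},
\qquad a_k \in \mathbb{R}, \; r_k \in \mathbb{R}^3

% Band-limited sample n at microphone m (\kappa: assumed low-pass kernel)
y_{m,n} = \sum_{k=1}^{K} a_k \,
  \frac{\kappa\!\left( n / f_s - \lVert x_m - r_k \rVert / c \right)}
       {4 \pi \lVert x_m - r_k \rVert}
  = (\Gamma \psi)_{m,n}

% Illustrative convex program over the space of Radon measures
\min_{\psi \in \mathcal{M}(\mathbb{R}^3)} \;
  \tfrac{1}{2} \lVert y - \Gamma \psi \rVert_2^2
  + \lambda \lVert \psi \rVert_{\mathrm{TV}}
```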