With the strong growth of assistive and personal listening devices, natural sound rendering over headphones is becoming a necessity for prolonged listening in multimedia and virtual reality applications. The aim of natural sound rendering is to recreate sound scenes over headphones with spatial and timbral quality as close to the original as possible, so as to achieve a truly immersive listening experience. However, rendering natural sound over headphones encounters many challenges. This tutorial article presents signal processing techniques that tackle these challenges to assist human listening.
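
As a minimal, purely illustrative sketch of one such technique, the snippet below convolves a mono source with a pair of head-related impulse responses (HRIRs) to produce a binaural signal for headphone playback; the HRIRs, sample rate, and source signal are placeholders, not material from the article.

```python
import numpy as np
from scipy.signal import fftconvolve

def render_binaural(mono, hrir_left, hrir_right):
    """Return a (2, N) binaural signal from a mono source and left/right HRIRs."""
    left = fftconvolve(mono, hrir_left)
    right = fftconvolve(mono, hrir_right)
    return np.stack([left, right], axis=0)

fs = 48000
mono = np.random.randn(fs)          # 1 s of noise as a stand-in source
hrir_left = np.random.randn(256)    # placeholder HRIRs; real ones come from a measured set
hrir_right = np.random.randn(256)
binaural = render_binaural(mono, hrir_left, hrir_right)  # shape (2, fs + 255)
```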

This study proposes a low-latency online speaker diarization framework.
Specifically, we design a spatial dictionary learning module shared across different frequency bands, enabling spatial feature learning at each frequency bin.
This helps reduce the latency of the online diarization system.
Additionally, a magnitude-weighted fusion scheme is devised to integrate the spectral features. Consequently, the system can extract discriminative speaker embeddings by considering spectral and spatial features simultaneously.
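
The exact fusion rule is not spelled out in this summary; the sketch below shows one plausible magnitude-weighted fusion of per-frequency spatial features, with all shapes and the weighting scheme assumed for illustration only.

```python
import numpy as np

def magnitude_weighted_fusion(spatial_feat, magnitude):
    """Fuse per-frequency spatial features using spectral magnitudes as weights.

    spatial_feat: (F, T, D) spatial features per frequency bin and frame
    magnitude:    (F, T)    STFT magnitudes of a reference channel
    Returns (T, D) frame-level features as a magnitude-weighted average over
    frequency. The paper's exact fusion rule may differ; this is illustrative.
    """
    w = magnitude / (magnitude.sum(axis=0, keepdims=True) + 1e-8)  # (F, T)
    return np.einsum("ft,ftd->td", w, spatial_feat)

# Placeholder shapes: F = 257 bins, T = 100 frames, D = 16-dim spatial feature.
F, T, D = 257, 100, 16
fused = magnitude_weighted_fusion(np.random.randn(F, T, D),
                                  np.abs(np.random.randn(F, T)))  # (T, D)
```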

A method for synthesizing the desired sound field while suppressing the exterior radiation power with directional weighting is proposed. The exterior radiation from the loudspeakers in sound field synthesis systems can be problematic in practical situations. Although several methods to suppress the exterior radiation have been proposed, suppression in all outward directions is generally difficult, especially when the number of loudspeakers is not sufficiently large.
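
As a rough sketch of how directional weighting can enter the optimization, the snippet below solves a regularized least-squares synthesis problem in which exterior directions are penalized by per-direction weights; the cost function and variable names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def synth_with_directional_weighting(G_int, p_des, G_ext, weights, reg=1e-3):
    """Loudspeaker driving signals from a directionally weighted least-squares cost.

    G_int:   (M, L) transfer functions from L loudspeakers to M interior points
    p_des:   (M,)   desired interior pressures
    G_ext:   (K, L) transfer functions to K exterior evaluation directions
    weights: (K,)   non-negative weights; larger values suppress that direction more

    Minimizes ||G_int d - p_des||^2 + sum_k weights[k] |G_ext[k] d|^2 + reg ||d||^2.
    """
    L = G_int.shape[1]
    A = (G_int.conj().T @ G_int
         + G_ext.conj().T @ np.diag(weights) @ G_ext
         + reg * np.eye(L))
    return np.linalg.solve(A, G_int.conj().T @ p_des)
```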

Target speaker extraction (TSE) based on direction of arrival (DOA) has a wide range of applications, e.g., remote conferencing, hearing aids, and in-car speech interaction. Due to the inherent phase uncertainty, existing TSE methods usually suffer from speaker confusion within specific frequency bands. Imprecise DOA measurements, caused by, e.g., microphone-array calibration errors and ambient noise, can also degrade TSE performance.
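
To make the DOA-informed setting concrete, the sketch below computes a generic directional feature that compares observed inter-channel phase differences against the phase expected from a target DOA; it illustrates why phase ambiguity and DOA errors matter, and it is not the front-end proposed in the paper.

```python
import numpy as np

def doa_angle_feature(stft, mic_pos, doa_vec, freqs, ref=0, c=343.0):
    """Compare observed inter-channel phase differences with a target-DOA phase.

    stft:    (C, F, T) multichannel STFT
    mic_pos: (C, 3)    microphone coordinates in metres
    doa_vec: (3,)      unit vector pointing towards the target speaker
    freqs:   (F,)      frequency-bin centres in Hz

    Returns (F, T) values in [-1, 1]; values near 1 mark bins consistent with
    the target direction. Spatial aliasing at high frequencies and DOA errors
    shrink this contrast, which is the difficulty described above.
    """
    C = stft.shape[0]
    feat = np.zeros(stft.shape[1:])
    for m in range(C):
        if m == ref:
            continue
        ipd = np.angle(stft[m] * np.conj(stft[ref]))        # observed phase difference
        tdoa = (mic_pos[m] - mic_pos[ref]) @ doa_vec / c    # expected delay for the DOA
        target_phase = -2.0 * np.pi * freqs * tdoa          # expected phase difference
        feat += np.cos(ipd - target_phase[:, None])
    return feat / (C - 1)
```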

Sound event localization and detection (SELD) is an important task in machine listening.
Major advancements rely on simulated data with sound events in specific rooms and strong spatio-temporal labels.
SELD data is simulated by convolving spatially localized room impulse responses (RIRs) with sound waveforms to place sound events in a soundscape.
However, RIRs require manual collection in specific rooms.
We present SpatialScaper, a library for SELD data simulation and augmentation.
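
The convolve-and-place step described above can be illustrated as follows; this is only a minimal stand-in, not the SpatialScaper API.

```python
import numpy as np
from scipy.signal import fftconvolve

def place_event(scape, event, multichannel_rir, onset_sample):
    """Spatialize a mono event with a multichannel RIR and mix it into a soundscape.

    scape:            (C, N) output soundscape, modified in place
    event:            (S,)   mono event waveform
    multichannel_rir: (C, R) one RIR per output channel for a given source position
    onset_sample:     int    event onset within the soundscape (assumed < N)
    """
    spatialized = np.stack([fftconvolve(event, rir) for rir in multichannel_rir])
    end = min(scape.shape[1], onset_sample + spatialized.shape[1])
    scape[:, onset_sample:end] += spatialized[:, : end - onset_sample]
    return scape
```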

In this work, we address the challenge of encoding speech captured by a microphone array using deep learning techniques, with the aim of preserving and accurately reconstructing the crucial spatial cues embedded in multi-channel recordings. We propose a neural spatial audio coding framework that achieves a high compression ratio by leveraging a single-channel neural sub-band codec and SpatialCodec.
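
As a loose illustration of the split between a single-channel stream and spatial side information, the sketch below separates a multichannel STFT into a reference channel plus ILD/IPD cues; the proposed framework uses learned neural codecs for both streams, so this decomposition is only an assumed stand-in, not the actual method.

```python
import numpy as np

def split_spatial_side_info(stft):
    """Split a multichannel STFT into a reference channel plus spatial cues.

    stft: (C, F, T) complex STFT of the array recording.
    Returns the reference-channel STFT (the stream a single-channel codec would
    compress) and per-channel ILD/IPD cues relative to that reference (the kind
    of spatial information the spatial stream must preserve).
    """
    ref = stft[0]
    eps = 1e-8
    ild = 20 * np.log10(np.abs(stft[1:]) + eps) - 20 * np.log10(np.abs(ref) + eps)
    ipd = np.angle(stft[1:] * np.conj(ref)[None])
    return ref, ild, ipd
```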

This paper presents a dataset of spatial room impulse responses (SRIRs) and 360° stereoscopic video captures of a variable acoustics laboratory. A total of 34 source positions are measured with 8 different acoustic panel configurations, resulting in a total of 272 SRIRs. The source positions are arranged in 30° increments on concentric circles of radius 1.5, 2, and 3 m, measured with a directional studio monitor, as well as 4 extra positions at the room corners measured with an omnidirectional source.

This study explores the use of a multi-feature Bayesian auditory sound localisation model to classify non-individual head-related transfer functions (HRTFs). Based on predicted sound localisation performance, the HRTFs are grouped into ‘good’ and ‘bad’ sets, and the ‘best’ and ‘worst’ HRTFs are selected from the respective categories. First, we present a greedy algorithm for automated individual calibration of the model based on individual sound localisation data.
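
The greedy calibration idea can be sketched as a forward-selection loop over candidate model parameters, scored against an individual's measured localisation data; the parameter set, scoring function, and stopping rule below are assumptions for illustration, not the paper's exact procedure.

```python
def greedy_calibration(candidates, loss_fn):
    """Greedy forward selection over model-parameter candidates.

    candidates: dict mapping parameter name -> list of candidate values
    loss_fn:    callable taking a parameter dict and returning a scalar mismatch
                between model predictions and an individual's localisation data

    Repeatedly keeps the single parameter change that most reduces the loss,
    stopping when no change helps.
    """
    params = {name: values[0] for name, values in candidates.items()}
    best = loss_fn(params)
    improved = True
    while improved:
        improved = False
        for name, values in candidates.items():
            for value in values:
                trial = dict(params, **{name: value})
                loss = loss_fn(trial)
                if loss < best - 1e-12:
                    best, params, improved = loss, trial, True
    return params, best
```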

Despite clear evidence for attentional effects in biological spatial hearing, relatively few machine hearing systems exploit attention in binaural sound localisation. This paper addresses this issue by proposing a novel binaural machine hearing system with temporal attention for robust localisation of sound sources in noisy and reverberant conditions. A convolutional neural network is employed to extract noise-robust localisation features, which are similar to the interaural phase difference, directly from the phase spectra of the left and right ears for each frame.
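
The interaural phase difference cue mentioned above can be computed directly from the left/right phase spectra, as in the sketch below; window and FFT sizes are arbitrary placeholders, and the proposed system learns a noise-robust variant of this cue rather than using it directly.

```python
import numpy as np

def interaural_phase_difference(left, right, n_fft=512, hop=256):
    """Frame-wise interaural phase difference from left/right ear signals.

    Returns an (F, T) matrix of wrapped phase differences computed from the
    short-time phase spectra of the two ears.
    """
    win = np.hanning(n_fft)
    frames = range(0, min(len(left), len(right)) - n_fft + 1, hop)
    L = np.stack([np.fft.rfft(left[i:i + n_fft] * win) for i in frames], axis=1)
    R = np.stack([np.fft.rfft(right[i:i + n_fft] * win) for i in frames], axis=1)
    return np.angle(L * np.conj(R))  # wrapped to (-pi, pi]
```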

Given a sound field generated by a sparse distribution of impulse image sources, can the continuous 3D positions and amplitudes of these sources be recovered from discrete, band-limited measurements of the field at a finite set of locations, e.g., a multichannel room impulse response? Borrowing from recent advances in super-resolution imaging, it is shown that this non-linear, non-convex inverse problem can be efficiently relaxed into a convex linear inverse problem over the space of Radon measures in R^3.
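
In the usual sparse-spike super-resolution notation, the relaxation has the following general shape (a sketch of the standard total-variation-norm / BLASSO program, not necessarily the paper's exact formulation):

```latex
% Sources modelled as a sparse measure with positions r_k in R^3 and amplitudes a_k
\mu = \sum_{k} a_k \, \delta_{r_k}

% Band-limited, discrete measurements through a linear operator A (e.g. sampled RIRs)
y = A\mu + \varepsilon

% Convex relaxation over the space of Radon measures (total-variation-norm form)
\hat{\mu} = \arg\min_{\mu \in \mathcal{M}(\mathbb{R}^3)}
    \tfrac{1}{2} \| y - A\mu \|_2^2 + \lambda \, \|\mu\|_{\mathrm{TV}}
```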
