- Transducers
- Spatial and Multichannel Audio
- Source Separation and Signal Enhancement
- Room Acoustics and Acoustic System Modeling
- Network Audio
- Audio for Multimedia
- Audio Processing Systems
- Audio Coding
- Audio Analysis and Synthesis
- Active Noise Control
- Auditory Modeling and Hearing Aids
- Bioacoustics and Medical Acoustics
- Music Signal Processing
- Loudspeaker and Microphone Array Signal Processing
- Echo Cancellation
- Content-Based Audio Processing

- Natural Sound Rendering for Headphones: Integration of signal processing techniques (slides)
With the strong growth of assistive and personal listening devices, natural sound rendering over headphones is becoming a necessity for prolonged listening in multimedia and virtual reality applications. The aim of natural sound rendering is to recreate sound scenes with spatial and timbral quality as close to natural listening as possible, so as to achieve a truly immersive experience. However, rendering natural sound over headphones encounters many challenges. This tutorial article presents signal processing techniques to tackle these challenges and assist human listening.
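As a minimal illustration of one building block of headphone rendering, the sketch below renders a mono source binaurally by convolving it with a pair of head-related impulse responses (HRIRs). The function name and the HRIR values are placeholders for illustration, not measured data or code from the tutorial.

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    """Render a mono source to two ears by convolving it with a
    head-related impulse response (HRIR) pair."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy example: unit-impulse source; the far-ear HRIR is delayed and
# attenuated (placeholder values, not a measured HRIR).
source = np.zeros(8)
source[0] = 1.0
hrir_l = np.array([1.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.6])   # 2-sample delay, quieter far ear
binaural = render_binaural(source, hrir_l, hrir_r)
```

In a real renderer the HRIR pair would be selected (or interpolated) per source direction, and the convolution done block-wise in the frequency domain.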

- Nkululeko
We present Nkululeko, a template-based system that lets users perform machine learning experiments in the speaker characteristics domain. It is mainly targeted at users who are not familiar with machine learning or computer programming, and is intended as a teaching tool or a simple entry-level tool for the field of artificial intelligence.

- Masking speech contents by random splicing: Is emotional expression preserved?
We discuss the influence of random splicing on the perception of emotional expression in speech signals.
Random splicing is the randomized reconstruction of short audio snippets with the aim of obfuscating the speech contents.
A part of the German parliament recordings was randomly spliced, and both versions, the original and the scrambled one, were manually labeled with respect to the arousal, valence, and dominance dimensions.
Additionally, we ran a state-of-the-art transformer-based pre-trained emotion model on the data.
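Random splicing as described above can be sketched in a few lines: cut the signal into short snippets and reassemble them in random order. The snippet length and function name below are illustrative choices, not taken from the paper.

```python
import numpy as np

def random_splice(signal, sr, snippet_ms=250, seed=None):
    """Cut a 1-D signal into short snippets and reassemble them in
    random order, obscuring the spoken content while keeping
    short-term acoustic properties largely intact."""
    rng = np.random.default_rng(seed)
    n = max(1, int(sr * snippet_ms / 1000))
    snippets = [signal[i:i + n] for i in range(0, len(signal), n)]
    rng.shuffle(snippets)            # in-place shuffle of the snippet list
    return np.concatenate(snippets)

# 2 s of noise at 16 kHz, spliced into ~250 ms snippets.
sr = 16000
x = np.random.default_rng(0).standard_normal(2 * sr)
y = random_splice(x, sr, seed=1)
```

Note that all samples are preserved, only their order changes, so the long-term spectrum is untouched while word order (and hence content) is destroyed.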

- Hybrid Attention-Based Prototypical Networks for Few-Shot Sound Classification
In recent years, prototypical networks have been widely used in many few-shot learning scenarios. However, as a metric-based learning method, their performance often degrades in the presence of bad or noisy embedded features, and outliers in support instances. In this paper, we introduce a hybrid attention module and combine it with prototypical networks for few-shot sound classification. This hybrid attention module consists of two blocks: a feature-level attention block, and
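A plain prototypical-network classifier, with a hand-set weight vector standing in for feature-level attention, can be sketched as follows. The embeddings and attention weights are made-up toy values; in the paper the attention blocks are learned, not fixed.

```python
import numpy as np

def prototypes(support, labels, n_classes):
    """Class prototypes: the mean embedding of each class's support set."""
    return np.stack([support[labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query, protos, feat_attn):
    """Nearest prototype under a feature-weighted Euclidean distance.
    feat_attn re-weights embedding dimensions (a fixed stand-in for a
    learned feature-level attention block)."""
    d = ((feat_attn * (protos - query)) ** 2).sum(axis=1)
    return int(np.argmin(d))

# 2-way, 2-shot toy episode with 3-d embeddings (made-up numbers);
# the third feature is noisy and uninformative.
support = np.array([[0.0, 0.0, 9.0], [0.0, 1.0, -9.0],   # class 0
                    [5.0, 5.0, 3.0], [5.0, 6.0, -3.0]])  # class 1
labels = np.array([0, 0, 1, 1])
protos = prototypes(support, labels, 2)
attn = np.array([1.0, 1.0, 0.0])  # suppress the noisy third feature
print(classify(np.array([0.2, 0.5, 7.0]), protos, attn))  # -> 0 (class 0)
```

Without the attention weights, the noisy third dimension would dominate the distance; down-weighting it is the intuition behind feature-level attention.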


- Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks. In this paper, we propose a self-supervised audio representation learning method and apply it to a variety of downstream non-speech audio tasks.

- Towards Faster Continuous Multi-Channel HRTF Measurements Based on Learning System Models
Measuring personal head-related transfer functions (HRTFs) is essential in binaural audio. Personal HRTFs are not only required for binaural rendering and for loudspeaker-based binaural reproduction using crosstalk cancellation, but they also serve as a basis for data-driven HRTF individualization techniques and psychoacoustic experiments. Although many attempts have been made to expedite HRTF measurements, the rotational velocities in today’s measurement systems remain lower than those in natural head movements.
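A standard building block of such measurements, independent of the learned system models the paper proposes, is sweep-based impulse response estimation by frequency-domain deconvolution. The sketch below uses a linear chirp and a toy impulse response, and omits the regularization a practical measurement system would need.

```python
import numpy as np

def measure_ir(excitation, recording):
    """Estimate an impulse response by frequency-domain deconvolution:
    ir = IFFT(FFT(recording) / FFT(excitation)).
    The FFT size covers the full linear convolution, so the division
    is exact up to rounding (no regularization here)."""
    n = len(excitation) + len(recording)
    H = np.fft.rfft(recording, n) / np.fft.rfft(excitation, n)
    return np.fft.irfft(H, n)

# Simulate a measurement: excite with a 1 s linear chirp (100 Hz-3 kHz)
# and "record" it through a known toy 5-tap impulse response.
sr = 8000
t = np.arange(sr) / sr
sweep = np.sin(2 * np.pi * (100 * t + (3000 - 100) / 2 * t ** 2))
true_ir = np.array([1.0, 0.0, -0.5, 0.0, 0.25])  # direct path + reflections
recording = np.convolve(sweep, true_ir)
est = measure_ir(sweep, recording)               # est[:5] ~ true_ir
```

For HRTF capture the same deconvolution runs once per ear and per direction; the continuous-measurement problem the paper addresses is that the listener (or source) moves during the sweep.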

- AECMOS: A speech quality assessment metric for echo impairment
Traditionally, the quality of acoustic echo cancellers is evaluated using intrusive speech quality assessment measures such as ERLE (ITU-T G.168) and PESQ (ITU-T P.862), or by carrying out subjective laboratory tests. Unfortunately, the former are not well correlated with human subjective ratings, while the latter are time- and resource-consuming to carry out. We provide a new tool for speech quality assessment of echo impairment which can be used to evaluate the performance of acoustic echo cancellers.
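For reference, ERLE itself is simply the dB power ratio between the echo at the microphone and the canceller's residual; the sketch below computes it on a toy residual and is not the AECMOS model.

```python
import numpy as np

def erle_db(mic, residual, eps=1e-12):
    """Echo return loss enhancement: dB power ratio between the
    microphone signal (echo present) and the canceller residual.
    eps guards against log of zero for a silent residual."""
    return 10 * np.log10((np.mean(mic ** 2) + eps) /
                         (np.mean(residual ** 2) + eps))

# A canceller that removes 99% of the echo amplitude leaves 1e-4 of
# the echo power, i.e. 40 dB of enhancement.
rng = np.random.default_rng(0)
echo = rng.standard_normal(16000)
residual = 0.01 * echo               # 1% of the echo amplitude remains
print(round(erle_db(echo, residual)))  # -> 40
```

Because ERLE only measures echo attenuation, it says nothing about distortion of the near-end speech, which is one reason it correlates poorly with subjective quality.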