
With the strong growth of assistive and personal listening devices, natural sound rendering over headphones is becoming a necessity for prolonged listening in multimedia and virtual reality applications. The aim of natural sound rendering is to recreate sound scenes with spatial and timbral quality as close to natural listening as possible, so as to achieve a truly immersive listening experience. However, rendering natural sound over headphones encounters many challenges. This tutorial article presents signal processing techniques to tackle these challenges and assist human listening.


We would like to present Nkululeko, a template-based system that lets users perform machine learning experiments in the speaker characteristics domain. It is mainly targeted at users who are not familiar with machine learning, or with computer programming at all, to be used as a teaching tool or a simple entry-level tool to the field of artificial intelligence.


We discuss the influence of random splicing on the perception of emotional expression in speech signals.
Random splicing is the randomized reconstruction of short audio snippets with the aim of obfuscating the speech content.
Part of the German parliament recordings was randomly spliced, and both versions -- the original and the scrambled one -- were manually labeled with respect to the arousal, valence, and dominance dimensions.
Additionally, we ran a state-of-the-art transformer-based pre-trained emotion model on the data.
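The random splicing procedure described above can be sketched in a few lines: cut the waveform into short snippets and reassemble them in random order, so timbre and overall prosodic statistics largely survive while the word order (and thus the content) is destroyed. The snippet length and function name below are illustrative choices, not taken from the paper.

```python
import numpy as np

def random_splice(signal, sr, snippet_ms=100, seed=0):
    """Cut a waveform into equal-length snippets and shuffle their order.

    A possible short tail (shorter than one snippet) is dropped.
    snippet_ms is an illustrative default, not a value from the paper.
    """
    rng = np.random.default_rng(seed)
    snip_len = int(sr * snippet_ms / 1000)
    n = len(signal) // snip_len
    snippets = [signal[i * snip_len:(i + 1) * snip_len] for i in range(n)]
    rng.shuffle(snippets)
    return np.concatenate(snippets)

# usage: scramble one second of a synthetic 16 kHz "signal"
sr = 16000
x = np.arange(sr, dtype=float)
y = random_splice(x, sr)
```

Because whole snippets are only reordered, the scrambled signal contains exactly the same samples as the original, just permuted at snippet granularity.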


In recent years, prototypical networks have been widely used in many few-shot learning scenarios. However, as a metric-based learning method, their performance often degrades in the presence of bad or noisy embedded features, and outliers in support instances. In this paper, we introduce a hybrid attention module and combine it with prototypical networks for few-shot sound classification. This hybrid attention module consists of two blocks: a feature-level attention block, and

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks. In this paper, we propose a self-supervised audio representation learning method and apply it to a variety of downstream non-speech audio tasks.


Measuring personal head-related transfer functions (HRTFs) is essential in binaural audio. Personal HRTFs are not only required for binaural rendering and for loudspeaker-based binaural reproduction using crosstalk cancellation, but they also serve as a basis for data-driven HRTF individualization techniques and psychoacoustic experiments. Although many attempts have been made to expedite HRTF measurements, the rotational velocities in today’s measurement systems remain lower than those in natural head movements.
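As context for why HRTFs are central here: binaural rendering of a source at a given direction amounts to convolving the mono signal with the left- and right-ear head-related impulse responses (time-domain HRTFs) for that direction. The toy HRIRs below are invented for illustration and are far shorter than measured ones.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Render a mono source by convolving it with the left/right HRIRs.

    Returns a (2, N) array: row 0 is the left-ear signal, row 1 the right.
    """
    return np.stack([np.convolve(mono, hrir_left),
                     np.convolve(mono, hrir_right)], axis=0)

# toy HRIRs for a source on the listener's left: the right ear
# receives a delayed, attenuated copy (crude ITD/ILD model)
hrir_l = np.array([1.0, 0.0, 0.0])
hrir_r = np.array([0.0, 0.0, 0.5])
out = binaural_render(np.ones(4), hrir_l, hrir_r)
```

The interaural delay and level difference encoded in the two impulse responses are exactly what personal HRTF measurements capture, which is why generic HRTFs degrade localization.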


Traditionally, the quality of acoustic echo cancellers is evaluated using intrusive speech quality assessment measures such as ERLE \cite{g168} and PESQ \cite{p862}, or by carrying out subjective laboratory tests. Unfortunately, the former are not well correlated with human subjective ratings, while the latter are time- and resource-consuming to carry out. We provide a new tool for speech quality assessment for echo impairment which can be used to evaluate the performance of acoustic echo cancellers.
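For reference, ERLE is the standard intrusive measure mentioned above: the ratio, in dB, of the echo power at the microphone to the residual power after cancellation, with higher values meaning more echo removed. A minimal sketch, using synthetic signals rather than real echo-canceller output:

```python
import numpy as np

def erle_db(mic, residual, eps=1e-12):
    """Echo Return Loss Enhancement in dB.

    mic:      microphone signal containing the echo (before cancellation)
    residual: error signal after the echo canceller
    eps guards against log of zero for silent signals.
    """
    return 10.0 * np.log10((np.mean(mic**2) + eps)
                           / (np.mean(residual**2) + eps))

# toy example: the canceller attenuates the echo by a factor of 10
# in amplitude, i.e. 20 dB in power
rng = np.random.default_rng(0)
echo = rng.standard_normal(16000)
residual = 0.1 * echo
print(round(erle_db(echo, residual), 1))  # → 20.0
```

Note that ERLE says nothing about distortion introduced into near-end speech, which is one reason it correlates poorly with subjective quality.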