
The mixing matrix of a Feedback Delay Network (FDN) reverberator is used to control the mixing time and echo density profile. In this work, we investigate the effect of the mixing matrix on the modes (poles) of the FDN with the goal of using this information to better design the various FDN parameters. We find the modal decomposition of delay network reverberators using a state space formulation, showing how modes of the system can be extracted by eigenvalue decomposition of the state transition matrix.
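The eigenvalue view described in this abstract can be illustrated with a small numerical sketch: the state transition matrix of a toy lossless FDN is assembled and its eigenvalues, the system poles (modes), are computed. The delay lengths and the Householder mixing matrix below are assumptions for illustration only, not the paper's configuration.

```python
import numpy as np

# Toy 3-line lossless FDN with an orthogonal (Householder) mixing matrix.
delays = [3, 5, 7]                       # delay-line lengths in samples (assumed)
N = len(delays)
v = np.ones((N, 1)) / np.sqrt(N)
A = np.eye(N) - 2 * v @ v.T              # orthogonal Householder mixing matrix

# Assemble the state transition matrix T: one shift register per delay
# line, with the mixing matrix feeding each line's input from the
# others' outputs.
M = sum(delays)
T = np.zeros((M, M))
starts = np.cumsum([0] + delays[:-1])    # first state index of each line
for i, (si, mi) in enumerate(zip(starts, delays)):
    T[si + 1:si + mi, si:si + mi - 1] = np.eye(mi - 1)   # internal shift
    for j, (sj, mj) in enumerate(zip(starts, delays)):
        T[si, sj + mj - 1] = A[i, j]     # feedback through the mixing matrix

poles = np.linalg.eigvals(T)             # modes (poles) of the FDN
# With an orthogonal mixing matrix and no losses, T is orthogonal,
# so all poles lie on the unit circle.
```

Introducing per-line attenuation would pull the poles inside the unit circle, which is how the mixing matrix and gains jointly shape the decay of each mode.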


Reverberation time, T60, directly influences the amount of reverberation in a signal, and its direct estimation may aid dereverberation. Traditionally, T60 estimation has relied on signal-processing or probabilistic approaches; only recently have deep-learning approaches been developed. Unfortunately, the appropriate loss function for training such a network has not been adequately determined. In this paper, we propose a composite classification- and regression-based loss function.
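A composite loss of this kind could be sketched as a weighted sum of a cross-entropy term over coarse T60 bins and a squared-error term on the continuous value. The bin edges, weighting `alpha`, and toy predictions below are all illustrative assumptions, not the paper's formulation.

```python
import numpy as np

# Assumed T60 bin edges in seconds (4 edges -> 5 classes).
bins = np.array([0.2, 0.4, 0.8, 1.6])

def composite_loss(class_logits, t60_pred, t60_true, alpha=0.5):
    """Weighted sum of bin cross-entropy and regression MSE (sketch)."""
    label = np.searchsorted(bins, t60_true)      # true bin index
    p = np.exp(class_logits - class_logits.max())
    p /= p.sum()                                 # softmax over bins
    ce = -np.log(p[label])                       # classification term
    mse = (t60_pred - t60_true) ** 2             # regression term
    return alpha * ce + (1 - alpha) * mse

# Toy prediction: 5 class logits plus a continuous T60 estimate.
loss = composite_loss(np.array([0.1, 2.0, 0.3, 0.1, 0.2]),
                      t60_pred=0.55, t60_true=0.6)
```

The classification term gives a stable training signal across the full T60 range, while the regression term refines the estimate within a bin.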


Ensuring performance robustness for a variety of situations that can occur in real-world environments is one of the challenging tasks in sound event classification. One of the unpredictable and detrimental factors in performance, especially in indoor environments, is reverberation. To alleviate this problem, we propose a conditioning method that provides room impulse response (RIR) information to help the network become less sensitive to environmental information and focus on classifying the desired sound.


The goal of acoustic matching is to transform an audio recording made in one acoustic environment to sound as if it had been recorded in a different environment, based on reference audio from the target environment. This paper introduces a deep learning solution for two parts of the acoustic matching problem. First, we characterize acoustic environments by mapping audio into a low-dimensional embedding invariant to speech content and speaker identity.


Acoustic echo retrieval is gaining importance in many speech and audio signal processing applications such as speech enhancement, source separation, dereverberation and room geometry estimation. This work proposes a novel approach to blindly retrieve the off-grid timing of early acoustic echoes from a stereophonic recording of an unknown sound source such as speech. It builds on the recent framework of continuous dictionaries.


Recent work on acoustic parameter estimation indicates that geometric room volume can be useful for modeling the character of an acoustic environment. However, estimating volume from audio signals remains a challenging problem. Here we propose using a convolutional neural network model to estimate the room volume blindly from reverberant single-channel speech signals in the presence of noise. The model is shown to produce estimates within approximately a factor of two of the true value, for rooms ranging in size from small offices to large concert halls.


It is commonly observed that acoustic echoes hurt performance of sound source localization (SSL) methods. We introduce the concept of microphone array augmentation with echoes (MIRAGE) and show how estimation of early-echo characteristics can in fact benefit SSL. We propose a learning-based scheme for echo estimation combined with a physics-based scheme for echo aggregation.


The localization of acoustic sound sources benefits signal processing applications such as speech enhancement, dereverberation, separation and tracking. In real-world environments, coherent reflections degrade the performance of subspace localization techniques, making position estimation difficult. This paper proposes a multiple signal classification (MUSIC) subspace localization method suitable for reverberant rooms. The method is based on the modal decomposition of a room's region-to-region transfer function, which is assumed to be known.
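For context, the subspace idea underlying this abstract can be sketched with the classic narrowband MUSIC pseudospectrum for a free-field uniform linear array. This is the textbook method, not the paper's modal region-to-region variant, and the array geometry, source angle, and noise level below are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d = 8, 0.5                 # sensors and spacing in wavelengths (assumed)
theta_src = 40.0              # true direction of arrival in degrees (assumed)

def steering(theta_deg):
    """Far-field steering vector for a uniform linear array."""
    return np.exp(-2j * np.pi * d * np.arange(M)
                  * np.sin(np.radians(theta_deg)))

# Simulate snapshots: one source plus a small amount of sensor noise.
T = 200
s = rng.standard_normal(T) + 1j * rng.standard_normal(T)
X = np.outer(steering(theta_src), s)
X += 0.05 * (rng.standard_normal((M, T)) + 1j * rng.standard_normal((M, T)))

R = X @ X.conj().T / T                       # sample spatial covariance
w, V = np.linalg.eigh(R)                     # eigenvalues in ascending order
En = V[:, :-1]                               # noise subspace (one source)

# MUSIC pseudospectrum: large where the steering vector is orthogonal
# to the noise subspace.
angles = np.arange(-90, 90.5, 0.5)
P = [1 / np.linalg.norm(En.conj().T @ steering(a)) ** 2 for a in angles]
est = angles[int(np.argmax(P))]              # estimated direction of arrival
```

In reverberant rooms, coherent reflections corrupt the sample covariance and hence the subspace split, which is the failure mode the paper's modal formulation is designed to address.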


The acoustical behavior of a room for a given microphone and sound source position is usually described by the room impulse response. With standard uniform sampling, estimating the room impulse response at arbitrary positions in the room requires a large number of measurements. To lower the required sampling rate, solutions have emerged that exploit the sparse representation of the room wavefield in terms of plane waves in the low-frequency domain. The plane wave representation has a simple form in rectangular rooms.
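The "simple form" in rectangular rooms can be made concrete with a small sketch: for rigid walls, each cosine mode shape is an equal-weight sum of eight plane waves whose wavevector components are (±kx, ±ky, ±kz). The room dimensions and mode orders below are arbitrary values chosen only for the demonstration.

```python
import numpy as np

L = np.array([5.0, 4.0, 3.0])             # room dimensions in meters (assumed)
n = np.array([2, 1, 1])                   # mode orders (assumed)
k = n * np.pi / L                         # modal wavenumber components

r = np.array([1.3, 0.7, 2.1])             # an arbitrary point in the room
mode = np.prod(np.cos(k * r))             # cosine mode shape at r

# Sum of the 8 plane waves exp(j k.r) over all sign combinations
# (±kx, ±ky, ±kz), divided by 8; its real part equals the mode shape.
signs = np.array([[sx, sy, sz]
                  for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)])
plane_waves = np.exp(1j * (signs * k) @ r).sum() / 8
```

This identity is why a low-frequency wavefield in a rectangular room admits a sparse plane-wave dictionary: only a few such sign-symmetric groups are active.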