A method of interpolating the acoustic transfer function (ATF) between regions that takes into account both the physical properties of the ATF and the directionality of region configurations is proposed. Most spatial ATF interpolation methods are limited to estimation in the region of receivers. A kernel method for region-to-region ATF interpolation makes it possible to estimate the ATFs for both source and receiver regions from a discrete set of ATF measurements.


A model of a room impulse response (RIR) is useful for a wide range of applications. Typically, the early part of an RIR is sparse, and its sparse structure allows for accurate and simple modeling of the RIR. Existing Lp-norm-based methods (0 < p ≤ 1) suffer from sensitivity to user-selected regularization parameters or from a high computational burden. In this work, we propose to reconstruct the sparse model for the early part of RIRs with sparse Bayesian learning (SBL).


The mixing matrix of a Feedback Delay Network (FDN) reverberator is used to control the mixing time and echo density profile. In this work, we investigate the effect of the mixing matrix on the modes (poles) of the FDN with the goal of using this information to better design the various FDN parameters. We find the modal decomposition of delay network reverberators using a state space formulation, showing how modes of the system can be extracted by eigenvalue decomposition of the state transition matrix.
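
The state-space idea above can be illustrated with a minimal sketch in Python with NumPy. It assumes each delay line is unrolled into a chain of unit delays, so that the state vector holds every delay-line cell and the poles are the eigenvalues of the resulting state transition matrix. The function names `fdn_state_matrix` and `fdn_modes`, the Householder feedback matrix, and the delay lengths are illustrative assumptions, not the paper's implementation.

```python
import numpy as np


def fdn_state_matrix(feedback, delays, gains=None):
    """State transition matrix of an FDN with delay lines unrolled into unit delays.

    feedback: (N, N) mixing matrix (hypothetical example input).
    delays:   N delay-line lengths in samples, each >= 1.
    gains:    optional N per-line attenuation gains applied at the line inputs.
    """
    feedback = np.asarray(feedback, dtype=float)
    delays = np.asarray(delays, dtype=int)
    n = len(delays)
    gains = np.ones(n) if gains is None else np.asarray(gains, dtype=float)

    total = int(delays.sum())
    starts = np.concatenate(([0], np.cumsum(delays)[:-1]))  # first cell of each line
    ends = starts + delays - 1                               # last cell of each line

    T = np.zeros((total, total))
    for i in range(n):
        # Inside a delay line, every sample moves one cell forward per time step.
        for k in range(starts[i] + 1, ends[i] + 1):
            T[k, k - 1] = 1.0
        # The first cell of line i is fed by the mixed outputs of all lines.
        for j in range(n):
            T[starts[i], ends[j]] = gains[i] * feedback[i, j]
    return T


def fdn_modes(feedback, delays, gains=None):
    """Poles (modes) of the FDN: eigenvalues of the state transition matrix."""
    return np.linalg.eigvals(fdn_state_matrix(feedback, delays, gains))


# Example: a 3-line FDN with an orthogonal (Householder) mixing matrix.
N = 3
A = np.eye(N) - (2.0 / N) * np.ones((N, N))
delays = np.array([2, 3, 5])

lossless_poles = fdn_modes(A, delays)           # one pole per delay cell
gamma = 0.99                                    # assumed per-sample attenuation
damped_poles = fdn_modes(A, delays, gains=gamma ** delays)
```

With an orthogonal mixing matrix and no attenuation, every pole lies on the unit circle; applying a gain of `gamma ** delays` to the lines scales all pole magnitudes to `gamma`. The number of poles equals the total delay length, so the dense eigenvalue decomposition used here is only practical for short delay lines.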


Reverberation time, T60, directly influences the amount of reverberation in a signal, and its direct estimation may help with dereverberation. Traditionally, T60 estimation has relied on signal processing or probabilistic approaches; only recently have deep-learning approaches been developed. Unfortunately, the appropriate loss function for training the network has not been adequately determined. In this paper, we propose a composite classification- and regression-based


Ensuring performance robustness for a variety of situations that can occur in real-world environments is one of the challenging tasks in sound event classification. One of the unpredictable and detrimental factors in performance, especially in indoor environments, is reverberation. To alleviate this problem, we propose a conditioning method that provides room impulse response (RIR) information to help the network become less sensitive to environmental information and focus on classifying the desired sound.


The goal of acoustic matching is to transform an audio recording made in one acoustic environment to sound as if it had been recorded in a different environment, based on reference audio from the target environment. This paper introduces a deep learning solution for two parts of the acoustic matching problem. First, we characterize acoustic environments by mapping audio into a low-dimensional embedding invariant to speech content and speaker identity.


Acoustic echo retrieval is a research topic of growing importance in many speech and audio signal processing applications, such as speech enhancement, source separation, dereverberation, and room geometry estimation. This work proposes a novel approach to blindly retrieve the off-grid timing of early acoustic echoes from a stereophonic recording of an unknown sound source such as speech. It builds on the recent framework of continuous dictionaries.


Recent work on acoustic parameter estimation indicates that geometric room volume can be useful for modeling the character of an acoustic environment. However, estimating volume from audio signals remains a challenging problem. Here we propose using a convolutional neural network model to estimate the room volume blindly from reverberant single-channel speech signals in the presence of noise. The model is shown to produce estimates within approximately a factor of two of the true value, for rooms ranging in size from small offices to large concert halls.


It is commonly observed that acoustic echoes hurt performance of sound source localization (SSL) methods. We introduce the concept of microphone array augmentation with echoes (MIRAGE) and show how estimation of early-echo characteristics can in fact benefit SSL. We propose a learning-based scheme for echo estimation combined with a physics-based scheme for echo aggregation.