Sorry, you need to enable JavaScript to visit this website.

ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit website.

This paper presents a domain adaptation model for sound event detection. A common challenge for sound event detection is how to deal with the mismatch among different datasets. Typically, the performance of a model will decrease if it is tested on a dataset which is different from the one that the model is trained on. To address this problem, based on convolutional recurrent neural networks (CRNNs), we propose an adapted CRNN (A-CRNN) as an unsupervised adversarial domain adaptation model for sound event detection.


Most parametric fundamental frequency estimators make the implicit assumption that any corrupting noise is additive, white Gaussian. Under this assumption, the maximum likelihood (ML) and the least squares estimators are the same, and statistically efficient. However, in the coloured noise case, the estimators differ, and the spectral shape of the corrupting noise should be taken into account.


There is a growing research interest in proposing new techniques to detect and exploit signals/systems sparsity. Recently, the idea of hidden sparsity has been proposed, and it has been shown that, in many cases, sparsity is not explicit, and some tools are required to expose hidden sparsity. In this paper, we propose the Feature Affine Projection (F-AP) algorithm to reveal hidden sparsity in unknown systems. Indeed, first, the hidden sparsity is revealed using the feature matrix, then it is exploited using some sparsity-promoting penalty function.


In this paper, we present an end-to-end deep convolutional neural network operating on multi-channel raw audio data to localize multiple simultaneously active acoustic sources in space. Previously reported end-to-end deep learning based approaches work well in localizing a single source directly from multi-channel raw-audio, but are not easily extendable to localize multiple sources due to the well known permutation problem.


Acoustic echoes retrieval is a research topic that is gaining importance in many speech and audio signal processing applications such as speech enhancement, source separation, dereverberation and room geometry estimation. This work proposes a novel approach to blindly retrieve the off-grid timing of early acoustic echoes from a stereophonic recording of an unknown sound source such as speech. It builds on the recent framework of continuous dictionaries.


We propose a hybrid method for reconstructing thermographic images by combining the recently developed virtual wave concept with deep neural networks. The method can be used to detect defects inside materials in a non-destructive way. We propose two architectures along with a thorough evaluation that shows a substantial improvement compared to state-of-the-art reconstruction procedures. The virtual waves are invariant of the thermal diffusivity property of the material.


In this paper, we propose a two-step training procedure for source separation via a deep neural network. In the first step we learn a transform (and it's inverse) to a latent space where masking-based separation performance using oracles is optimal. For the second step, we train a separation module that operates on the previously learned space. In order to do so, we also make use of a scale-invariant signal to distortion ratio (SI-SDR) loss function that works in the latent space, and we prove that it lower-bounds the SI-SDR in the time domain.


Deep learning approaches have recently achieved impressive performance on both audio source separation and sound classification. Most audio source separation approaches focus only on separating sources belonging to a restricted domain of source classes, such as speech and music. However, recent work has demonstrated the possibility of "universal sound separation", which aims to separate acoustic sources from an open domain, regardless of their class.


The paper proposes an efficient, robust, and reconfigurable technique to suppress various types of noises for any sampling rate. The theoretical analyses, subjective and objective test results show that the proposed noise suppression (NS) solution significantly enhances the speech transmission index (STI), speech intelligibility (SI), signal-to-noise ratio (SNR), and subjective listening experience. The STI and SI consists of 5 levels, i.e., bad, poor, fair, good, and excellent. The most common noisy condition is of SNR ranging from -5 to 8 dB.