
ICASSP is the world's largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference features world-class presentations by internationally renowned speakers and cutting-edge session topics, and provides an opportunity to network with like-minded professionals from around the world.

This paper presents an improved deep embedding learning method based on a convolutional neural network (CNN) for text-independent speaker verification. Two improvements are proposed for x-vector embedding learning: (1) a multiscale convolution (MSCNN) is adopted in the frame-level layers to capture complementary speaker information at different receptive fields; (2) a Baum-Welch statistics attention (BWSA) mechanism is applied in the temporal pooling layer, integrating more useful long-term speaker characteristics into the utterance-level representation.
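The BWSA mechanism itself relies on Baum-Welch statistics and is not reproduced here; as a hedged illustration of the pooling-layer idea, the following numpy sketch shows a generic attentive statistics pooling that weights frames before computing an utterance-level mean and standard deviation (all parameter names and shapes are hypothetical):

```python
import numpy as np

def attentive_stats_pooling(frames, w, b, v):
    """Attention-weighted mean and std over frame-level features.

    frames: (T, D) frame-level features; w (D, A), b (A,), v (A,)
    are attention parameters (illustrative shapes, not the paper's).
    """
    # One scalar attention score per frame, normalized with softmax.
    scores = np.tanh(frames @ w + b) @ v            # (T,)
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()
    # Attention-weighted first- and second-order statistics.
    mean = alpha @ frames                           # (D,)
    var = alpha @ (frames ** 2) - mean ** 2
    std = np.sqrt(np.maximum(var, 1e-9))
    return np.concatenate([mean, std])              # (2D,) embedding

rng = np.random.default_rng(0)
T, D, A = 50, 8, 4
emb = attentive_stats_pooling(rng.standard_normal((T, D)),
                              rng.standard_normal((D, A)),
                              rng.standard_normal(A),
                              rng.standard_normal(A))
```

Concatenating the weighted mean and standard deviation doubles the embedding width, which is the usual statistics-pooling design.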


We consider source coding of audio signals with the help of a generative model. We use a construction where a waveform is first quantized, yielding a finite bitrate representation. The waveform is then reconstructed by random sampling from a model conditioned on the quantized waveform. The proposed coding scheme is theoretically analyzed. Using SampleRNN as the generative model, we demonstrate that the proposed coding structure provides performance competitive with state-of-the-art source coding tools for specific categories of audio signals.
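The first stage of the construction, quantizing the waveform into a finite-bitrate representation, can be sketched with a plain uniform scalar quantizer (an illustrative stand-in; the paper's quantizer design and the SampleRNN sampling stage are not reproduced here):

```python
import numpy as np

def uniform_quantize(x, n_bits):
    """Uniform scalar quantization of a waveform in [-1, 1].

    Returns integer indices (the finite-bitrate representation)
    and the dequantized waveform that would condition the
    generative model.
    """
    levels = 2 ** n_bits
    step = 2.0 / levels
    idx = np.clip(np.floor((x + 1.0) / step), 0, levels - 1).astype(int)
    x_hat = (idx + 0.5) * step - 1.0   # mid-point reconstruction
    return idx, x_hat

x = np.sin(np.linspace(0, 2 * np.pi, 100))
idx, x_hat = uniform_quantize(x, n_bits=8)
```

At 8 bits the mid-point reconstruction error is bounded by half a quantization step, i.e. 1/256 for a signal in [-1, 1].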


Spectrogram fusion is an effective method for combining complementary speech dereverberation systems. Linearly fusing spectrograms by averaging them has shown outstanding performance. However, this simple method cannot be applied to systems built on different features. In this study, we design minimum difference masks (MDMs) to classify the time-frequency (T-F) bins of spectrograms according to their nearest distance to the labels. We then propose a two-stage nonlinear spectrogram fusion system for speech dereverberation.
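The core MDM idea, assigning each T-F bin to the system whose output lies nearest the clean label, can be sketched as follows (a simplified illustration, not the paper's exact mask definition or fusion network):

```python
import numpy as np

def minimum_difference_masks(specs, label):
    """For each T-F bin, select the system whose magnitude is
    nearest to the clean reference.

    specs: (K, T, F) spectrograms from K dereverberation systems.
    label: (T, F) clean reference spectrogram.
    Returns one binary mask per system and the bin-wise fused
    spectrogram.
    """
    dist = np.abs(specs - label[None])              # (K, T, F)
    winner = np.argmin(dist, axis=0)                # nearest system per bin
    masks = (winner[None] == np.arange(specs.shape[0])[:, None, None])
    fused = np.take_along_axis(specs, winner[None], axis=0)[0]
    return masks.astype(float), fused

rng = np.random.default_rng(1)
specs = rng.random((3, 4, 5))                       # 3 toy systems
label = rng.random((4, 5))
masks, fused = minimum_difference_masks(specs, label)
```

By construction, the fused spectrogram is at least as close to the label in every bin as any individual system, which is what makes the masks useful supervision targets.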


In this paper, we extend previous particle filtering methods whose states were constrained to the (real) Stiefel manifold to the complex case. The method is then applied to a Bayesian formulation of the subspace tracking problem. To implement the proposed particle filter, we modify a previous MCMC algorithm so as to simulate from densities defined on the complex manifold. Also, to compute subspace estimates from particle approximations, we extend existing averaging methods to complex Grassmannians.
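A common way to keep a sampled point on the complex Stiefel manifold St(n, p) = {X : X^H X = I} is a QR-based retraction with the column-phase ambiguity fixed; a minimal numpy sketch under that standard construction (the paper's MCMC proposal itself is not reproduced):

```python
import numpy as np

def complex_stiefel_retract(X):
    """Map an arbitrary complex n x p matrix onto the complex
    Stiefel manifold via QR decomposition.

    The diagonal of R is rotated to be real and positive so the
    retraction is uniquely defined.
    """
    Q, R = np.linalg.qr(X)
    d = np.diagonal(R)
    phase = d / np.abs(d)        # unit-modulus diagonal phases
    return Q * phase[None, :]    # absorb phases into the columns

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))
Y = complex_stiefel_retract(X)
```

The returned matrix has orthonormal columns in the Hermitian inner product, i.e. Y^H Y = I, as required of a point on the complex Stiefel manifold.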


A neural network language model (NNLM) is an essential component of industrial ASR systems. One important challenge in training an NNLM is balancing the scaling of the learning process against the handling of big data. Conventional approaches such as block momentum provide a blockwise model update filtering (BMUF) process and achieve almost linear speedups with no performance degradation for speech recognition.
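A single BMUF step with block momentum can be sketched in a few lines; parameter names and hyperparameter values below are illustrative, not those of any particular system:

```python
import numpy as np

def bmuf_update(theta_prev, local_models, delta_prev,
                block_momentum=0.9, block_lr=1.0):
    """One blockwise model update filtering (BMUF) step (sketch).

    theta_prev: global model before the data block.
    local_models: worker models after parallel training on the block.
    delta_prev: previous filtered update (same shape as theta_prev).
    Returns the new global model and the new filtered update.
    """
    # Aggregate the workers and measure the block-level shift.
    g = np.mean(local_models, axis=0) - theta_prev
    # Filter the update with block momentum (a low-pass over blocks).
    delta = block_momentum * delta_prev + block_lr * g
    return theta_prev + delta, delta

theta = np.zeros(4)
workers = [np.ones(4) * w for w in (1.0, 2.0, 3.0)]
theta1, delta1 = bmuf_update(theta, workers, np.zeros(4),
                             block_momentum=0.0, block_lr=1.0)
```

With zero momentum the step reduces to plain model averaging; the momentum term is what carries information across blocks and keeps large-block training close to serial SGD.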


With the increase in human space activity, the amount of space debris has grown dramatically, and so has the possibility that an orbiting spacecraft is struck by debris. It is therefore important to detect and locate gas leaks accurately and promptly. In this paper, a leak detection method using an ultrasonic sensor array is proposed. First, the sensor array detects the leak's acoustic signal, which propagates as a Lamb wave through the spacecraft structure. A beamforming algorithm is then applied to determine the direction of the leak source.
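The beamforming step can be illustrated with a narrowband delay-and-sum direction search over a linear array. Note that Lamb waves are dispersive, so the single effective propagation speed used below is a simplifying assumption, as are the array geometry and frequency:

```python
import numpy as np

def delay_and_sum_doa(X, mic_x, freq, c, angles_deg):
    """Narrowband delay-and-sum DOA estimate for a linear array.

    X: (M,) complex sensor spectra at one frequency bin.
    mic_x: (M,) sensor positions along the array axis (meters).
    c: assumed effective propagation speed (m/s).
    Returns the candidate angle (degrees) whose plane-wave
    steering vector best matches the observed spectra.
    """
    angles = np.deg2rad(np.asarray(angles_deg))
    # Plane-wave delays across the array for each candidate angle.
    delays = np.outer(np.sin(angles), mic_x) / c    # (A, M)
    steer = np.exp(-2j * np.pi * freq * delays)
    power = np.abs(steer.conj() @ X) ** 2
    return angles_deg[np.argmax(power)]

mic_x = np.arange(8) * 0.05          # 8 sensors, 5 cm spacing
freq, c = 40e3, 5000.0               # 40 kHz tone, assumed speed
true_deg = 30.0
X = np.exp(-2j * np.pi * freq * mic_x * np.sin(np.deg2rad(true_deg)) / c)
angles = np.arange(-90.0, 91.0)
est = delay_and_sum_doa(X, mic_x, freq, c, angles)
```

With half-wavelength-scale spacing the beam pattern has a single main lobe over the search range, so the argmax recovers the simulated arrival angle.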


In the treatment of epilepsy with intracranial electroencephalography (iEEG), recognition accuracy is low, and it is difficult to find correlations between channels because of the large number of channels and the volume of time-series data. To solve these problems, we propose a novel EEG feature representation method for seizure detection based on the log Mel-filterbank energy feature. We propose adapting the Mel-filterbank energy to EEG features with a logarithm transform.
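The log Mel-filterbank energy feature the method builds on can be sketched as follows; the frame length, filter count, and sampling rate below are illustrative, not the paper's settings:

```python
import numpy as np

def log_mel_energies(signal, fs, n_fft=256, n_mels=8):
    """Log Mel-filterbank energies of one frame of a 1-D signal."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # Triangular filters evenly spaced on the Mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fbank[i, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fbank[i, k] = (r - k) / max(r - c, 1)   # falling slope
    # Power spectrum of the frame, then log filterbank energies.
    power = np.abs(np.fft.rfft(signal[:n_fft], n_fft)) ** 2
    return np.log(fbank @ power + 1e-10)

sig = np.sin(2 * np.pi * 10 * np.arange(512) / 256.0)
feats = log_mel_energies(sig, fs=256)
```

The same computation would be applied per channel and per frame to turn raw iEEG time series into compact spectral feature vectors.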


In automotive radar imaging, displaced sensors offer improvement in localization accuracy by jointly processing the data acquired from multiple radar units, each of which may have limited individual resources. In this paper, we derive performance bounds on the estimation error of target parameters processed by displaced sensors that correspond to several independent radars mounted at different locations on the same vehicle. Unlike previous studies, we do not assume a very accurate time synchronization among the sensors.


This work outlines a method for an application of empirical Bayes in the setting of semi-supervised learning. That is, we consider a scenario in which the training set is partially or entirely unlabeled. In addition to the missing labels, we also consider a scenario where the available training data might be shuffled (i.e., the features and labels are not matched).


Attention networks constitute the state-of-the-art paradigm for capturing long temporal dynamics. This paper examines the efficacy of this paradigm in the challenging task of emotion recognition in dyadic conversations. In this work, we introduce a novel attention mechanism capable of inferring the magnitude of the effect of each past utterance on the current speaker's emotional state.
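As a hedged illustration of weighting past utterances by their influence on the current state, the following numpy sketch uses generic scaled dot-product attention (a stand-in, not the paper's specific mechanism):

```python
import numpy as np

def past_utterance_attention(past, query):
    """Score the influence of each past utterance on the current state.

    past: (N, D) embeddings of past utterances.
    query: (D,) embedding of the current speaker state.
    Returns softmax attention weights and the attended context vector.
    """
    # Scaled dot-product scores, one per past utterance.
    scores = past @ query / np.sqrt(past.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()                     # softmax normalization
    context = w @ past               # influence-weighted summary
    return w, context

rng = np.random.default_rng(3)
past = rng.standard_normal((5, 16))     # 5 past utterances
query = rng.standard_normal(16)         # current state
w, context = past_utterance_attention(past, query)
```

The weight vector w is directly interpretable as the relative influence assigned to each past utterance, which is the quantity the mechanism is meant to infer.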