
ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

This paper presents a novel problem of detection and localization of anomalous events due to a certain class of objects in video data with applications to smart surveillance. A baseline system is proposed that uses a convolutional neural network (CNN) to generate pixel level masks corresponding to objects of a class of interest. A Restricted Boltzmann Machine (RBM) is then trained on the mask to learn patterns of normal behavior. The free energy of the RBM is used to detect the presence of an anomaly while the reconstruction error is used to localize the anomaly.
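As a rough illustration of the detection and localization steps described above, the free energy of a binary RBM with weights W, visible bias b, and hidden bias c is F(v) = -b·v - Σ_j log(1 + exp(c_j + (vW)_j)). The NumPy sketch below scores a flattened mask by its free energy and computes a per-pixel reconstruction error. The parameter shapes, the random untrained weights, and the single mean-field reconstruction pass are assumptions made for illustration; they are not the paper's trained model or settings.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def free_energy(v, W, b, c):
    """Free energy of a binary RBM for one visible vector v.

    W: (n_visible, n_hidden), b: (n_visible,), c: (n_hidden,).
    """
    hidden_input = c + v @ W
    # logaddexp(0, x) equals log(1 + exp(x)) and is numerically stable.
    return -(v @ b) - np.sum(np.logaddexp(0.0, hidden_input))

def reconstruction_error(v, W, b, c):
    """Per-pixel squared error after one mean-field up-down pass."""
    h_prob = sigmoid(c + v @ W)
    v_recon = sigmoid(b + h_prob @ W.T)
    return (v - v_recon) ** 2

# Hypothetical, untrained parameters and a random binary mask.
rng = np.random.default_rng(0)
n_visible, n_hidden = 16, 4
W = 0.1 * rng.standard_normal((n_visible, n_hidden))
b = np.zeros(n_visible)
c = np.zeros(n_hidden)
mask = (rng.random(n_visible) > 0.5).astype(float)

score = free_energy(mask, W, b, c)               # scalar score for detection
error_map = reconstruction_error(mask, W, b, c)  # one value per pixel for localization
```

In a setup like this, a mask would be flagged as anomalous when its score departs from the range seen on normal training masks, and the largest entries of the error map would indicate where the anomaly lies; how the decision threshold is chosen is left open here.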


Spike sorting is the process of assigning each detected neuronal spike in an extracellular recording to its putative source neuron. A linear filter design is proposed where the filter output allows for threshold-based spike sorting of high-density neural probe data. The proposed filter design is based on optimizing the signal-to-peak-interference ratio for each detectable neuron in a data-driven way.


This paper addresses the denoising and retrieval of the components of multicomponent signals from their short-time Fourier transform (STFT). After recalling the hard-thresholding technique in the STFT context, we develop a new thresholding technique by exploiting some limitations of the former. Numerical experiments illustrating the benefits of the proposed method to retrieve the modes of noisy multicomponent signals conclude the paper.
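For readers unfamiliar with the baseline, hard thresholding keeps the STFT coefficients whose magnitude reaches a threshold and sets the rest to zero before inverting the transform. The sketch below shows only this classical baseline, not the new technique proposed in the paper. To stay within NumPy it uses a deliberately crude STFT with non-overlapping rectangular frames; the frame length, threshold value, and test signal are all assumptions chosen for illustration.

```python
import numpy as np

def stft_frames(x, frame_len):
    """Crude STFT: non-overlapping rectangular frames, one FFT per frame."""
    n_frames = len(x) // frame_len
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    return np.fft.rfft(frames, axis=1)

def istft_frames(coeffs, frame_len):
    """Inverse of stft_frames: inverse FFT per frame, then concatenate."""
    return np.fft.irfft(coeffs, n=frame_len, axis=1).reshape(-1)

def hard_threshold(coeffs, threshold):
    """Keep coefficients with magnitude >= threshold, zero the others."""
    return np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)

# Hypothetical test signal: one sinusoid in white Gaussian noise.
rng = np.random.default_rng(0)
frame_len = 64
t = np.arange(1024)
clean = np.cos(2 * np.pi * 8 * t / frame_len)
noisy = clean + 0.3 * rng.standard_normal(t.size)

coeffs = stft_frames(noisy, frame_len)
denoised = istft_frames(hard_threshold(coeffs, threshold=10.0), frame_len)
```

A practical implementation would use overlapping, tapered windows and would set the threshold from an estimate of the noise level rather than a fixed constant.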


Conventional seq2seq chatbot models only try to find the sentences with the highest probabilities conditioned on the input sequences, without considering the sentiment of the output sentences. Some research on modifying the sentiment of the output sequences has been reported. In this paper, we propose five models to scale or adjust the sentiment of the chatbot response: a persona-based model, reinforcement learning, a plug and play model, a sentiment transformation network, and cycleGAN, all based on the conventional seq2seq model.


Multi-channel speech enhancement with ad-hoc sensors has been a challenging task. Speech model guided beamforming algorithms are able to recover natural sounding speech, but the speech models tend to be oversimplified or the inference would otherwise be too complicated. On the other hand, deep learning based enhancement approaches are able to learn complicated speech distributions and perform efficient inference, but they are unable to deal with a variable number of input channels.


Time delay neural networks (TDNNs) are an effective acoustic model for large vocabulary speech recognition. The strength of the model can be attributed to its ability to effectively model long temporal contexts. However, current TDNN models are relatively shallow, which limits the modelling capability. This paper proposes a method of increasing the network depth by deepening the kernel used in the TDNN temporal convolutions. The best performing kernel consists of three fully connected layers with a residual (ResNet) connection from the output of the first to the output of the third.
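The kernel structure described above can be sketched in NumPy as follows: the frames in a temporal context are spliced into one vector, passed through three fully connected layers, and the output of the first layer is added to the output of the third. The ReLU placement, the context offsets (-1, 0, 1), the layer widths, and the random weights are assumptions for illustration only; the abstract does not specify them.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def splice(features, offsets):
    """Concatenate the frames at t + offset for every valid time step t."""
    lo, hi = min(offsets), max(offsets)
    start, stop = max(0, -lo), features.shape[0] - max(0, hi)
    return np.stack(
        [np.concatenate([features[t + o] for o in offsets]) for t in range(start, stop)]
    )

def deep_kernel(context, W1, b1, W2, b2, W3, b3):
    """Three fully connected layers with a residual connection from the
    output of the first layer to the output of the third."""
    h1 = relu(context @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    h3 = h2 @ W3 + b3
    return relu(h3 + h1)

# Hypothetical sizes: 20 frames of 8-dim features, hidden width 12.
rng = np.random.default_rng(0)
features = rng.standard_normal((20, 8))
offsets = (-1, 0, 1)
dim_in, width = 8 * len(offsets), 12
W1, b1 = 0.1 * rng.standard_normal((dim_in, width)), np.zeros(width)
W2, b2 = 0.1 * rng.standard_normal((width, width)), np.zeros(width)
W3, b3 = 0.1 * rng.standard_normal((width, width)), np.zeros(width)

outputs = deep_kernel(splice(features, offsets), W1, b1, W2, b2, W3, b3)
```

Because the residual is added between layers of equal width, the first and third layers must share the same output dimension in this sketch.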


In this paper, we address the problem of tensor data completion. Tensor-train decomposition is adopted because of its powerful representation ability and linear scalability with tensor order. We propose an algorithm named Sparse Tensor-train Optimization (STTO), which treats incomplete data as a sparse tensor and uses a first-order optimization method to find the factors of the tensor-train decomposition. Our algorithm is shown to perform well in simulation experiments in both low-order and high-order cases.
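To make the tensor-train format concrete: an order-d tensor is stored as d cores, where core k has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, and a single entry is the product of one matrix slice per core. The NumPy sketch below evaluates entries this way and computes a squared-error loss over observed entries only, which is the kind of objective a first-order method could minimize. It is not the STTO algorithm itself; the tensor shape, ranks, sampled indices, and function names are hypothetical.

```python
import numpy as np

def tt_entry(cores, index):
    """Entry X[i1, ..., id] as a product of one matrix slice per core."""
    vec = np.ones((1, 1))
    for core, i in zip(cores, index):
        vec = vec @ core[:, i, :]
    return vec[0, 0]

def tt_full(cores):
    """Contract all cores into the full tensor (feasible for small sizes only)."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full[0, ..., 0]

def observed_loss(cores, indices, values):
    """Half the squared error summed over the observed entries only."""
    preds = np.array([tt_entry(cores, idx) for idx in indices])
    return 0.5 * np.sum((preds - values) ** 2)

# Hypothetical 3 x 4 x 5 tensor with tensor-train ranks (1, 2, 3, 1).
rng = np.random.default_rng(0)
shape, ranks = (3, 4, 5), (1, 2, 3, 1)
cores = [rng.standard_normal((ranks[k], shape[k], ranks[k + 1])) for k in range(3)]

# A few observed positions and their values, taken from the same cores.
indices = [(0, 1, 2), (2, 3, 4), (1, 0, 0)]
values = np.array([tt_entry(cores, idx) for idx in indices])
loss = observed_loss(cores, indices, values)
```

Evaluating one entry costs a chain of small matrix products, so the loss scales with the number of observed entries rather than with the full tensor size.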


EEG-based authentication is an emerging research field. In this work, a realistic authentication system using electroencephalography (EEG) signals was developed, aiming to show that brain signals contain sufficient information to be used in security systems. The dataset comprised recordings from 29 users on 4 different days via the cheap Neurosky Mindwave headset with a single dry electrode, and from 10 users on 3 different days via Emotiv with 14 electrodes. Various techniques, features, and algorithms were examined to achieve the highest security.

