ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The 2019 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

We consider the problem of detecting whether a tensor signal with many missing entries lies within a given low-dimensional Kronecker-Structured (KS) subspace. This is a matched subspace detection problem. The tensor version of matched subspace detection is more challenging because the signal dimensions are intertwined. We solve this problem by projecting the signal onto the KS subspace, which is a Kronecker product of different subspaces, one for each signal dimension. Under this framework, we define the KS subspaces and the orthogonal projection of the signal onto the KS subspace.
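The projection step described in the abstract above can be illustrated with a minimal NumPy sketch. It assumes orthonormal per-mode bases, a boolean mask of observed entries, and the residual energy on the observed entries as the detection statistic. The abstract does not state the statistic, so that choice, along with the name `ks_residual_energy`, is an assumption made for illustration only.

```python
import numpy as np

def ks_residual_energy(signal, bases, mask):
    """Residual energy of the observed entries after projecting them onto
    the KS subspace restricted to the observed positions (hypothetical helper)."""
    # The KS subspace is spanned by the Kronecker product of the per-mode bases.
    U = bases[0]
    for B in bases[1:]:
        U = np.kron(U, B)
    # Row-major flattening matches the ordering produced by np.kron.
    observed = mask.ravel().astype(bool)
    v = signal.ravel()[observed]
    U_obs = U[observed, :]
    # Least squares gives the orthogonal projection onto the span of U_obs.
    coeffs, *_ = np.linalg.lstsq(U_obs, v, rcond=None)
    residual = v - U_obs @ coeffs
    return float(residual @ residual)

rng = np.random.default_rng(0)
U1, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # basis for mode 1
U2, _ = np.linalg.qr(rng.standard_normal((5, 2)))  # basis for mode 2
core = rng.standard_normal((2, 2))

in_subspace = U1 @ core @ U2.T            # lies in the KS subspace
off_subspace = rng.standard_normal((6, 5))  # generic signal

mask = np.ones((6, 5), dtype=bool)
mask[::2, ::2] = False                    # mark 9 of the 30 entries as missing

energy_in = ks_residual_energy(in_subspace, [U1, U2], mask)
energy_off = ks_residual_energy(off_subspace, [U1, U2], mask)
```

In this sketch a signal inside the subspace leaves a residual that is zero up to rounding, while a generic signal leaves a clearly positive residual. A detector would compare the residual against a threshold.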


Deep learning models have significantly improved the visual quality and accuracy of compressive sensing recovery. In this paper, we propose an algorithm for signal reconstruction from compressed measurements with image priors captured by a generative model. We search over the latent variable space, and constrain it, to keep the method stable when the number of compressed measurements is extremely limited. We show that, by exploiting certain structures of the latent variables, the proposed method improves reconstruction accuracy and preserves realistic and non-smooth features in the image.


Every speaker wants to sound fluent and confident, but professional speech delivery requires a great deal of experience and practice. In this paper, we propose a speech stream manipulation system that helps non-professional speakers produce fluent, professional-sounding speech, in turn contributing to better listener engagement and comprehension. We achieve this by manipulating the disfluencies in human speech, such as the sounds "uh" and "um", filler words, and awkwardly long silences.


Speaker diarisation systems often cluster audio segments using speaker embeddings such as i-vectors and d-vectors. Since different types of embeddings are often complementary, this paper proposes a generic framework to improve performance by combining them into a single embedding, referred to as a c-vector. This combination uses a 2-dimensional (2D) self-attentive structure, which extends the standard self-attentive layer by averaging not only across time but also across different types of embeddings.
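The idea of averaging across both time and embedding type can be sketched as attention pooling over a flattened (type, time) grid. This is a rough illustration rather than the paper's architecture. It assumes all embedding types have already been mapped to a common dimension `D`, and the parameters `W` and `w` are hypothetical stand-ins for learned weights.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def self_attentive_pool_2d(embeddings, W, w):
    """embeddings: (K types, T frames, D dims); W: (D, H); w: (H,).
    Returns one combined D-dimensional vector and the (K, T) attention weights."""
    K, T, D = embeddings.shape
    flat = embeddings.reshape(K * T, D)
    scores = np.tanh(flat @ W) @ w   # one scalar score per (type, frame) pair
    weights = softmax(scores)        # normalised jointly over type and time
    return weights @ flat, weights.reshape(K, T)

rng = np.random.default_rng(0)
emb = rng.standard_normal((2, 5, 4))  # e.g. two embedding types, five frames
W = rng.standard_normal((4, 3))
w = rng.standard_normal(3)
c_vector, attn = self_attentive_pool_2d(emb, W, w)
```

A standard self-attentive layer would normalise the weights over time only. Here a single softmax runs over every (type, frame) pair, so the weighted average mixes the embedding types as well.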


In this paper, we address the problem of bird audio detection and propose a new convolutional neural network architecture together with a divergence-based information channel weighing strategy in order to achieve improved state-of-the-art performance and faster convergence. The effectiveness of the methodology is shown on the Bird Audio Detection Challenge 2018 (Detection and Classification of Acoustic Scenes and Events Challenge, Task 3) development data set.


We introduce a method to improve the quality of simple scalar quantization in the context of acoustic sensor networks by combining ideas from sparse reconstruction, artificial neural networks and weighting filters. We start from the observation that optimization methods based on sparse reconstruction resemble the structure of a neural network. Hence, building upon a successful enhancement method, we unroll the algorithms and use this to build a neural network which we train to obtain enhanced decoding.
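The observation that sparse reconstruction resembles a neural network can be made concrete with ISTA, a standard iterative solver for the l1-regularised least-squares problem. The sketch below uses ISTA as a stand-in, since the abstract does not name the algorithm that is unrolled. Each iteration becomes one "layer" with a linear map followed by a soft-threshold nonlinearity. In a trained variant, `W_e`, `S` and `theta` would be learned instead of being derived from `A`.

```python
import numpy as np

def soft_threshold(x, theta):
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def objective(x, y, A, lam):
    return 0.5 * np.sum((y - A @ x) ** 2) + lam * np.sum(np.abs(x))

def unrolled_ista(y, A, lam, n_layers):
    """Run n_layers ISTA iterations written as network layers."""
    L = np.linalg.norm(A, 2) ** 2             # Lipschitz constant of the gradient
    W_e = A.T / L                             # input weights (trainable in a learned variant)
    S = np.eye(A.shape[1]) - A.T @ A / L      # recurrent weights
    theta = lam / L                           # threshold of the nonlinearity
    x = np.zeros(A.shape[1])
    for _ in range(n_layers):
        x = soft_threshold(W_e @ y + S @ x, theta)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 40))
x_true = np.zeros(40)
x_true[[3, 17, 31]] = [1.5, -2.0, 1.0]        # sparse ground truth
y = A @ x_true
lam = 0.1

x_8 = unrolled_ista(y, A, lam, n_layers=8)
x_64 = unrolled_ista(y, A, lam, n_layers=64)
```

With the fixed step size 1/L the objective never increases from one layer to the next, so deeper unrollings give estimates that are at least as good. Training the layer parameters aims to reach comparable quality with far fewer layers.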


Existing blind evaluators for screen content images (SCIs) are mainly learning-based and require a number of training images with co-registered human opinion scores. However, existing databases are small, and collecting human opinion scores at scale is laborious, time-consuming, and expensive. In this study, we propose a novel blind quality evaluator that requires no training.