
ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics, and ample opportunity to network with like-minded professionals from around the world.

Degradation due to additive noise is a significant roadblock to the real-world deployment of Speech Emotion Recognition (SER) systems. Most previous work in this field has addressed noise degradation either at the signal level or at the feature level. In this paper, to improve the robustness of SER under additive noise, we propose multi-condition training and data augmentation using an utterance-level parametric generative noise model. The generative noise model is designed to generate noise types that span the entire noise space in the mel-filterbank energy domain.
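The abstract does not give the model's exact parametrization; as one possible reading, a minimal sketch of an utterance-level generative noise model in the mel-filterbank energy domain could look like the following (the smoothed random log-envelope and the SNR-based mixing are assumptions, not the paper's method):

```python
import numpy as np

def sample_noise_melspec(n_frames, n_mels, rng, smooth=5):
    """Sample a random utterance-level noise spectrum in the
    mel-filterbank energy domain (hypothetical parametrization:
    a smoothed random log-energy envelope, held constant over
    the utterance)."""
    envelope = rng.normal(size=n_mels)
    kernel = np.ones(smooth) / smooth
    envelope = np.convolve(envelope, kernel, mode="same")  # smooth across bands
    log_energy = envelope - envelope.max()                 # peak normalized to 0 dB
    return np.exp(log_energy)[None, :].repeat(n_frames, 0)

def augment(clean_mel, snr_db, rng):
    """Mix a sampled noise spectrum into clean mel energies at a
    target utterance-level SNR (multi-condition augmentation)."""
    noise = sample_noise_melspec(*clean_mel.shape, rng)
    # Scale the noise so that total energies satisfy the requested SNR.
    scale = clean_mel.sum() / (noise.sum() * 10 ** (snr_db / 10))
    return clean_mel + scale * noise

rng = np.random.default_rng(0)
clean = rng.random((100, 40)) + 0.1   # stand-in mel energies (100 frames, 40 bands)
noisy = augment(clean, snr_db=10, rng=rng)
```

Sampling a fresh envelope per utterance yields noise types scattered over the mel-energy space, which is the property the abstract attributes to its model.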


Bridge weigh-in-motion (BWIM) is a technique for detecting heavy vehicles that may cause serious damage to in-service bridges. BWIM works by analyzing the strain signals measured at points on the bridge in terms of the bridge components’ responses to the axle loads. In current practice, a BWIM system requires multiple strain sensors to collect vehicle properties, including speed and axle positions, for accurate load estimation, which may limit the system’s lifespan.
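The abstract does not spell out the load-estimation step; as background, classical BWIM (often attributed to Moses) fits the axle loads to the measured strain by least squares against the bridge influence line once speed and axle spacings are known. A simplified sketch under that assumption (the triangular influence line, speed, and loads below are synthetic):

```python
import numpy as np

def estimate_axle_loads(strain, influence_line, axle_offsets, speed, dt):
    """Estimate static axle loads from one strain record by least
    squares against the influence line, in the spirit of the classic
    Moses BWIM algorithm. Simplified sketch: assumes the influence
    line (one ordinate per metre), vehicle speed, and axle spacings
    are already known."""
    n = len(strain)
    x = speed * dt * np.arange(n)                     # first-axle position (m)
    grid = np.arange(len(influence_line), dtype=float)
    # One column per axle: influence ordinate under that axle over time.
    A = np.stack([np.interp(x - d, grid, influence_line, left=0.0, right=0.0)
                  for d in axle_offsets], axis=1)
    loads, *_ = np.linalg.lstsq(A, strain, rcond=None)
    return loads

# Synthetic check: a 2-axle vehicle (hypothetical 120/80 unit loads,
# 4 m axle spacing) crossing a 30 m triangular influence line at 20 m/s.
il = np.concatenate([np.linspace(0.0, 1.0, 16), np.linspace(1.0, 0.0, 15)[1:]])
true_loads, offsets = np.array([120.0, 80.0]), [0.0, 4.0]
n, speed, dt = 200, 20.0, 0.01
x = speed * dt * np.arange(n)
grid = np.arange(len(il), dtype=float)
strain = sum(w * np.interp(x - d, grid, il, left=0.0, right=0.0)
             for w, d in zip(true_loads, offsets))
loads = estimate_axle_loads(strain, il, offsets, speed, dt)
```

The dependence on known speed and axle positions in this formulation is exactly why conventional systems need the extra sensors the abstract mentions.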


Automatic analysis of highly crowded scenes has attracted extensive attention in computer vision research. Previous approaches to crowd counting have achieved promising performance across various benchmarks. However, for real-world deployment, the model should run as fast as possible while maintaining accuracy. In this paper, we propose a compact convolutional neural network for crowd counting that learns a more efficient model with a small number of parameters.
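The abstract does not describe the network itself; as background, crowd-counting CNNs are usually trained to regress a density map whose integral equals the crowd count. A sketch of building that training target from head annotations (the point list and kernel width are illustrative, not from the paper):

```python
import numpy as np

def density_map(points, shape, sigma=4.0):
    """Build a ground-truth density map from head annotations:
    one unit-mass Gaussian per annotated head, so the map
    integrates to the crowd count (the standard density-map target)."""
    h, w = shape
    dmap = np.zeros(shape)
    r = int(3 * sigma)
    yy, xx = np.mgrid[-r:r + 1, -r:r + 1]
    g = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    g /= g.sum()                                  # unit mass per person
    for px, py in points:
        y0, x0 = int(round(py)), int(round(px))
        ys, ye = max(0, y0 - r), min(h, y0 + r + 1)
        xs, xe = max(0, x0 - r), min(w, x0 + r + 1)
        dmap[ys:ye, xs:xe] += g[ys - (y0 - r):ye - (y0 - r),
                                xs - (x0 - r):xe - (x0 - r)]
    return dmap

pts = [(30, 40), (80, 60), (100, 100)]            # (x, y) head annotations
d = density_map(pts, (128, 128))
print(round(d.sum(), 3))                          # 3.0 (no Gaussian clipped here)
```

At inference time the predicted count is simply the sum over the regressed map, which is what makes a small, fast regression network sufficient for counting.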


Object monitoring can be performed with change detection algorithms. However, for image pairs with a large perspective difference, change detection performance is usually degraded by inaccurate image registration. To address these difficulties, a novel object-specific change detection approach for object monitoring is proposed in this paper. In contrast to traditional approaches, the proposed approach is robust to view-angle variation and does not require explicit image registration. Experiments demonstrate the effectiveness and advantages of the proposed approach.


The recent success of the Transformer-based sequence-to-sequence framework on various Natural Language Processing tasks has motivated its application to Automatic Speech Recognition. In this work, we explore the application of Transformers to low-resource Indian languages in a multilingual framework. We explore various methods of incorporating language information into a multilingual Transformer: (i) at the decoder and (ii) at the encoder. These methods include using language identity tokens or providing language information to the acoustic vectors.
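One common way to inject language identity at the decoder is to prepend a language ID token to the target token sequence, so decoding is conditioned on the language from the first step. A toy sketch (the special-token vocabulary and language codes below are hypothetical, not the paper's):

```python
# Hypothetical special-token vocabulary; real systems derive these
# from their own tokenizer.
SPECIAL = {"<sos>": 0, "<eos>": 1, "<hi>": 2, "<ta>": 3, "<gu>": 4}

def decoder_targets(token_ids, lang):
    """Prepend a language ID token right after <sos> so a multilingual
    decoder sees the language identity before any text tokens."""
    return [SPECIAL["<sos>"], SPECIAL[f"<{lang}>"]] + token_ids + [SPECIAL["<eos>"]]

print(decoder_targets([10, 11, 12], "ta"))   # [0, 3, 10, 11, 12, 1]
```

The encoder-side variants mentioned in the abstract instead attach language information to the acoustic feature vectors, e.g. by concatenating a language embedding to each frame.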