
ICASSP is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2020 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Involvement hot spots have been proposed as a useful concept for meeting analysis and studied off and on for over 15 years. These are regions of meetings that are marked by high participant involvement, as judged by human annotators. However, prior work was either not conducted in a formal machine learning setting, or focused on only a subset of possible meeting features or downstream applications (such as summarization). In this paper we investigate to what extent various acoustic, linguistic and pragmatic aspects of the meetings, both in isolation and jointly, can help detect hot spots.


In this paper, we address the Online Unsupervised Domain Adaptation (OUDA) problem, where the target data are unlabelled and arrive sequentially. Traditional methods for the OUDA problem mainly focus on transforming each arriving target sample to the source domain, and they do not sufficiently consider the temporal coherency and accumulative statistics among the arriving target data. We propose a multi-step framework for the OUDA problem, which introduces a novel method to compute the mean-target subspace, inspired by a geometrical interpretation in Euclidean space.


In low-light conditions, color (RGB) images captured by a camera are noisy and lose detail and color. Near-infrared (NIR) images, by contrast, are robust to noise and have clear textures, but carry no color. In this paper, we propose multi-spectral fusion of RGB and NIR images using weighted least squares (WLS) and alternating guidance. Low-light RGB images provide coarse image structure and color, while NIR images offer clear textures at short distances. Since the two are complementary, we adopt alternating guidance for WLS-based fusion of RGB and NIR images.
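To make the WLS idea concrete, here is a minimal one-dimensional sketch of guided WLS smoothing. It is not the fusion algorithm of the paper, whose details the abstract does not give: it only shows how a clean guide signal (standing in for NIR) can steer the smoothing of a noisy signal (standing in for one RGB channel). The function name `wls_smooth_1d` and the parameters `lam`, `alpha` and `eps` are assumptions chosen for illustration.

```python
def wls_smooth_1d(signal, guide, lam=1.0, alpha=1.2, eps=1e-4):
    """Minimise sum_i (u_i - signal_i)^2 + lam * sum_i w_i * (u_{i+1} - u_i)^2.

    The weights w_i are small where the guide has a strong gradient, so
    edges present in the guide are preserved in the output.
    All parameter defaults are hypothetical.
    """
    n = len(signal)
    # Smoothness weights derived from the guide's gradients.
    w = [1.0 / (abs(guide[i + 1] - guide[i]) ** alpha + eps) for i in range(n - 1)]

    # Normal equations (I + lam * L_w) u = signal, a tridiagonal system.
    diag = [1.0] * n
    for i in range(n - 1):
        diag[i] += lam * w[i]
        diag[i + 1] += lam * w[i]
    off = [-lam * wi for wi in w]

    # Thomas algorithm: forward elimination ...
    c = [0.0] * n
    d = [0.0] * n
    c[0] = off[0] / diag[0] if n > 1 else 0.0
    d[0] = signal[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - off[i - 1] * c[i - 1]
        c[i] = off[i] / denom if i < n - 1 else 0.0
        d[i] = (signal[i] - off[i - 1] * d[i - 1]) / denom

    # ... followed by back substitution.
    u = [0.0] * n
    u[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        u[i] = d[i] - c[i] * u[i + 1]
    return u


# Hypothetical data: a noisy step (RGB-like) and a clean step (NIR-like).
noisy = [0.1, -0.1, 0.05, -0.05, 1.1, 0.9, 1.05, 0.95]
guide = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
smoothed = wls_smooth_1d(noisy, guide)
```

In this toy setting the noise inside each flat region is averaged out, while the jump at the guide's edge is largely kept. Alternating guidance, as named in the abstract, would presumably swap the roles of the two inputs across iterations; that step is not sketched here.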


Altitude estimation is important for successful control and navigation of unmanned aerial vehicles (UAVs). Indoors, UAVs have no access to GPS signals and must rely on on-board sensors for reliable altitude estimation. Unfortunately, most existing navigation schemes are not robust to the presence of abnormal obstructions above and below the UAV.


In this paper, we focus on learning the underlying product graph structure from multidomain training data. We assume that the product graph is formed from a Cartesian graph product of two smaller factor graphs. We then pose the product graph learning problem as the factor graph Laplacian matrix estimation problem. To estimate the factor graph Laplacian matrices, we assume that the data is smooth with respect to the underlying product graph.
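The structure assumed above can be illustrated with a small sketch. For a Cartesian product of two factor graphs with Laplacians L_P and L_Q, the product graph Laplacian is the Kronecker sum L_P ⊗ I + I ⊗ L_Q, and smoothness of a signal x is commonly measured by the quadratic form xᵀLx. The factor graphs, helper names and test signal below are hypothetical, and the estimation procedure of the paper itself is not reproduced.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def cartesian_product_laplacian(lp, lq):
    """Kronecker sum: Laplacian of the Cartesian product of two graphs."""
    p, q = len(lp), len(lq)
    return add(kron(lp, identity(q)), kron(identity(p), lq))


def smoothness(lap, x):
    """Quadratic form x^T L x; small values mean x changes little across edges."""
    n = len(x)
    return sum(x[i] * lap[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical factor graphs: a 2-node path and a 3-node path.
L_P = [[1.0, -1.0], [-1.0, 1.0]]
L_Q = [[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]

# Their Cartesian product is a 2 x 3 grid graph with 6 nodes.
L_N = cartesian_product_laplacian(L_P, L_Q)

# A signal that is constant along the 3-node factor and jumps across the
# 2-node factor; it differs only across the three edges joining the two rows.
x = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
print(smoothness(L_N, x))
```

Learning the product graph then amounts to finding the two small factor Laplacians rather than the full one, which reduces the number of unknowns considerably.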


Despite the ability to produce human-level speech for in-domain text, attention-based end-to-end text-to-speech (TTS) systems suffer from text alignment failures that increase in frequency for out-of-domain text. We show that these failures can be addressed using simple location-relative attention mechanisms that do away with content-based query/key comparisons. We compare two families of attention mechanisms: location-relative GMM-based mechanisms and additive energy-based mechanisms.
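As a rough illustration of what "location-relative" means here, the sketch below computes attention weights from a mixture of Gaussians over input positions, in the style of Graves-type GMM attention. It is a simplified sketch under stated assumptions, not one of the specific variants compared in the paper: in a real model the step sizes, scales and mixture weights would be predicted by the network at every decoder step, whereas here they are hypothetical constants, and the weights are left unnormalised.

```python
import math


def softplus(x):
    """Smooth, strictly positive transform used to keep steps positive."""
    return math.log1p(math.exp(x))


def advance_means(prev_means, raw_steps):
    """Move every mixture mean forward by a positive amount.

    Because each step is positive, the attended location can only move
    forward through the input, which rules out skipping back.
    """
    return [mu + softplus(r) for mu, r in zip(prev_means, raw_steps)]


def gmm_attention_weights(means, scales, mix_weights, num_positions):
    """Unnormalised attention weights over input positions 0..num_positions-1.

    The weights depend only on position relative to the mixture means;
    no content-based query/key comparison is involved.
    """
    weights = []
    for j in range(num_positions):
        w = sum(pi * math.exp(-((j - mu) ** 2) / (2.0 * s ** 2))
                for mu, s, pi in zip(means, scales, mix_weights))
        weights.append(w)
    return weights


# Three decoder steps with a single mixture component (hypothetical values).
means = [0.0]
peaks = []
for _ in range(3):
    means = advance_means(means, [1.0])
    weights = gmm_attention_weights(means, [1.0], [1.0], num_positions=10)
    peaks.append(weights.index(max(weights)))
```

The peak of the attention weights never moves backwards across decoder steps, which is the property that makes this family of mechanisms attractive for long or out-of-domain inputs.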