ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2021 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Due to the wide use of multi-sensor technology, analysis of multiple sets of data is at the heart of many challenging engineering problems. Independent vector analysis (IVA), a recent generalization of independent component analysis (ICA), enables the joint analysis of datasets and extraction of latent sources through the use of a simple yet effective generative model. However, the success of IVA is tied to proper estimation of the probability density function (PDF) of the multivariate latent sources, information that is generally unknown.

Prosody is an integral part of communication, but remains an open problem in state-of-the-art speech synthesis. There are two major issues faced when modelling prosody: (1) prosody varies at a slower rate than other content in the acoustic signal (e.g. segmental information and background noise); (2) determining appropriate prosody without sufficient context is an ill-posed problem. In this paper, we propose solutions to both of these issues. To mitigate the challenge of modelling a slowly varying signal, we learn to disentangle prosodic information using a word-level representation.

We propose a novel pitch estimation technique called DeepF0, which leverages the available annotated data to learn directly from the raw audio in a data-driven manner. F0 estimation is important in various speech processing and music information retrieval applications. Existing deep learning models for pitch estimation have relatively limited learning capabilities due to their shallow receptive field. The proposed model addresses this issue by extending the receptive field of the network through the introduction of dilated convolutional blocks.
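
As a rough illustration of why dilation extends the receptive field, the sketch below computes the receptive field of a stack of stride-1 one-dimensional convolutions. The function name `receptive_field`, the kernel size, and the dilation schedule are assumptions chosen for illustration; they are not taken from DeepF0.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in samples, of stacked stride-1 1-D convolutions.

    Each layer with dilation d widens the field by (kernel_size - 1) * d.
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Four layers, kernel size 3 (hypothetical values).
plain = receptive_field(3, [1, 1, 1, 1])    # no dilation
dilated = receptive_field(3, [1, 2, 4, 8])  # doubling dilation per layer

print(plain, dilated)
```

With the same number of layers and parameters, the dilated stack covers 31 input samples where the plain stack covers 9.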

Aerial image classification is challenging for current deep learning models due to the varied scales of geo-spatial objects and the complicated spatial arrangement of scenes. Thus, it is necessary to stress the key local feature responses from a variety of scales so as to obtain discriminative convolutional features. In this paper, we propose a deep multi-scale multiple instance learning (DMSMIL) framework to tackle the above challenges. First, we develop a differential multi-scale dilated convolution feature extractor to exploit the different patterns from different scales.

We present a novel video stabilization algorithm (LSstab) that removes unwanted motions in real time. LSstab is based on a novel least squares formulation of the smoothing cost function to alleviate undesirable camera jitter. A recursive least squares solver is derived to minimize the smoothing cost function with O(N) computational complexity. LSstab is evaluated on a suite of publicly available videos against state-of-the-art video stabilization methods. Results show that LSstab reaches comparable or better performance, achieving real-time processing speed when a GPU is used.
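
To make the idea of least squares path smoothing in O(N) concrete, here is a minimal sketch under stated assumptions. It is not the LSstab solver: the cost function (a data term plus a first-difference penalty weighted by a hypothetical parameter `lam`) and the tridiagonal Thomas-algorithm solver are stand-ins chosen for illustration, applied to a one-dimensional camera path.

```python
def smooth_path(c, lam):
    """Minimize sum((p[t] - c[t])**2) + lam * sum((p[t+1] - p[t])**2).

    The normal equations form a tridiagonal system, solved here in O(N)
    with the Thomas algorithm (an assumed solver, not the paper's).
    """
    n = len(c)
    if n < 2:
        return [float(x) for x in c]
    # Main diagonal of (I + lam * D^T D); the off-diagonals are all -lam.
    diag = [1.0 + lam * (1 if i in (0, n - 1) else 2) for i in range(n)]
    off = -float(lam)
    cp = [0.0] * n  # modified upper diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = off / diag[0]
    dp[0] = c[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - off * cp[i - 1]
        cp[i] = off / denom
        dp[i] = (c[i] - off * dp[i - 1]) / denom
    p = [0.0] * n
    p[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        p[i] = dp[i] - cp[i] * p[i + 1]
    return p


def roughness(path):
    """Sum of squared first differences, a simple jitter measure."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


jittery = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]  # synthetic path, not real data
smoothed = smooth_path(jittery, lam=5.0)
print(roughness(jittery), roughness(smoothed))
```

Larger values of `lam` trade fidelity to the original path for smoothness; the solve touches each sample a constant number of times, so the cost grows linearly with the path length.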
