Sorry, you need to enable JavaScript to visit this website.

ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2022 conference will feature world-class presentations by internationally renowned speakers, cutting-edge session topics and provide a fantastic opportunity to network with like-minded professionals from around the world. Visit the website.

We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling. Specifically, we first construct a similarity path graph (SPG) $\cG$, represented by graph Laplacian matrix $\L$, where the similarities between adjacent frames are encoded as positive edge weights. We show that maximizing the smallest eigenvalue $\lambda_{\min}(\B)$ of a coefficient matrix $\B = \text{diag}(\a) + \mu \L$, where $\a$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error.

Categories:
12 Views

The paper studies the motion control for mobile relays imple-
menting cooperative beamforming to aid the communication
between a source-destination pair. We consider an urban com-
munication scenario, where the channels exhibit spatiotempo-
ral correlations and thus can be learned. The relays move in a
time-slotted fashion within a three-dimensional cube. During
every slot, the relays beamform optimally to maximize the
Signal-to-Interference+Noise Ratio (SINR) at the destination

Categories:
19 Views

Existing speech-based coronavirus disease 2019 (COVID-19) detection systems provide poor interpretability and limited robustness to unseen data conditions. In this paper, we propose a system to overcome these limitations. In particular, we propose to fuse two different feature modalities with patient metadata in order to capture different properties of the disease. The first feature set is based on modulation spectral properties of speech. The second comprises spectral shape/descriptor features recently used for COVID-19 detection.

Categories:
6 Views

In this paper, we demonstrate how a generative model can be used to build a better recognizer through the control of content and style. We are building an online handwriting recognizer from a modest amount of training samples. By training our controllable handwriting synthesizer on the same data, we can synthesize handwriting with previously underrepresented content (e.g., URLs and email addresses) and style (e.g., cursive and slanted). Moreover, we propose a framework to analyze a recognizer that is trained with a mixture of real and synthetic training data.

Categories:
8 Views

This paper investigates methods to effectively retrieve speaker information from the personalized speaker adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models where a global model is learnt on the server based on the updates received from multiple clients. We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset.

Categories:
19 Views

Pages