ICASSP 2022

ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2022 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Fast Graph Sampling for Short Video Summarization Using Gershgorin Disc Alignment


We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling. Specifically, we first construct a similarity path graph (SPG) $\mathcal{G}$, represented by the graph Laplacian matrix $\mathbf{L}$, where the similarities between adjacent frames are encoded as positive edge weights. We show that maximizing the smallest eigenvalue $\lambda_{\min}(\mathbf{B})$ of a coefficient matrix $\mathbf{B} = \operatorname{diag}(\mathbf{a}) + \mu \mathbf{L}$, where $\mathbf{a}$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error.
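The abstract states the selection criterion but not the fast solver, so the following is only a hedged sketch of the objective itself. It builds the path-graph Laplacian $\mathbf{L} = \mathbf{D} - \mathbf{W}$ from adjacent-frame similarities, evaluates $\lambda_{\min}(\operatorname{diag}(\mathbf{a}) + \mu \mathbf{L})$ exactly with a dense eigensolver, and greedily adds whichever frame raises it most. This brute-force greedy loop is an illustrative baseline, not the paper's Gershgorin disc alignment method (which is meant to avoid such eigen-decompositions); the function names and the default $\mu = 1$ are assumptions.

```python
import numpy as np

def path_graph_laplacian(weights):
    """Laplacian L = D - W of a path graph; weights[i] links frame i and frame i+1."""
    n = len(weights) + 1
    L = np.zeros((n, n))
    for i, w in enumerate(weights):
        L[i, i] += w
        L[i + 1, i + 1] += w
        L[i, i + 1] -= w
        L[i + 1, i] -= w
    return L

def lambda_min(a, L, mu):
    """Smallest eigenvalue of the symmetric matrix B = diag(a) + mu * L."""
    B = np.diag(a) + mu * L
    return np.linalg.eigvalsh(B)[0]  # eigvalsh returns eigenvalues in ascending order

def greedy_keyframes(L, k, mu=1.0):
    """Hypothetical greedy baseline: repeatedly select the frame that most increases lambda_min(B)."""
    n = L.shape[0]
    k = min(k, n)
    a = np.zeros(n)
    for _ in range(k):
        best_i, best_val = None, -np.inf
        for i in np.flatnonzero(a == 0):  # candidates not yet selected
            a[i] = 1.0
            val = lambda_min(a, L, mu)
            a[i] = 0.0
            if val > best_val:
                best_i, best_val = i, val
        a[best_i] = 1.0
    return np.flatnonzero(a)  # sorted indices of the selected keyframes
```

For example, `greedy_keyframes(path_graph_laplacian([0.9, 0.8, 0.1, 0.9]), k=2)` returns the indices of two selected frames. Each candidate costs one full eigen-decomposition, which is the expense a fast approximation of $\lambda_{\min}$ is meant to remove.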

poster.pdf

Paper's poster (246)

Categories:: Image/Video Processing

14 Views

Deep Actor-Critic for Continuous 3D Motion Control in Mobile Relay Beamforming Networks

The paper studies motion control for mobile relays that implement cooperative beamforming to aid communication between a source-destination pair. We consider an urban communication scenario in which the channels exhibit spatiotemporal correlations and can thus be learned. The relays move in a time-slotted fashion within a three-dimensional cube. During every slot, the relays beamform optimally to maximize the Signal-to-Interference+Noise Ratio (SINR) at the destination
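The truncated abstract does not give the beamforming formulation. As an assumption-labeled sketch: if the per-slot SINR is modeled as a generalized Rayleigh quotient $\mathbf{w}^H \mathbf{A} \mathbf{w} / \mathbf{w}^H \mathbf{B} \mathbf{w}$ of the relay weights $\mathbf{w}$, with $\mathbf{A}$ a hypothetical desired-signal matrix and $\mathbf{B}$ a hypothetical positive-definite noise-plus-interference matrix, then the maximizing weights are the principal generalized eigenvector of the pencil $(\mathbf{A}, \mathbf{B})$. The code whitens $\mathbf{B}$ with a Cholesky factor so that only NumPy is needed.

```python
import numpy as np

def max_rayleigh_quotient(A, B):
    """Maximize w^H A w / w^H B w for Hermitian A and Hermitian positive-definite B.

    Returns (maximum value, maximizing w). Uses the Cholesky factor B = C C^H:
    substituting w = C^{-H} v turns the ratio into v^H M v / v^H v with
    M = C^{-1} A C^{-H}, whose largest eigenvalue is the maximum.
    """
    C = np.linalg.cholesky(B)               # lower-triangular, B = C @ C^H
    C_inv = np.linalg.inv(C)
    M = C_inv @ A @ C_inv.conj().T          # Hermitian, same eigenvalues as the pencil (A, B)
    vals, vecs = np.linalg.eigh(M)          # ascending eigenvalues
    w = C_inv.conj().T @ vecs[:, -1]        # map the top eigenvector back to w-space
    return vals[-1], w
```

When $\mathbf{A} = \mathbf{h}\mathbf{h}^H$ has rank one (a single effective channel $\mathbf{h}$), the maximum reduces to $\mathbf{h}^H \mathbf{B}^{-1} \mathbf{h}$, which is a convenient sanity check. Any per-relay or total power constraint would change the problem and is not modeled here.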

ICASSP_PRESENTATION (9).pdf

ICASSP_PRESENTATION (9).pdf (377)

Categories:: Communication Systems and Applications

23 Views

HAVE BEST OF BOTH WORLDS: TWO-PASS HYBRID AND E2E CASCADING FRAMEWORK FOR SPEECH RECOGNITION

poster.pdf

ICASSP 2022 poster (237)

Categories:: Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

7 Views

FUSION OF MODULATION SPECTRAL AND SPECTRAL FEATURES WITH SYMPTOM METADATA FOR IMPROVED SPEECH-BASED COVID-19 DETECTION

Existing speech-based coronavirus disease 2019 (COVID-19) detection systems provide poor interpretability and limited robustness to unseen data conditions. In this paper, we propose a system to overcome these limitations. In particular, we propose to fuse two different feature modalities with patient metadata in order to capture different properties of the disease. The first feature set is based on modulation spectral properties of speech. The second comprises spectral shape/descriptor features recently used for COVID-19 detection.
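To make "modulation spectral properties" concrete, here is a generic sketch, not the paper's feature extractor (its filterbank and modulation-domain processing are not given in the abstract). It takes a magnitude STFT, treats each frequency bin's magnitude over time as an envelope, and computes that envelope's spectrum; fusion is shown as plain concatenation with metadata, which is only one possible choice. The frame length (256 samples), hop (64 samples), and function names are assumptions.

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=64):
    """Magnitude STFT with a Hann window; shape (n_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def modulation_spectrum(x, frame_len=256, hop=64):
    """Spectrum of each acoustic-frequency bin's temporal envelope.

    Rows are modulation-frequency bins (resolution = frame rate / n_frames),
    columns are acoustic-frequency bins.
    """
    S = stft_magnitude(x, frame_len, hop)
    env = S - S.mean(axis=0, keepdims=True)   # remove each envelope's DC component
    return np.abs(np.fft.rfft(env, axis=0))

def fuse_features(mod_feats, spec_feats, metadata):
    """Hypothetical early fusion: concatenate the flattened feature sets."""
    return np.concatenate([np.ravel(mod_feats), np.ravel(spec_feats), np.ravel(metadata)])
```

A 1 kHz tone amplitude-modulated at 5 Hz should show a modulation peak near 5 Hz in the column of the carrier's acoustic bin, which is a quick way to check the axes.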

slides.pptx

Slides (191)

Categories:: Other

8 Views

PremiUm-CNN: Propagating Uncertainty Towards Robust Convolutional Neural Networks


Dera-ICASSP2022.pdf

Presentation slides (292)

Dera-poster_final.pdf

Dera-poster_final.pdf (309)

Categories:: Bayesian learning; Bayesian signal processing (MLR-BAYL)

45 Views

UNIVERSAL PARALINGUISTIC SPEECH REPRESENTATIONS USING SELF-SUPERVISED CONFORMERS - ICASSP 2022 Poster

ICASSP 2022 poster.pdf

ICASSP 2022 poster (279)

Categories:: Speech Processing

9 Views

Universal Paralinguistic Speech Representations using Self-Supervised Conformers - ICASSP 2022 slides

ICASSP 2022.pdf

ICASSP 2022.pdf (310)

Categories:: Speech Processing

18 Views

DATA INCUBATION — SYNTHESIZING MISSING DATA FOR HANDWRITING RECOGNITION


In this paper, we demonstrate how a generative model can be used to build a better recognizer through the control of content and style. We build an online handwriting recognizer from a modest number of training samples. By training our controllable handwriting synthesizer on the same data, we can synthesize handwriting with previously underrepresented content (e.g., URLs and email addresses) and style (e.g., cursive and slanted). Moreover, we propose a framework to analyze a recognizer that is trained with a mixture of real and synthetic training data.
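Training on "a mixture of real and synthetic training data" requires choosing how much synthetic data to add. The helper below is a hypothetical sketch of one simple policy: keep all real samples and draw enough synthetic ones that they form a target fraction of the final set. The function name, the fraction parameter, and the fixed seed are assumptions, not the paper's framework.

```python
import random

def mix_real_and_synthetic(real, synthetic_pool, synth_fraction, seed=0):
    """Hypothetical helper: keep every real sample and add synthetic samples so that
    they make up `synth_fraction` of the returned (shuffled) training set."""
    if not 0.0 <= synth_fraction < 1.0:
        raise ValueError("synth_fraction must be in [0, 1)")
    # Solve s / (len(real) + s) = synth_fraction for the synthetic count s.
    n_synth = round(synth_fraction * len(real) / (1.0 - synth_fraction))
    n_synth = min(n_synth, len(synthetic_pool))  # cannot use more than was synthesized
    rng = random.Random(seed)
    mixed = list(real) + rng.sample(list(synthetic_pool), n_synth)
    rng.shuffle(mixed)
    return mixed
```

With 100 real samples and `synth_fraction=0.2`, the helper adds 25 synthetic samples. Sweeping the fraction is one way to study how a recognizer responds to the real/synthetic mix.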

data_incubation_poster.pdf

data_incubation_poster.pdf (237)

Categories:: Other

10 Views

Privacy attacks for automatic speech recognition acoustic models in a federated learning framework

This paper investigates methods to effectively retrieve speaker information from personalized, speaker-adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models, where a global model is learnt on the server based on the updates received from multiple clients. We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset.
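The abstract does not define the footprint, so the following is a hedged sketch under explicit assumptions. The footprint is taken to be the mean activation of one hidden layer over a fixed set of indicator inputs. A toy ReLU MLP stands in for an acoustic model, and a speaker-adapted model is matched to enrolled speakers by cosine similarity of its footprint shift relative to the global model. All names and the matching rule are hypothetical.

```python
import numpy as np

def footprint(weights, biases, indicator_x, layer):
    """Hypothetical footprint: mean ReLU activation of hidden layer `layer`
    (0-based) over the indicator inputs, for a toy MLP given as weight/bias lists."""
    h = indicator_x
    for W, b in zip(weights[: layer + 1], biases[: layer + 1]):
        h = np.maximum(0.0, h @ W + b)
    return h.mean(axis=0)

def footprint_shift(adapted, global_model, indicator_x, layer):
    """Footprint of a (weights, biases) adapted model minus that of the global model."""
    return (footprint(*adapted, indicator_x, layer)
            - footprint(*global_model, indicator_x, layer))

def match_speaker(query_shift, enrolled_shifts):
    """Index of the enrolled footprint shift with the highest cosine similarity to the query."""
    sims = [
        float(np.dot(query_shift, s) / (np.linalg.norm(query_shift) * np.linalg.norm(s)))
        for s in enrolled_shifts
    ]
    return int(np.argmax(sims))
```

Because the same indicator inputs are fed to every model, differences in the footprint reflect only differences in the model parameters, which is what makes them usable as a speaker signature in this toy setting.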