ICASSP 2022, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.
- Fast Graph Sampling for Short Video Summarization Using Gershgorin Disc Alignment
We study the problem of efficiently summarizing a short video into several keyframes, leveraging recent progress in fast graph sampling. Specifically, we first construct a similarity path graph (SPG) $\mathcal{G}$, represented by graph Laplacian matrix $\mathbf{L}$, where the similarities between adjacent frames are encoded as positive edge weights. We show that maximizing the smallest eigenvalue $\lambda_{\min}(\mathbf{B})$ of a coefficient matrix $\mathbf{B} = \operatorname{diag}(\mathbf{a}) + \mu \mathbf{L}$, where $\mathbf{a}$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error.
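The objective in the abstract above can be illustrated numerically. The following is a minimal sketch, not the paper's Gershgorin disc alignment algorithm: it only builds a path-graph Laplacian and evaluates the smallest eigenvalue of B = diag(a) + mu * L for a few hand-picked selection vectors. The edge weights, the value of mu, and the helper names `path_laplacian` and `lambda_min` are hypothetical and chosen for illustration.

```python
import numpy as np


def path_laplacian(weights):
    """Laplacian of a path graph whose i-th edge joins frames i and i+1."""
    w = np.asarray(weights, dtype=float)
    n = w.size + 1
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = w  # upper off-diagonal: similarity of adjacent frames
    W[idx + 1, idx] = w  # mirror to keep the adjacency matrix symmetric
    return np.diag(W.sum(axis=1)) - W


def lambda_min(a, L, mu):
    """Smallest eigenvalue of B = diag(a) + mu * L for a binary vector a."""
    B = np.diag(np.asarray(a, dtype=float)) + mu * L
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(B)[0]


# Hypothetical 6-frame video: frames 0-2 and 3-5 form two similar groups,
# joined by one weak edge (a scene change between frames 2 and 3).
weights = [0.9, 0.8, 0.1, 0.9, 0.7]
L = path_laplacian(weights)
mu = 0.5  # assumed trade-off parameter

candidates = {
    "no keyframes": [0, 0, 0, 0, 0, 0],
    "frames 0 and 1": [1, 1, 0, 0, 0, 0],
    "frames 1 and 4": [0, 1, 0, 0, 1, 0],
}
for name, a in candidates.items():
    print(f"{name}: lambda_min(B) = {lambda_min(a, L, mu):.4f}")
```

In this toy setting, picking one keyframe from each group should give a larger smallest eigenvalue than picking two keyframes from the same group, which matches the intuition that well-spread samples reduce the worst-case reconstruction error. Enumerating candidates like this does not scale; it is only meant to make the objective concrete.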
poster.pdf
- Deep Actor-Critic for Continuous 3D Motion Control in Mobile Relay Beamforming Networks
The paper studies the motion control for mobile relays implementing cooperative beamforming to aid the communication between a source-destination pair. We consider an urban communication scenario, where the channels exhibit spatiotemporal correlations and thus can be learned. The relays move in a time-slotted fashion within a three-dimensional cube. During every slot, the relays beamform optimally to maximize the Signal-to-Interference+Noise Ratio (SINR) at the destination
- HAVE BEST OF BOTH WORLDS: TWO-PASS HYBRID AND E2E CASCADING FRAMEWORK FOR SPEECH RECOGNITION
poster.pdf
- FUSION OF MODULATION SPECTRAL AND SPECTRAL FEATURES WITH SYMPTOM METADATA FOR IMPROVED SPEECH-BASED COVID-19 DETECTION
Existing speech-based coronavirus disease 2019 (COVID-19) detection systems provide poor interpretability and limited robustness to unseen data conditions. In this paper, we propose a system to overcome these limitations. In particular, we propose to fuse two different feature modalities with patient metadata in order to capture different properties of the disease. The first feature set is based on modulation spectral properties of speech. The second comprises spectral shape/descriptor features recently used for COVID-19 detection.
slides.pptx
- PremiUm-CNN: Propagating Uncertainty Towards Robust Convolutional Neural Networks
- UNIVERSAL PARALINGUISTIC SPEECH REPRESENTATIONS USING SELF-SUPERVISED CONFORMERS - ICASSP 2022 Poster
- Universal Paralinguistic Speech Representations using Self-Supervised Conformers - ICASSP 2022 slides
- DATA INCUBATION — SYNTHESIZING MISSING DATA FOR HANDWRITING RECOGNITION
In this paper, we demonstrate how a generative model can be used to build a better recognizer through the control of content and style. We are building an online handwriting recognizer from a modest amount of training samples. By training our controllable handwriting synthesizer on the same data, we can synthesize handwriting with previously underrepresented content (e.g., URLs and email addresses) and style (e.g., cursive and slanted). Moreover, we propose a framework to analyze a recognizer that is trained with a mixture of real and synthetic training data.
- Privacy attacks for automatic speech recognition acoustic models in a federated learning framework
This paper investigates methods to effectively retrieve speaker information from personalized, speaker-adapted neural network acoustic models (AMs) in automatic speech recognition (ASR). This problem is especially important in the context of federated learning of ASR acoustic models, where a global model is learnt on the server based on the updates received from multiple clients. We propose an approach to analyze information in neural network AMs based on a neural network footprint on the so-called Indicator dataset.
- LEARNING ADJUSTABLE IMAGE RESCALING WITH JOINT OPTIMIZATION OF PERCEPTION AND DISTORTION