ICASSP 2022

ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2022 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

SELECTIVE MUTUAL LEARNING: AN EFFICIENT APPROACH FOR SINGLE CHANNEL SPEECH SEPARATION

Mutual learning, an idea related to knowledge distillation, trains a group of untrained lightweight networks that simultaneously learn and share knowledge to perform tasks together during training. In this paper, we propose a novel mutual learning approach, namely selective mutual learning. It is a simple yet effective approach to boost the performance of networks for speech separation. The selective mutual learning method uses two networks, which act like a pair of friends learning and sharing knowledge with each other.

Selective_Mutual_Learning_An_Efficient_Approach_for_Single_Channel_Speech_Separation.pdf (477)

PresentICASSP2022.pptx (506)

Categories: Source Separation and Signal Enhancement

185 Views

IMPQ: Reduced Complexity Neural Networks via Granular Precision Assignment

ICASSP22_final.pdf (473)

Categories: Learning theory and algorithms (MLR-LEAR)

69 Views

saito22icassp_slide

We propose novel deep speaker representation learning that considers perceptual similarity among speakers for multi-speaker generative modeling. Following its success in accurate discriminative modeling of speaker individuality, knowledge of deep speaker representation learning (i.e., speaker representation learning using deep neural networks) has been introduced to multi-speaker generative modeling.

saito22icassp_presen12min_arial.pdf (486)

Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT)

60 Views

Deep Temporal Interpolation of Radar-based Precipitation

When providing boundary conditions for hydrological flood models and estimating the associated risk, interpolating precipitation at very high temporal resolutions (e.g., 5 minutes) is essential so as not to miss the cause of flooding in local regions. In this paper, we study optical flow-based interpolation of globally available weather radar images from satellites.

ICASSP-22 precipitation 0524-1127.pptx

Slides (414)

Categories: Image/Video Processing

57 Views

ICASSP 2022 SPE-L1.5_EXPLOITING ANNOTATORS’ TYPED DESCRIPTION OF EMOTION PERCEPTION TO MAXIMIZE UTILIZATION OF RATINGS FOR SPEECH EMOTION RECOGNITION

SPE-L1.5_EXPLOITING ANNOTATORS’ TYPED DESCRIPTION OF EMOTION PERCEPTION TO MAXIMIZE UTILIZATION OF RATINGS FOR SPEECH EMOTION RECOGNITION.pdf (596)

Categories: Audio and Acoustic Signal Processing

94 Views

On Language Model Integration for RNN Transducer based Speech Recognition

The mismatch between an external language model (LM) and the implicitly learned internal LM (ILM) of the RNN-Transducer (RNN-T) can limit the performance of LM integration methods such as simple shallow fusion. A Bayesian interpretation suggests removing this sequence prior as ILM correction. In this work, we study various ILM correction-based LM integration methods formulated in a common RNN-T framework. We provide a decoding interpretation of two major reasons for the performance improvement with ILM correction, which is further verified experimentally with detailed analysis.
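
As a rough illustration of the idea in this abstract, shallow fusion with ILM correction is commonly written as a log-linear combination of scores at decoding time. The form below is a generic sketch rather than the exact formulation used in the paper; the scale symbols \lambda_1 and \lambda_2 and the subscript names are assumptions introduced here.

```latex
\hat{y} = \arg\max_{y} \Big[ \log P_{\text{RNN-T}}(y \mid x)
        + \lambda_1 \log P_{\text{LM}}(y)
        - \lambda_2 \log P_{\text{ILM}}(y) \Big]
```

Under this sketch, setting \lambda_2 = 0 recovers plain shallow fusion, while \lambda_2 > 0 subtracts the estimated internal LM score, which corresponds to dividing out the sequence prior in the Bayesian view.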

slides_Zhou_transducer_ILM.pdf

presentation slides (552)

Categories: Speech Processing

99 Views

ENABLING ON-DEVICE TRAINING OF SPEECH RECOGNITION MODELS WITH FEDERATED DROPOUT

Federated learning can be used to train machine learning models on the edge, on local data that never leaves the devices, providing privacy by default. This presents a challenge pertaining to the communication and computation costs associated with clients’ devices. These costs are strongly correlated with the size of the model being trained, and are significant for state-of-the-art automatic speech recognition models. We propose using federated dropout to reduce the size of client models while training a full-size model server-side.
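
The general mechanism behind federated dropout can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the function names `sample_submodel` and `aggregate`, the single weight matrix, the row-wise unit selection, and the "+1.0" stand-in for local training are all hypothetical.

```python
import random

def sample_submodel(full_w, keep_frac, rng):
    """Pick a random subset of hidden units (rows) and copy them into a smaller model."""
    n_hidden = len(full_w)
    n_keep = max(1, round(keep_frac * n_hidden))
    kept = sorted(rng.sample(range(n_hidden), n_keep))
    sub_w = [list(full_w[i]) for i in kept]
    return kept, sub_w

def aggregate(full_w, client_results):
    """Average each row over the clients that trained it; untouched rows stay as they were."""
    n_cols = len(full_w[0])
    total = [[0.0] * n_cols for _ in full_w]
    count = [0] * len(full_w)
    for kept, sub_w in client_results:
        for row_idx, row in zip(kept, sub_w):
            count[row_idx] += 1
            for j, value in enumerate(row):
                total[row_idx][j] += value
    new_w = []
    for i, row in enumerate(full_w):
        if count[i]:
            new_w.append([t / count[i] for t in total[i]])
        else:
            new_w.append(list(row))
    return new_w

rng = random.Random(0)
full_w = [[0.0] * 4 for _ in range(8)]   # server-side full-size layer (8 units)
results = []
for _ in range(3):                        # three simulated clients
    kept, sub_w = sample_submodel(full_w, keep_frac=0.5, rng=rng)
    # Stand-in for local training on the client: shift every weight by 1.0.
    sub_w = [[w + 1.0 for w in row] for row in sub_w]
    results.append((kept, sub_w))
new_w = aggregate(full_w, results)
```

Each simulated client only receives and returns half of the rows, which is where the communication and computation savings would come from, while the server keeps the full-size matrix.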

[Poster] Enabling On-Device Training of Speech Recognition Models with Federated Dropout (1).pdf (808)

Categories: General Topics in Speech Recognition (SPE-GASR)
Acoustic Modeling for Automatic Speech Recognition (SPE-RECO)

46 Views

ENABLING ON-DEVICE TRAINING OF SPEECH RECOGNITION MODELS WITH FEDERATED DROPOUT

[Presentation] ICASSP 2022 Federated Dropout.pdf (401)

Categories: General Topics in Speech Recognition (SPE-GASR)

34 Views

ROBUST NONPARAMETRIC DISTRIBUTION FORECAST WITH BACKTEST-BASED BOOTSTRAP AND ADAPTIVE RESIDUAL SELECTION

Distribution forecasts can quantify forecast uncertainty and provide various forecast scenarios with their corresponding estimated probabilities. An accurate distribution forecast is crucial for planning, for example when making production capacity or inventory allocation decisions. We propose a practical and robust distribution forecast framework that relies on backtest-based bootstrap and adaptive residual selection.
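
A basic residual bootstrap gives a feel for how backtest errors can be turned into a distribution forecast. The sketch below is a simplified illustration and not the framework proposed in the paper: it omits the adaptive residual selection step, and the function names, the example residuals, and the simple empirical quantile rule are assumptions introduced here.

```python
import random

def bootstrap_forecast(point_forecast, backtest_residuals, n_samples, rng):
    """Build forecast scenarios by adding resampled backtest errors to the point forecast."""
    return [point_forecast + rng.choice(backtest_residuals) for _ in range(n_samples)]

def quantile(samples, q):
    """Simple empirical quantile: the value at position floor(q * n) in sorted order."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[idx]

rng = random.Random(42)
# Hypothetical residuals (actual minus forecast) collected from past backtests.
residuals = [-12.0, -5.0, -1.0, 0.0, 2.0, 4.0, 9.0]
scenarios = bootstrap_forecast(100.0, residuals, n_samples=1000, rng=rng)
p10, p50, p90 = (quantile(scenarios, q) for q in (0.1, 0.5, 0.9))
```

The three quantiles form a crude prediction interval around the point forecast; a planner could, for instance, size capacity against `p90` rather than the point forecast alone.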

slides_distribution_forecast_icassp_2022.pdf (408)

Categories: Machine Learning for Signal Processing

31 Views

ICASSP 2022 - Improved Language Identification Through Cross-Lingual Self-Supervised Learning

Language identification greatly impacts the success of downstream tasks such as automatic speech recognition. Recently, self-supervised speech representations learned by wav2vec 2.0 have been shown to be very effective for a range of speech tasks. We extend previous self-supervised work on language identification by experimenting with pre-trained models which were learned on real-world unconstrained speech in multiple languages and not just on English.

Slide_ICASSP_LIDW2V.pdf

Slide (618)

Categories: Multilingual Recognition and Identification (SPE-MULT)

291 Views
