ICASSP 2022

ICASSP 2022, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

HYBRID ATTENTION-BASED PROTOTYPICAL NETWORKS FOR FEW-SHOT SOUND CLASSIFICATION

In recent years, prototypical networks have been widely used in many few-shot learning scenarios. However, as a metric-based learning method, their performance often degrades in the presence of bad or noisy embedded features, and outliers in support instances. In this paper, we introduce a hybrid attention module and combine it with prototypical networks for few-shot sound classification. This hybrid attention module consists of two blocks: a feature-level attention block, and
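
The baseline that this abstract builds on can be sketched as follows. This is a minimal illustration of a plain prototypical-network classification step (class prototype = mean of support embeddings, query assigned to the nearest prototype), not the authors' hybrid attention module, which the abstract does not specify in enough detail to reproduce. The function names, the squared Euclidean distance, and the toy 2-way 3-shot embeddings are assumptions made for illustration.

```python
import numpy as np


def class_prototypes(support_emb, support_labels):
    """Return (classes, prototypes); each prototype is the mean support embedding of a class.

    support_emb: array of shape (n_support, dim); support_labels: shape (n_support,).
    """
    classes = np.unique(support_labels)
    protos = np.stack(
        [support_emb[support_labels == c].mean(axis=0) for c in classes]
    )
    return classes, protos


def classify(query_emb, classes, protos):
    """Assign each query embedding to the class of its nearest prototype."""
    # Squared Euclidean distances, shape (n_query, n_classes) (assumed metric).
    diffs = query_emb[:, None, :] - protos[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]


# Hypothetical 2-way 3-shot episode with 2-dimensional embeddings.
support = np.array(
    [[0.0, 0.0], [0.2, 0.0], [0.1, 0.3], [2.0, 2.0], [2.2, 1.9], [1.8, 2.1]]
)
labels = np.array([0, 0, 0, 1, 1, 1])
queries = np.array([[0.1, 0.1], [1.9, 2.0]])

classes, protos = class_prototypes(support, labels)
predictions = classify(queries, classes, protos)
```

Because each prototype is an unweighted mean, a single outlying support embedding shifts it directly, which is consistent with the sensitivity to noisy features and outliers that the abstract describes.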

My poster ICASSP 2022.pdf

Categories: Applications in Music and Audio Processing (MLR-MUSI)
Audio and Acoustic Signal Processing

Adversarially-Trained Nonnegative Matrix Factorization

ICASSP_AT_NMF_poster.pdf

Categories: Signal and System Modeling, Representation and Estimation

SPE-44.3: A MODEL FOR ASSESSOR BIAS IN AUTOMATIC PRONUNCIATION ASSESSMENT (POSTER)

icassp22_poster_14_05_22.pdf

Categories: Other

SPE-44.3: A MODEL FOR ASSESSOR BIAS IN AUTOMATIC PRONUNCIATION ASSESSMENT (SLIDES)

icassp22_slides_14_05_22.pdf

Slides for presentation at ICASSP 2022 (286)

Categories: Other

Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNet-SE Submission to the L3DAS22 Challenge

This paper describes our submission to Task 1 of the L3DAS22 Challenge, which consists of speech enhancement with 3D Ambisonic microphones. The core of our approach combines deep neural network (DNN)-driven complex spectral mapping with linear beamformers such as the multi-frame multi-channel Wiener filter. The proposed system places a linear beamformer between two DNNs. Both DNNs are trained to perform complex spectral mapping, using a combination of waveform and magnitude spectrum losses.
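
As a rough illustration of what a "combination of waveform and magnitude spectrum losses" can look like, the sketch below adds an L1 distance on the time-domain signal to an L1 distance on STFT magnitudes. The abstract does not state the distance, the STFT settings, or the weighting, so the L1 choice, the `frame_len`, `hop`, and `mag_weight` parameters, and the function names are all assumptions; it is a single-channel NumPy toy, not the authors' multi-channel training code.

```python
import numpy as np


def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude STFT with a Hann window (frame_len and hop are assumed values)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def waveform_plus_magnitude_loss(estimate, target, mag_weight=1.0):
    """L1 waveform loss plus weighted L1 magnitude-spectrum loss (assumed form)."""
    wav_loss = np.mean(np.abs(estimate - target))
    mag_loss = np.mean(np.abs(stft_magnitude(estimate) - stft_magnitude(target)))
    return wav_loss + mag_weight * mag_loss


# Hypothetical example: a 440 Hz tone at 16 kHz and a noisy copy of it.
rng = np.random.default_rng(0)
t = np.arange(4000) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

loss_identical = waveform_plus_magnitude_loss(clean, clean)
loss_noisy = waveform_plus_magnitude_loss(noisy, clean)
```

The waveform term penalizes both magnitude and phase errors, while the magnitude term is phase-insensitive; combining the two is one common way to balance them, though the balance used in the submission is not given here.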

iNeuBe_ Towards Low-distortion Multi-channel Speech Enhancement.pdf

Presentation Slides (257)

Categories: Speech Enhancement (SPE-ENHA)

Conditional Diffusion Probabilistic Model for Speech Enhancement

Speech enhancement is a critical component of many user-oriented audio applications, yet current systems still suffer from distorted and unnatural outputs. While generative models have shown strong potential in speech synthesis, they are still lagging behind in speech enhancement. This work leverages recent advances in diffusion probabilistic models, and proposes a novel speech enhancement algorithm that incorporates characteristics of the observed noisy speech signal into the diffusion and reverse processes.