ICASSP 2021, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with professionals from around the world.

Accent mismatch is a critical problem for end-to-end ASR. This paper addresses it by building an accent-robust RNN-T system with domain adversarial training (DAT). We explain why DAT works and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing the gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions.
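
The Jensen-Shannon divergence mentioned in the abstract above can be computed for two discrete distributions as in the minimal sketch below. It illustrates only the quantity itself, not the paper's training procedure or proof; the function names `kl_divergence` and `js_divergence` are hypothetical, and the inputs are assumed to be probability vectors of equal length.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    # m_i > 0 whenever p_i > 0 or q_i > 0, so the logarithms are well defined.
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

The divergence is zero when the two distributions coincide and reaches its maximum, ln 2, when they have disjoint support, which is why driving it down pushes the per-domain output distributions toward each other.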


Reconfigurable intelligent surfaces (RISs), which enable tunable anomalous reflection, have emerged as a promising way to enhance wireless systems. In this paper, we propose to use an RIS as a spatial equalizer to address the well-known multi-path fading phenomenon. By using the RIS to introduce controllable artificial paths that counteract multi-path fading, we can perform equalization during transmission instead of at the receiver, so all users can share the same equalizer.
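
As a toy illustration of the idea in the abstract above, the sketch below models a direct path plus one echo and adds a single RIS-reflected path whose phase is tunable. It is not the paper's algorithm: the two-path channel, the grid search over phases, the ripple metric, and all names and parameters (`channel`, `ripple`, `best_ris_phase`, the gains and delays) are assumptions made for illustration.

```python
import cmath
import math


def channel(freqs, echo_gain, echo_delay, ris_gain=0.0, ris_delay=0.0, ris_phase=0.0):
    """Frequency response of a unit direct path, one echo, and one optional RIS path."""
    response = []
    for f in freqs:
        h = 1.0 + echo_gain * cmath.exp(-2j * math.pi * f * echo_delay)
        h += ris_gain * cmath.exp(1j * (ris_phase - 2 * math.pi * f * ris_delay))
        response.append(h)
    return response


def ripple(response):
    """Spread between the strongest and weakest magnitude across frequency."""
    mags = [abs(h) for h in response]
    return max(mags) - min(mags)


def best_ris_phase(freqs, echo_gain, echo_delay, ris_gain, ris_delay, steps=64):
    """Grid-search the RIS phase that gives the flattest magnitude response."""
    candidates = [2 * math.pi * k / steps for k in range(steps)]
    return min(
        candidates,
        key=lambda phi: ripple(
            channel(freqs, echo_gain, echo_delay, ris_gain, ris_delay, phi)
        ),
    )


# Hypothetical setup: 64 subcarriers spaced 100 kHz apart, one echo delayed by 1 microsecond.
freqs = [k * 1e5 for k in range(64)]
phi = best_ris_phase(freqs, echo_gain=0.6, echo_delay=1e-6, ris_gain=0.4, ris_delay=1e-6)
before = ripple(channel(freqs, 0.6, 1e-6))
after = ripple(channel(freqs, 0.6, 1e-6, 0.4, 1e-6, phi))
```

Because the correction happens in the propagation environment rather than in a per-receiver filter, the same RIS setting flattens the channel for every receiver that sees this echo, which is the sense in which users can share one equalizer.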


Using the shared-private paradigm and adversarial training can significantly improve the performance of multi-domain text classification (MDTC) models. However, existing methods have two issues: first, instances from the multiple domains are not sufficient for domain-invariant feature extraction; second, aligning on the marginal distributions may lead to a fatal mismatch. In this paper, we propose mixup regularized adversarial networks (MRANs) to address these two issues. More specifically, the domain and category mixup
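
For readers unfamiliar with mixup, the sketch below shows the generic operation: a convex combination of two examples and of their one-hot labels, with the mixing weight drawn from a Beta distribution. This is standard mixup only; how MRANs apply it to domains and categories is not described in the text above, and the function names and the default `alpha` value are assumptions.

```python
import random


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing weight from Beta(alpha, alpha); alpha is an assumed default."""
    return rng.betavariate(alpha, alpha)


def mixup(x_a, y_a, x_b, y_b, lam):
    """Interpolate two feature vectors and their one-hot labels with weight lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y
```

The mixed label remains a valid probability vector, so a classifier can be trained on the interpolated pair with the usual cross-entropy loss.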


Generative adversarial networks (GANs) synthesize realistic images from random latent vectors. While many studies have explored various training configurations and architectures for GANs, the problem of inverting the generator of GANs has been inadequately investigated. We train a ResNet architecture to map given faces to latent vectors that can be used to generate faces nearly identical to the target. We use a perceptual loss to embed face details in the recovered latent vector while maintaining visual quality using a pixel loss.


Device robustness is a highly desirable feature of a competitive data-driven acoustic scene classification (ASC) system. To improve it, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination based on two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes.
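
One way to picture a two-stage score combination is sketched below: each fine-class probability is weighted by the probability of its parent broad class, and the result is renormalized. The abstract says only that the combination is ad hoc, so the product rule here is an assumption, as are the class names, the grouping in `PARENT`, and the function names.

```python
# Hypothetical grouping of ten fine classes into three broad classes.
PARENT = {
    "airport": "indoor",
    "shopping_mall": "indoor",
    "metro_station": "indoor",
    "street_pedestrian": "outdoor",
    "public_square": "outdoor",
    "street_traffic": "outdoor",
    "park": "outdoor",
    "bus": "transportation",
    "metro": "transportation",
    "tram": "transportation",
}


def combine_scores(broad_probs, fine_probs, parent=PARENT):
    """Weight each fine-class score by its parent's broad-class score, then normalize."""
    scores = {c: fine_probs[c] * broad_probs[parent[c]] for c in fine_probs}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all combined scores are zero")
    return {c: s / total for c, s in scores.items()}


def classify(broad_probs, fine_probs, parent=PARENT):
    """Return the fine class with the highest combined score."""
    scores = combine_scores(broad_probs, fine_probs, parent)
    return max(scores, key=scores.get)
```

With this rule, a confident broad-class decision can overturn a fine-class prediction that falls outside the predicted broad class, which is one plausible route to the robustness the abstract aims for.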


This paper investigates the use of photoplethysmography (PPG) for user authentication systems. Time-stable and user-specific features are developed by stretching the signal, designing a convolutional neural network, and applying a variation-stable approach with three score fusions. Two evaluation scenarios are explored, namely single-session and two-session.