ICASSP 2021

ICASSP 2021 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2021 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Improved Intra Mode Coding Beyond AV1

icassp2021_poster_yizejin.pdf (287)

Categories: Image/Video Coding

Improved Intra Mode Coding Beyond AV1

2021ICASSP_slides.pdf (393)

Categories: Image/Video Coding

REDAT: ACCENT-INVARIANT REPRESENTATION FOR END-TO-END ASR BY DOMAIN ADVERSARIAL TRAINING WITH RELABELING

Accent mismatch is a critical problem for end-to-end ASR. This paper aims to address this problem by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing the gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions.
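For readers unfamiliar with the quantity named in the abstract, the following is a minimal sketch of the Jensen-Shannon divergence between two discrete distributions. It only illustrates the standard definition, not the paper's proof or training procedure; the function names and the example posteriors are assumptions made for illustration.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their midpoint m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical domain-classifier outputs for two accents (illustrative values only).
accent_a = [0.7, 0.2, 0.1]
accent_b = [0.1, 0.3, 0.6]

print(js_divergence(accent_a, accent_a))  # identical distributions
print(js_divergence(accent_a, accent_b))  # differing distributions
```

The divergence is symmetric, is zero when the two distributions coincide, and is bounded above by ln 2 when measured in nats, which is why driving it toward zero corresponds to domain outputs that no longer distinguish the accents.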

ICASSP2760poster.pdf (358)

ICASSP2760slides.pdf (378)

Categories: Speech Processing

Spatial Equalization Before Reception: Reconfigurable Intelligent Surfaces for Multi-Path Mitigation

Reconfigurable intelligent surfaces (RISs), which enable tunable anomalous reflection, have emerged as a promising means of enhancing wireless systems. In this paper, we propose to use an RIS as a spatial equalizer to address the well-known multi-path fading phenomenon. By using the RIS to introduce controllable paths that counteract the multi-path fading, we can perform equalization during the transmission process instead of at the receiver, and thus all the users can share the same equalizer.
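The idea of countering an uncontrolled echo with a controllable path can be illustrated with a toy frequency-response calculation. This is a sketch under strong assumptions and not the paper's system model: the path gains, the delays, and the premise that the RIS path can match the echo's delay and amplitude exactly are all hypothetical.

```python
import cmath

def channel_response(paths, freq_hz):
    """Frequency response of a multi-path channel: sum of a_k * exp(-j*2*pi*f*tau_k)."""
    return sum(a * cmath.exp(-2j * cmath.pi * freq_hz * tau) for a, tau in paths)

def ripple(paths, freqs):
    """Spread between the largest and smallest channel gain over the band."""
    gains = [abs(channel_response(paths, f)) for f in freqs]
    return max(gains) - min(gains)

# Each path is (complex gain, delay in seconds); all values are illustrative assumptions.
direct = (1.0 + 0j, 0.0)
echo = (0.5 + 0j, 1e-6)        # uncontrolled reflection that causes fading
ris_path = (-0.5 + 0j, 1e-6)   # controllable path tuned to oppose the echo

freqs = [i * 50e3 for i in range(21)]  # 0 Hz to 1 MHz in 50 kHz steps
without_ris = ripple([direct, echo], freqs)
with_ris = ripple([direct, echo, ris_path], freqs)
print(without_ris, with_ris)
```

In this idealized setting the added path flattens the channel gain across the band before the signal reaches any receiver, which is the sense in which a single equalizer could be shared by all users.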

ICASSP21.pdf (702)

Categories: Communications and Networking

Mixup Regularized Adversarial Networks for Multi-Domain Text Classification

Using the shared-private paradigm and adversarial training can significantly improve the performance of multi-domain text classification (MDTC) models. However, there are two issues for the existing methods: First, instances from the multiple domains are not sufficient for domain-invariant feature extraction. Second, aligning on the marginal distributions may lead to a fatal mismatch. In this paper, we propose mixup regularized adversarial networks (MRANs) to address these two issues. More specifically, the domain and category mixup

Poster.pdf

The poster (501)

Categories: Learning theory and algorithms (MLR-LEAR)

DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis

4229-poster.pdf

Poster (275)

4229-slides.pdf

Slides (433)

Categories: Speaker Recognition and Characterization (SPE-SPKR)

Fast Inverse Mapping of Face GANs

Generative adversarial networks (GANs) synthesize realistic images from random latent vectors. While many studies have explored various training configurations and architectures for GANs, the problem of inverting the generator of GANs has been inadequately investigated. We train a ResNet architecture to map given faces to latent vectors that can be used to generate faces nearly identical to the target. We use a perceptual loss to embed face details in the recovered latent vector while maintaining visual quality using a pixel loss.
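As a rough illustration of how a pixel loss and a perceptual loss can be combined into one training objective, consider the sketch below. The abstract does not give the exact formulation, so the weighted sum, the `perceptual_weight` parameter, and the `toy_features` stand-in for a real feature extractor are all assumptions.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def toy_features(image):
    """Hypothetical stand-in for a perceptual feature extractor: neighbor differences."""
    return [image[i + 1] - image[i] for i in range(len(image) - 1)]

def inversion_loss(target, reconstruction, perceptual_weight=1.0):
    """Pixel loss plus weighted perceptual loss (the weighted sum is an assumption)."""
    pixel = mse(target, reconstruction)
    perceptual = mse(toy_features(target), toy_features(reconstruction))
    return pixel + perceptual_weight * perceptual

target = [0.0, 0.5, 1.0, 0.5]         # flattened toy "image"
imperfect = [0.1, 0.4, 0.9, 0.6]      # toy output of generator(encoder(target))

print(inversion_loss(target, target))     # perfect reconstruction
print(inversion_loss(target, imperfect))  # imperfect reconstruction
```

In practice the feature extractor would be a pretrained network and the images would be tensors, but the structure of the objective would stay the same under these assumptions.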

ICASSP_Slides.pdf

Presentation Slides (1193)

Categories: Image/Video Processing
Neural network learning (MLR-NNLR)

A TWO-STAGE APPROACH TO DEVICE-ROBUST ACOUSTIC SCENE CLASSIFICATION

To improve device robustness, a highly desirable key feature of a competitive data-driven acoustic scene classification (ASC) system, a novel two-stage system based on fully convolutional neural networks (CNNs) is proposed. Our two-stage system leverages an ad-hoc score combination based on two CNN classifiers: (i) the first CNN classifies acoustic inputs into one of three broad classes, and (ii) the second CNN classifies the same inputs into one of ten finer-grained classes.
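One way such a two-classifier score combination could work is sketched below. The abstract does not specify the combination rule, so the product rule, the scene names, and the scene-to-broad-class mapping in `BROAD_OF` are assumptions chosen for illustration.

```python
# Hypothetical mapping from ten fine-grained scenes to three broad classes.
BROAD_OF = {
    "airport": "indoor", "shopping_mall": "indoor", "metro_station": "indoor",
    "park": "outdoor", "public_square": "outdoor",
    "street_pedestrian": "outdoor", "street_traffic": "outdoor",
    "bus": "transportation", "metro": "transportation", "tram": "transportation",
}

def combine_scores(broad_probs, fine_probs):
    """Weight each fine-grained score by the score of its parent broad class."""
    return {scene: fine_probs[scene] * broad_probs[BROAD_OF[scene]] for scene in fine_probs}

def classify(broad_probs, fine_probs):
    """Return the scene with the highest combined score."""
    combined = combine_scores(broad_probs, fine_probs)
    return max(combined, key=combined.get)

# Illustrative outputs of the two classifiers for one audio clip.
broad_probs = {"indoor": 0.7, "outdoor": 0.1, "transportation": 0.2}
fine_probs = {scene: 0.04375 for scene in BROAD_OF}
fine_probs["metro_station"] = 0.30
fine_probs["metro"] = 0.35

print(classify(broad_probs, fine_probs))
```

In this example the fine-grained classifier alone slightly prefers "metro", while the broad classifier's confidence in "indoor" shifts the combined decision to "metro_station", showing how the coarse stage can correct confusions between acoustically similar scenes.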