ICASSP 2022

ICASSP 2022 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2022 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

DCNGAN: A Deformable Convolution-Based GAN with QP Adaptation for Perceptual Quality Enhancement of Compressed Video

In this paper, we propose a deformable convolution-based generative adversarial network (DCNGAN) for perceptual quality enhancement of compressed videos. DCNGAN is also adaptive to the quantization parameters (QPs). Compared with optical flows, deformable convolutions are more effective and efficient at aligning frames. Deformable convolutions can operate on multiple frames, thus leveraging more temporal information, which is beneficial for enhancing the perceptual quality of compressed videos.

Zhang.pdf

Poster (190)

Categories: Image/Video Processing

DPCCN: Densely-Connected Pyramid Complex Convolutional Network for Robust Speech Separation And Extraction

In recent years, a number of time-domain speech separation methods have been proposed. However, most of them are very sensitive to environmental conditions and to tasks with wide domain coverage. In this paper, from the time-frequency domain perspective, we propose a densely-connected pyramid complex convolutional network, termed DPCCN, to improve the robustness of speech separation under complicated conditions. Furthermore, we generalize DPCCN to target speech extraction (TSE) by integrating a new, specially designed speaker encoder.

icassp22_poster.pdf

Categories: Source Separation and Signal Enhancement

JMPNET: JOINT MOTION PREDICTION FOR LEARNING-BASED VIDEO COMPRESSION

ICASSP2022_Poster.pdf

Categories: Image/Video Coding

Transient Analysis of Clustered Multitask Diffusion RLS Algorithm

3334.pdf

Categories: Communication and Sensing aspects of Sensor Networks, Wireless and Ad-Hoc Networks

ROBUST DISENTANGLED VARIATIONAL SPEECH REPRESENTATION LEARNING FOR ZERO-SHOT VOICE CONVERSION

Traditional studies on voice conversion (VC) have made progress with parallel training data and known speakers. Good voice conversion quality is obtained by exploring better alignment modules or expressive mapping functions. In this study, we investigate zero-shot VC from a novel perspective of self-supervised disentangled speech representation learning. Specifically, we achieve the disentanglement by balancing the information flow between global speaker representation and time-varying content representation in a sequential variational autoencoder (VAE).

VC-2022icassp.pdf

Categories: Speech Synthesis and Generation, including TTS (SPE-SYNT)

Causal Alignment Based Fault Root Causes Localization for Wireless Network

presentation.pdf

Categories: Other

MULTIPLE INSTANCE LEARNING WITH TASK-SPECIFIC MULTI-LEVEL FEATURES FOR WEAKLY ANNOTATED HISTOPATHOLOGICAL IMAGE CLASSIFICATION

Poster_A0_Landscape.pdf

Categories: Bioimaging and microscopy

ChunkFusion: A Learning-based RGB-D 3D Reconstruction Framework via Chunk-wise Integration

Recent years have witnessed a growing interest in online RGB-D 3D reconstruction. Making the system scalable to various environments while ensuring reconstruction accuracy with noisy depth scans remains challenging. In this paper, we aim to fill this research gap by proposing a scalable and robust RGB-D 3D reconstruction framework, namely ChunkFusion. In ChunkFusion, sparse voxel management is exploited to improve the scalability of online reconstruction.

chunkfusion_presentation.pdf

Presentation Slides (269)

chunkfusion_poster.pdf

Poster (209)

Categories: Other applications of machine learning (MLR-APPL)

REGULARIZED LATENT SPACE EXPLORATION FOR DISCRIMINATIVE FACE SUPER-RESOLUTION

shi-poster.pdf

Categories: Image/Video Processing

FORENSIC ANALYSIS AND LOCALIZATION OF MULTIPLY COMPRESSED MP3 AUDIO USING TRANSFORMERS

Audio signals are often stored and transmitted in compressed formats. Among the many available audio compression schemes, MPEG-1 Audio Layer III (MP3) is very popular and widely used. Since MP3 is lossy, it leaves characteristic traces in the compressed audio that can be used forensically to expose the history of an audio file. In this paper, we consider the scenario of audio signal manipulation by temporal splicing of compressed and uncompressed audio signals. We propose a method based on transformer networks to find the temporal location of the splices.

icassp_2022_poster.pdf

POSTER (454)

Categories: Multimedia Forensics
