ICASSP 2022

ICASSP 2022, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide a fantastic opportunity to network with like-minded professionals from around the world.

Multi-frame Full-rank Spatial Covariance Analysis for Underdetermined BSS in Reverberant Environment

Full-rank spatial covariance analysis (FCA) is a blind source separation (BSS) method that can be applied to underdetermined cases, where the sources outnumber the microphones. This paper proposes a new extension of FCA that aims to improve BSS performance for mixtures whose reverberation is longer than the analysis frame. A model that treats the parts exceeding the frame as delayed source components has already been proposed. In contrast, our extension models multiple time frames with multivariate Gaussian distributions of higher dimensionality than those in existing FCA models.
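The multi-frame idea can be made concrete with a short NumPy sketch. Assumptions, none of which come from the abstract: L consecutive frames are stacked into one M·L-dimensional vector per time index, each source contributes a covariance of the form `v[n, t] * C[n]` (a scalar variance times a full-rank spatio-temporal matrix), and the mixture is a zero-mean complex Gaussian whose covariance is the sum over sources. All function names are hypothetical, and the parameter-estimation updates are omitted.

```python
import numpy as np

def stack_frames(X, L):
    """Stack L consecutive STFT frames of an M-channel mixture at one frequency bin.

    X: complex array of shape (M, T). Returns an (M*L, T-L+1) array whose
    column t is [x_t; x_{t+1}; ...; x_{t+L-1}].
    """
    M, T = X.shape
    cols = [X[:, t:t + L].T.reshape(-1) for t in range(T - L + 1)]
    return np.stack(cols, axis=1)

def complex_gaussian_loglik(y, R):
    """log N_c(y; 0, R) for a zero-mean circularly symmetric complex Gaussian."""
    d = y.shape[0]
    _, logdet = np.linalg.slogdet(R)  # log|det R|, real for Hermitian PD R
    quad = np.real(np.conj(y) @ np.linalg.solve(R, y))  # y^H R^{-1} y
    return -d * np.log(np.pi) - logdet - quad

def multiframe_loglik(X, L, v, C):
    """Total log-likelihood under an ASSUMED variance-scaled multi-frame model.

    v: (N, T-L+1) nonnegative source variances (assumed parameterization).
    C: (N, M*L, M*L) Hermitian positive-definite spatio-temporal covariances.
    The stacked mixture vector at index t is modeled as N_c(0, sum_n v[n, t] * C[n]).
    """
    Y = stack_frames(X, L)
    total = 0.0
    for t in range(Y.shape[1]):
        R = np.tensordot(v[:, t], C, axes=1)  # sum_n v[n, t] * C[n]
        total += complex_gaussian_loglik(Y[:, t], R)
    return total
```

With L = 1 the stacked vectors are just the original M-channel frames, so the model reduces to a single-frame M × M full-rank covariance; increasing L is what raises the dimensionality of the Gaussian.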

icassp2022AUD16-1.pdf

slide.pdf

Categories: Source Separation and Signal Enhancement

Compressed Data Sharing based on Information Bottleneck Model

In this paper, we consider privacy-preserving compressed image sharing, where the goal is to release compressed data while satisfying privacy/secrecy constraints and still ensuring image reconstruction at a defined fidelity. The problem is addressed using a machine learning framework based on an information bottleneck, with a secret key shared among authorized users. An adversary, in contrast, observes the protected compressed representation and tries either to reconstruct the data or to infer privacy-sensitive attributes such as gender or age.

ICASSP2022_Presentation-2.pdf

Categories: Applications

A New Coprime-Array-Based Configuration with Augmented Degrees of Freedom and Reduced Mutual Coupling

In this paper, a new coprime-array-based structure, named AtCADiS, is proposed to achieve increased degrees of freedom (DoFs) and reduced mutual coupling. Closed-form expressions for the sensor positions and the number of uniform DoFs (uDoFs) of AtCADiS are provided. AtCADiS is constructed in two steps. First, the leftmost sensor of tCADiS is shifted to the right by N.
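The abstract does not give the tCADiS sensor positions, so the sketch below cannot reproduce AtCADiS itself. As a stated assumption, it uses the standard prototype (extended) coprime array, with N sensors at spacing M and 2M sensors at spacing N for coprime M < N, and shows how the difference coarray and the number of uniform DoFs are counted. The function names are hypothetical, and mutual coupling is not modeled.

```python
from math import gcd

def coprime_positions(M, N):
    """Prototype coprime array, positions in units of the unit spacing d.

    Subarray 1: N sensors at spacing M; subarray 2: 2M sensors at spacing N.
    For coprime M, N the only shared sensor is at 0, giving N + 2M - 1 sensors.
    """
    if gcd(M, N) != 1:
        raise ValueError("M and N must be coprime")
    return sorted({M * n for n in range(N)} | {N * m for m in range(2 * M)})

def difference_coarray(pos):
    """All pairwise lags p - q; these form the virtual array used for DoA estimation."""
    return sorted({p - q for p in pos for q in pos})

def uniform_dofs(pos):
    """Size 2k + 1 of the central hole-free segment [-k, k] of the coarray."""
    lags = set(difference_coarray(pos))
    k = 0
    while k + 1 in lags:  # the coarray is symmetric, so scanning positive lags suffices
        k += 1
    return 2 * k + 1
```

For M = 2, N = 3 the array is `[0, 2, 3, 4, 6, 9]` and the hole-free lags run from -7 to 7, i.e. 15 uDoFs; in general the prototype's contiguous range is from -(MN + M - 1) to MN + M - 1. Configurations such as AtCADiS move sensors to enlarge this segment.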

AtCADiS_array_poster_ICASSP_2022.pdf

Categories: Sensor Array Processing

Dynamic Point Cloud Interpolation

Dense photorealistic point clouds can depict real-world dynamic objects in high resolution and with a high frame rate. Frame interpolation of such dynamic point clouds would enable the distribution, processing, and compression of such content. In this work, we propose a first point cloud interpolation framework for photorealistic dynamic point clouds. Given two consecutive dynamic point cloud frames, our framework aims to generate intermediate frame(s) between them.
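To make the task concrete, here is a deliberately naive baseline, not the authors' framework: each point of the first frame is matched to its nearest neighbor in the second frame and moved linearly toward it. The function name is hypothetical; color attributes, one-to-one correspondence, and points of the second frame that receive no match are all ignored, which are exactly the shortcomings a learned interpolation framework would need to address.

```python
import numpy as np

def naive_interpolate(P0, P1, t):
    """Hypothetical baseline: intermediate frame at time t in [0, 1].

    P0: (N0, 3) and P1: (N1, 3) point positions. Each point in P0 is matched to
    its nearest neighbor in P1 (brute force, O(N0 * N1) memory and time), then
    linearly interpolated toward it. The output has N0 points.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    d2 = ((P0[:, None, :] - P1[None, :, :]) ** 2).sum(axis=-1)  # squared distances
    nn = d2.argmin(axis=1)  # index of the nearest P1 point for each P0 point
    return (1.0 - t) * P0 + t * P1[nn]
```

For two frames in which every point moves by one unit along x, `t = 0.5` places each point halfway along its motion.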

5010_Akhtar_Poster_pdf.pdf

Categories: Virtual reality and 3D imaging

Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks

Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks. In this paper, we propose a self-supervised audio representation learning method and apply it to a variety of downstream non-speech audio tasks.

poster_id_3268.pdf

Categories: Audio and Acoustic Signal Processing

SparseBFA: Attacking Sparse Deep Neural Networks with the Worst-Case Bit Flips on Coordinates

poster.pdf

Categories: Neural network learning (MLR-NNLR)

PixInWav: Residual Steganography for Hiding Pixels in Audio

Steganography is the practice of hiding data in a host medium that may be publicly available. While previous works focused on unimodal setups (e.g., hiding images in images, or hiding audio in audio), PixInWav targets the multimodal case of hiding images in audio. To this end, we propose a novel residual architecture operating on top of short-time discrete cosine transform (STDCT) audio spectrograms. Among our results, we find that the proposed residual steganography setup allows an encoding of the hidden image that is independent of the host audio without compromising quality.
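For readers unfamiliar with STDCT spectrograms, the sketch below computes one with NumPy and adds a residual in that domain. Assumptions: non-overlapping, unwindowed frames; an orthonormal DCT-II built explicitly as a matrix; and a `residual` array standing in for the output of a hypothetical image encoder, which in PixInWav would be a learned network. In this sketch the residual is computed without reference to the host, mirroring the host-independent encoding the abstract describes.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix C, so that C @ C.T equals the identity."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)
    return C

def stdct(x, frame):
    """STDCT spectrogram of shape (num_frames, frame); trailing samples are dropped."""
    T = len(x) // frame
    frames = x[:T * frame].reshape(T, frame)
    return frames @ dct_matrix(frame).T  # DCT of each frame (row)

def istdct(S):
    """Inverse STDCT: the transpose of an orthonormal matrix is its inverse."""
    return (S @ dct_matrix(S.shape[1])).reshape(-1)

def embed_residual(host, residual, frame):
    """Container audio = host audio with a residual added in the STDCT domain."""
    return istdct(stdct(host, frame) + residual)
```

Because the transform is real-valued and exactly invertible, taking the STDCT of the container and subtracting the host spectrogram returns the residual; a real system would also have to survive quantization and clipping of the container audio, which this sketch ignores.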

_ICASSP_2022PixInWavHidden_Pixels_in_Audio_Spectrograms_Geleta.pdf

Categories: Watermarking and Steganography

ICASSP_POSTER_PAPER_NO_6324

ICASSP-Poster-Draft_PW.pdf

Categories: Applications of Sensor Array and Multi-channel Signal Processing

Transducer-Based Streaming Deliberation For Cascaded Encoders

Previous research applying deliberation networks to automatic speech recognition has achieved excellent results. The attention-decoder-based deliberation model typically works as a rescorer to improve first-pass recognition results, and requires the full first-pass hypothesis for second-pass deliberation. In this work, we propose a transducer-based streaming deliberation model. The joint network of a transducer decoder typically receives inputs from the encoder and the prediction network; we propose using attention over the first-pass text hypothesis as a third input to the joint network.
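A minimal NumPy sketch of a transducer joint network with a third, attention-derived input. Assumptions not stated in the abstract: the three inputs are projected and combined additively before a tanh, as in a common transducer joint network; the attention query is the prediction-network output; and the first-pass hypothesis embeddings serve as both keys and values. All weight and function names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def attend(query, hyp_emb):
    """Scaled dot-product attention of one query over first-pass token embeddings.

    hyp_emb: (U, d) embeddings of the first-pass hypothesis, used as keys and values.
    Returns the context vector (d,) and the attention weights (U,).
    """
    w = softmax(hyp_emb @ query / np.sqrt(query.shape[0]))
    return w @ hyp_emb, w

def joint(h_enc, h_pred, hyp_emb, W_enc, W_pred, W_att, W_out):
    """Joint network with encoder, prediction-network, and attention-context inputs."""
    ctx, _ = attend(h_pred, hyp_emb)  # assumed: prediction output as the query
    hidden = np.tanh(W_enc @ h_enc + W_pred @ h_pred + W_att @ ctx)
    return W_out @ hidden  # unnormalized logits over the vocabulary plus blank
```

Setting `W_att` to zero recovers the usual two-input joint network, which makes the added input easy to ablate.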