IEEE ICASSP 2024

IEEE ICASSP 2024 - The IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The IEEE ICASSP 2024 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

MLSP-L18.6 presentation


Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing memory consumption. Liu et al. (2022) proposed extreme activation compression (EXACT), which drastically reduces memory consumption by quantizing the intermediate activation maps down to INT2 precision. They reported little to no loss in performance alongside large reductions in GPU memory consumption.
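To make the idea of INT2 activation quantization concrete, here is a minimal NumPy sketch of uniform 2-bit quantization applied per row of an activation matrix. It is an illustration only and is not the EXACT implementation: the function names `quantize_int2` and `dequantize_int2`, the per-row scaling, and the deterministic rounding are all assumptions made for this example.

```python
import numpy as np

def quantize_int2(x):
    """Map each row of x onto 4 evenly spaced levels (codes 0..3)."""
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0                    # 4 levels span 3 steps
    scale = np.where(scale == 0, 1.0, scale)   # guard against constant rows
    codes = np.rint((x - lo) / scale).astype(np.uint8)
    return codes, lo, scale

def dequantize_int2(codes, lo, scale):
    """Reconstruct approximate activations from the 2-bit codes."""
    return codes.astype(np.float64) * scale + lo

# Hypothetical activation map: 4 nodes with 8 features each.
rng = np.random.default_rng(0)
activations = rng.standard_normal((4, 8))

codes, lo, scale = quantize_int2(activations)
restored = dequantize_int2(codes, lo, scale)
max_err = np.abs(restored - activations).max()
print("largest code:", codes.max(), "max abs error:", max_err)
```

In this sketch each code still occupies a full byte. An actual memory saving would require packing four 2-bit codes into one byte, which is omitted here for brevity.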

ICASSP24___oral.pdf

MLSP-L18.6 presentation (243)

Categories: Neural network learning (MLR-NNLR)

30 Views

Covariance Matrix Recovery From One-Bit Data With Non-Zero Quantization Thresholds: Algorithm and Performance Analysis

Covariance matrix recovery is a topic of great significance in the field of one-bit signal processing and has numerous practical applications. Despite its importance, the conventional arcsine law with zero threshold is incapable of recovering the diagonal elements of the covariance matrix. To address this limitation, recent studies have proposed the use of non-zero clipping thresholds. However, the relationship between the estimation error and the sampling threshold is not yet known.
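The limitation of the zero-threshold arcsine law can be seen in a small simulation. For zero-mean Gaussian data, the correlation of the sign samples equals (2/π)·arcsin(ρ), where ρ is the correlation coefficient, so inverting it yields only the normalized covariance. The sketch below assumes a hypothetical 2×2 covariance `sigma` and does not implement the non-zero-threshold algorithm from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance with variances 4 and 1, correlation 0.6.
sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])
x = rng.multivariate_normal(np.zeros(2), sigma, size=200_000)

# One-bit quantization with zero threshold.
y = np.sign(x)

# Sample correlation of the one-bit data.
r_onebit = (y.T @ y) / y.shape[0]

# Invert the arcsine law: rho = sin(pi/2 * r_onebit).
rho_hat = np.sin(0.5 * np.pi * r_onebit)

# True correlation matrix for comparison.
d = np.sqrt(np.diag(sigma))
rho_true = sigma / np.outer(d, d)

print("estimated correlation:\n", rho_hat)
print("true correlation:\n", rho_true)
```

The diagonal of `rho_hat` is always 1 regardless of the true variances (4 and 1 here), which is why the diagonal elements of the covariance matrix cannot be recovered with a zero threshold.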

presentation_ICASSP2024.pdf (337)

Categories: Signal Processing Theory and Methods

29 Views

Binary Signal Alignment: Optimal Solution is Polynomial-time and Linear-time Solution is Quasi-optimal

In this paper we revisit a recently proposed underlay communication scheme which relies on repetition of the secondary signal at the transmitter and canonical correlation analysis (CCA) at the (multi-antenna) receiver. In this setting, CCA can provably extract the underlay signal in the presence of potentially strong and time-varying primary interference, without any channel knowledge.

ICASSP_2024_presentation.pdf (240)

Categories: Communication Systems and Applications

35 Views

Synthesizing Black-box Anti-forensics DeepFakes with High Visual Quality


DeepFake, an AI technology for creating facial forgeries, has garnered global attention. In response, forensics researchers focus on developing defensive algorithms to counter these threats. In contrast, other techniques aim to make DeepFakes more aggressive, e.g., through anti-forensics attacks that disrupt forensic detectors. However, such attacks often sacrifice image visual quality for improved undetectability. To address this issue, we propose a method to generate novel adversarial sharpening masks for launching black-box anti-forensics attacks.

ynthesizing Black-box Anti-forensics DeepFakes with High Visual Quality.pptx

ppt (245)

Categories: Multimedia Forensics

39 Views

Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models


Multi-Source Diffusion Models (MSDM) allow for compositional musical generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation. Despite their versatility, they require estimating the joint distribution over the sources, necessitating pre-separated musical data, which is rarely available, and fixing the number and type of sources at training time. This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings.

gmsdi.pdf (186)

Categories: Machine Learning for Signal Processing

22 Views

Slide for Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization

The rapid spread of information through mobile devices and media has led to the widespread dissemination of false or deceptive news, causing significant concerns in society. Among different types of misinformation, image repurposing, also known as out-of-context misinformation, remains highly prevalent and effective. However, current approaches for detecting out-of-context misinformation often lack interpretability and offer limited explanations. In this study, we propose a logic regularization approach for out-of-context detection called LOGRAN (LOGic Regularization for out-of-context ANalysis).

Slide for Interpretable Multimodal Out-of-context Detection with Soft Logic Regularization.pdf (264)

Categories: Multimedia Forensics

26 Views

CHANNEL-SPATIAL TRANSFORMER FOR EFFICIENT IMAGE SUPER-RESOLUTION


The Transformer has achieved remarkable success in low-level visual tasks, including image super-resolution (SR), owing to its ability to establish global dependencies through its self-attention mechanism. However, existing methods overlook the mutual influence and promotion between the channel and spatial dimensions. The feed-forward network (FFN) in the transformer architecture introduces redundant information in the channel dimension during feature extraction, hindering feature representation capability and neglecting spatial information modeling.

poster_1446.pdf

poster (177)

Categories: Image/Video Processing

34 Views

MULTILINGUAL AUDIO-VISUAL SPEECH RECOGNITION WITH HYBRID CTC/RNN-T FAST CONFORMER


Humans are adept at leveraging visual cues from lip movements for recognizing speech in adverse listening conditions. Audio-Visual Speech Recognition (AVSR) models follow a similar approach to achieve robust speech recognition in noisy conditions. In this work, we present a multilingual AVSR model incorporating several enhancements to improve performance and audio noise robustness. Notably, we adapt the recently proposed Fast Conformer model to process both audio and visual modalities using a novel hybrid CTC/RNN-T architecture.

SLP-L25.3.pptx (218)

Categories: Image, Video, and Multidimensional Signal Processing

56 Views

Crowdsourced Multilingual Speech Intelligibility Testing


Advancements in generative algorithms promise new heights in what can be achieved, for example, in the speech enhancement domain. Beyond the ubiquitous noise reduction, destroyed speech components can now be restored—something not previously achievable. These emerging advancements create both opportunities and risks, as speech intelligibility can be impacted in a multitude of beneficial and detrimental ways. As such, there exists a need for methods, materials and tools for enabling rapid and effective assessment of speech intelligibility.

ICASSP2024.pdf (214)

Categories: Speech Enhancement (SPE-ENHA)

42 Views

Slides for Renyi Divergences Learning for explainable classification of SAR Image Pairs

We consider the problem of classifying a pair of Synthetic Aperture Radar (SAR) images by proposing an explainable and frugal algorithm that integrates a set of divergences. The approach relies on a statistical framework that takes standard probability distributions into account for modelling SAR data. Then, by learning a combination of parameterized Renyi divergences and their parameters from the data, we are able to classify the pair of images with fewer parameters than regular machine learning approaches while also allowing an interpretation of the results related to the priors used.
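For readers unfamiliar with the quantity being learned, the following sketch computes the Renyi divergence of order α between two discrete distributions, D_α(P‖Q) = log(Σ pᵢ^α qᵢ^(1−α)) / (α − 1). It is only an illustration of the divergence itself: the discrete, strictly positive distributions and the function name `renyi_divergence` are assumptions, whereas the paper works with statistical models of SAR data and learns the parameters from data.

```python
import numpy as np

def renyi_divergence(p, q, alpha):
    """Renyi divergence of order alpha between discrete distributions.

    Assumes p and q are strictly positive and each sums to 1.
    For alpha close to 1 the Kullback-Leibler limit is returned.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if np.isclose(alpha, 1.0):
        return float(np.sum(p * np.log(p / q)))
    return float(np.log(np.sum(p**alpha * q**(1.0 - alpha))) / (alpha - 1.0))

# Hypothetical distributions used only for illustration.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

for alpha in (0.5, 1.0, 2.0):
    print(f"alpha={alpha}: D = {renyi_divergence(p, q, alpha):.4f}")
```

The order α acts as a tunable parameter: the divergence is non-decreasing in α, so different orders emphasize different parts of the two distributions.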

main.pdf

Slides presentation for paper RENYI DIVERGENCES LEARNING FOR EXPLAINABLE CLASSIFICATION OF SAR IMAGE PAIRS (260)

Categories: Information-theoretic learning (MLR-INFO), Other

24 Views
