ICASSP 2023

IEEE ICASSP 2023, the IEEE International Conference on Acoustics, Speech and Signal Processing, is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2023 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Recursive Joint Attention for Audio-Visual Fusion in Regression-Based Emotion Recognition

In video-based emotion recognition (ER), it is important to effectively leverage the complementary relationship among audio (A) and visual (V) modalities, while retaining the intramodal characteristics of individual modalities. In this paper, a recursive joint attention model is proposed along with long short-term memory (LSTM) modules for the fusion of vocal and facial expressions in regression-based ER.

ICASSP2023_Slides.pdf (125)

Categories: Image/Video Processing

31 Views

Improved WiFi-based Respiration Tracking via Contrast Enhancement

Respiratory rate tracking has gained growing interest in the past few years because of its great potential for exploring different pathological conditions in humans. Conventional approaches usually require dedicated wearable devices, making them intrusive and unfriendly to users. To tackle this issue, many WiFi-based respiration tracking systems have been proposed because of WiFi’s ubiquity, low cost, and, most importantly, contactless operation. However, most

ICASSP4381_WeiHsiang.pdf (207)

Categories: DSP algorithm implementation in hardware and software

18 Views

QuantPipe: Applying Adaptive Post-Training Quantization for Distributed Transformer Pipelines in Dynamic Edge Environments

Pipeline parallelism has achieved great success in deploying large-scale transformer models in cloud environments, but has received less attention in edge environments. Unlike cloud scenarios, which have high-speed and stable network interconnects, edge systems have dynamic bandwidth that can degrade distributed pipeline performance. We address this issue with QuantPipe, a communication-efficient distributed edge system that introduces post-training quantization (PTQ) to compress the communicated tensors.
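The idea of compressing communicated tensors with PTQ can be illustrated with a minimal sketch. It is not the QuantPipe implementation: it uses plain uniform (affine) quantization, and the bitwidth rule in `choose_bits` (pick the largest candidate bitwidth whose transfer time fits a time budget) is a hypothetical stand-in for whatever adaptation policy the authors use. All function names and parameters are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def quantize(values: List[float], bits: int) -> Tuple[List[int], float, float]:
    """Uniform (affine) quantization of floats to unsigned `bits`-bit codes."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a constant tensor, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize(codes: List[int], scale: float, lo: float) -> List[float]:
    """Map integer codes back to approximate float values."""
    return [c * scale + lo for c in codes]


def choose_bits(num_values: int, bandwidth_bps: float, budget_s: float,
                candidates: Sequence[int] = (16, 8, 4, 2)) -> int:
    """Hypothetical policy: largest bitwidth whose transfer time fits the budget."""
    for bits in candidates:
        if num_values * bits / bandwidth_bps <= budget_s:
            return bits
    return candidates[-1]  # fall back to the smallest bitwidth


if __name__ == "__main__":
    activations = [-1.0, -0.5, 0.0, 0.25, 1.0]
    bits = choose_bits(len(activations), bandwidth_bps=100.0, budget_s=0.5)
    codes, scale, lo = quantize(activations, bits)
    restored = dequantize(codes, scale, lo)
    print(bits, codes, restored)
```

With uniform quantization the reconstruction error per value is bounded by half the step size `scale`, so a lower bitwidth trades accuracy for less traffic between pipeline stages.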

ICASSP Slides.pdf (290)

Categories: Communications and Networking

33 Views

Multiscale Audio Spectrogram Transformer for Efficient Audio Classification

Audio events have a hierarchical structure in both time and frequency and can be grouped together to construct more abstract semantic audio classes. In this work, we develop a multiscale audio spectrogram Transformer (MAST) that employs hierarchical representation learning for efficient audio classification. Specifically, MAST employs one-dimensional (and two-dimensional) pooling operators along the time (and frequency) domains in different stages, progressively reducing the number of tokens and increasing the feature dimensions.
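The effect of stage-wise pooling on the token grid can be sketched as follows. This is only an illustration of the shape bookkeeping, not the MAST architecture: the grid sizes, strides, and the `widen` helper are assumptions, and `widen` simply repeats channels as a placeholder for the learned projection a real model would use to increase the feature dimension.

```python
from typing import List

Grid = List[List[List[float]]]  # grid[t][f] is one token's feature vector


def avg_pool(grid: Grid, stride_t: int, stride_f: int) -> Grid:
    """Average non-overlapping blocks of tokens along time and frequency."""
    num_t, num_f, channels = len(grid), len(grid[0]), len(grid[0][0])
    pooled = []
    for t0 in range(0, num_t - stride_t + 1, stride_t):
        row = []
        for f0 in range(0, num_f - stride_f + 1, stride_f):
            block = [grid[t][f]
                     for t in range(t0, t0 + stride_t)
                     for f in range(f0, f0 + stride_f)]
            row.append([sum(tok[c] for tok in block) / len(block)
                        for c in range(channels)])
        pooled.append(row)
    return pooled


def widen(grid: Grid, factor: int) -> Grid:
    """Placeholder for a learned projection: repeats each token's channels."""
    return [[tok * factor for tok in row] for row in grid]


def shape(grid: Grid) -> tuple:
    return len(grid), len(grid[0]), len(grid[0][0])


if __name__ == "__main__":
    # Assumed input: 8 time steps x 4 frequency bins, 2 channels per token.
    tokens = [[[1.0, 2.0] for _ in range(4)] for _ in range(8)]
    stage1 = widen(avg_pool(tokens, stride_t=2, stride_f=1), 2)  # 1-D pooling in time
    stage2 = widen(avg_pool(stage1, stride_t=2, stride_f=2), 2)  # 2-D pooling
    for name, g in (("input", tokens), ("stage1", stage1), ("stage2", stage2)):
        t, f, c = shape(g)
        print(name, "tokens:", t * f, "channels:", c)
```

In this toy setting the token count shrinks from 32 to 16 to 4 while the channel count grows from 2 to 4 to 8, which is the trade-off that makes later attention layers cheaper.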

icassp23_multiscale audio transformer.pptx

ICASSP2023 multiscale audio transformer slides (201)

Categories: Audio for Multimedia

41 Views

InfoShape: Task-Based Neural Data Shaping via Mutual Information

The use of mutual information as a tool in private data sharing has remained an open challenge due to the difficulty of its estimation in practice. In this paper, we propose InfoShape, a task-based encoder that aims to remove unnecessary sensitive information from training data while maintaining enough relevant information for a particular ML training task. We achieve this goal by utilizing mutual information estimators that are based on neural networks, in order to measure two performance metrics, privacy and utility.
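The two metrics can be made concrete with a toy sketch. Note the substitution: the paper relies on neural mutual information estimators, whereas the code below uses a simple plug-in (counting) estimator on discrete samples, which only works for small discrete alphabets. The variable names (`sensitive`, `labels`) and the two example encoders are hypothetical; here privacy leakage is taken as I(Z; S) and utility as I(Z; Y) for an encoded representation Z, which is one natural reading of the abstract rather than the authors' stated definition.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed pairs of p(x, y) * log2(p(x, y) / (p(x) p(y))).
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


if __name__ == "__main__":
    sensitive = [0, 0, 1, 1]   # hypothetical sensitive attribute S
    labels = [0, 1, 0, 1]      # hypothetical task label Y, independent of S here

    # Encoder A keeps everything; encoder B keeps only the task label.
    z_identity = list(zip(sensitive, labels))
    z_task_only = list(labels)

    for name, z in (("identity", z_identity), ("task-only", z_task_only)):
        leakage = mutual_information(z, sensitive)
        utility = mutual_information(z, labels)
        print(f"{name}: leakage={leakage:.3f} bits, utility={utility:.3f} bits")
```

In this example both encoders retain one bit about the task label, but only the task-only encoder drives the leakage about the sensitive attribute to zero.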

InfoShape_Task-Based_Neural_Data_Shaping_via_Mutual_Information.pdf

InfoShape: Task-Based Neural Data Shaping via Mutual Information (195)

Categories: Applications
Distributed and Cooperative Learning (MLR-DIST)

8 Views

On Parametric Misspecified Bayesian Cramér-Rao bound: An application to linear/Gaussian systems

C_Tang_MisspecifiedBayesianCRB_ICASSP_2023_poster.pdf (144)

Categories: Signal and System Modeling, Representation and Estimation

15 Views

Self-supervised learning of audio representations using angular contrastive loss

icassp2023_shanshan_wang_poster.pdf

Poster (167)

Categories: Content-Based Audio Processing

22 Views

Designing Transformer networks for sparse recovery of sequential data using deep unfolding

Deep unfolding models are designed by unrolling an optimization algorithm into a deep learning network. These models have shown faster convergence and higher performance compared to the original optimization algorithms. Additionally, by incorporating domain knowledge from the optimization algorithm, they need much less training data to learn efficient representations. Current deep unfolding networks for sequential sparse recovery consist of recurrent neural networks (RNNs), which leverage the similarity between consecutive signals.
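As a minimal sketch of what unrolling means, the code below treats each iteration of ISTA (iterative soft thresholding for sparse recovery) as one network layer. ISTA is chosen here only as a familiar example; the abstract does not say which algorithm the authors unroll. In a trained deep unfolding network the step size, threshold, and matrices would be learned per layer, whereas here they are fixed, so the sketch reduces to plain ISTA. The measurement matrix and parameter values are made up for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def soft_threshold(v: Vector, tau: float) -> Vector:
    """Shrink each entry toward zero by tau (proximal operator of the L1 norm)."""
    return [math.copysign(max(abs(x) - tau, 0.0), x) for x in v]


def unfolded_ista(a: Matrix, y: Vector, num_layers: int,
                  step: float, lam: float) -> Vector:
    """Run `num_layers` ISTA iterations; each loop body plays the role of a layer."""
    a_t = transpose(a)
    x = [0.0] * len(a[0])
    for _ in range(num_layers):
        residual = [yi - ri for yi, ri in zip(y, matvec(a, x))]
        grad_step = [xi + step * gi for xi, gi in zip(x, matvec(a_t, residual))]
        x = soft_threshold(grad_step, step * lam)  # would be learned in a real network
    return x


if __name__ == "__main__":
    # Assumed toy problem: 2 measurements of a 3-dimensional sparse signal.
    a = [[1.0, 0.0, 0.5],
         [0.0, 1.0, 0.5]]
    y = matvec(a, [1.0, 0.0, 0.0])
    print(unfolded_ista(a, y, num_layers=100, step=0.5, lam=0.1))
```

The step size 0.5 is below the reciprocal of the largest eigenvalue of AᵀA (1.5 for this matrix), which keeps the fixed-parameter iteration stable. Learning these parameters per layer is what lets unfolded networks reach a good estimate in far fewer layers than the original algorithm needs iterations.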

2023_ICASSP_DUST_camera-ready.pdf

preprint (207)

Categories: Learning theory and algorithms (MLR-LEAR)
Neural network learning (MLR-NNLR)

26 Views

Mixer: DNN Watermarking using Image Mixup

It is crucial to protect the intellectual property rights of DNN models prior to their deployment. The DNN should perform two main tasks: its primary task and watermarking task. This paper proposes a lightweight, reliable, and secure DNN watermarking that attempts to establish strong ties between these two tasks. The samples triggering the watermarking task are generated using image Mixup either from training or testing samples. This means that there is an infinity of triggers not limited to the