ICASSP 2023

IEEE ICASSP 2023 - IEEE International Conference on Acoustics, Speech and Signal Processing is the world’s largest and most comprehensive technical conference focused on signal processing and its applications. The ICASSP 2023 conference will feature world-class presentations by internationally renowned speakers and cutting-edge session topics, and will provide an opportunity to network with like-minded professionals from around the world.

Image Generation is MAY All You Need for VQA

Visual Question Answering (VQA) stands to benefit from increasingly sophisticated Pretrained Language Models (PLMs) and computer vision-based models. In particular, many studies on the language modality have used image captioning or question generation, grounded in the knowledge of PLMs, for data augmentation. However, image generation for VQA has been implemented only in a limited way, modifying certain parts of the original image in order to control quality and uncertainty.

Image_Generation_is_May_All_You_Need_for_VQA.pdf

Categories: Image, Video, and Multidimensional Signal Processing

Hypernetwork-based Adaptive Image Restoration

Adaptive image restoration models can restore images with different degradation levels at inference time without the need to retrain the model. We present an approach that is highly accurate and allows a significant reduction in the number of parameters. In contrast to existing methods, our approach can restore images using a single fixed-size model, regardless of the number of degradation levels. On popular datasets, our approach yields state-of-the-art results in terms of size and accuracy for a variety of image restoration tasks, including denoising, deJPEG, and super-resolution.

ICASSP_poster.pdf

Categories: Image/Video Processing

Pondering about Task Spatial Misalignment: Classification-Localization Equilibrated Object Detection

Object detection is a fundamental task in computer vision, consisting of both classification and localization. Previous works mostly perform classification and localization with a shared feature extractor such as a convolutional neural network. However, the two tasks exhibit different sensitivities to the same feature, which gives rise to the "task spatial misalignment" issue. This issue can force a trade-off between the performance of the localizer and that of the classifier.

icassp2023_poster.pdf

Poster of the paper titled Pondering about Task Spatial Misalignment: Classification-Localization Equilibrated Object Detection

Categories: Image/Video Processing

GaitMixer: Skeleton-based Gait Representation Learning via Wide-spectrum Multi-axial Mixer

Most existing gait recognition methods are appearance-based and rely on silhouettes extracted from video of human walking. The less-investigated skeleton-based methods learn gait dynamics directly from 2D/3D human skeleton sequences, which makes them theoretically more robust to appearance changes caused by clothes, hairstyles, and carried objects. However, the performance of skeleton-based solutions still lags far behind that of appearance-based ones.

2210.15491.pdf

GaitMixer Paper [pdf]

Categories: Image/Video Processing

SPATIAL INFERENCE USING CENSORED MULTIPLE TESTING WITH FDR CONTROL

A wireless sensor network performs spatial inference on a physical phenomenon of interest. The areas in which this phenomenon exhibits interesting or anomalous behavior are identified while controlling false positives. We extend our previous work based on multiple hypothesis testing (MHT) and local false discovery rates to save energy and reduce spectrum use. The number of transmissions from sensors producing uninformative statistics is reduced by introducing censoring for MHT, which imposes a communication rate constraint while maintaining the desired performance.

video_slides_v2_1.pdf

Categories: Statistical Signal Processing
