Image, Video, and Multidimensional Signal Processing

Immersive Optical-See-Through Augmented Reality (Keynote Talk)

Augmented Reality has been getting ready for the last 20 years and is finally becoming real, powered by progress in enabling technologies such as graphics, vision, sensors, and displays. In this talk I’ll provide a personal retrospective on my journey, working on all those enablers, getting ready for the coming AR revolution. At Meta, we are working on an immersive optical-see-through AR headset, as well as the full software stack. We’ll discuss the differences of optical vs.

ICIP_2017_Meta_AR_small.pdf (1057)

Categories: Image, Video, and Multidimensional Signal Processing

141 Views

Texture- and Shape-based Adversarial Attacks for Overhead Image Vehicle Detection

Detecting vehicles in aerial images is difficult due to complex backgrounds, small object sizes, shadows, and occlusions. Although recent deep learning advancements have improved object detection, these models remain susceptible to adversarial attacks (AAs), which undermines their reliability. Traditional AA strategies often ignore practical implementation constraints. Our work proposes realistic and practical constraints on texture (lowering resolution, limiting modified areas, and restricting color ranges) and analyzes the impact of shape modifications on attack performance.
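
The three texture constraints named above can be illustrated with a small sketch. This is not the authors' implementation: the function name `constrain_texture`, the block-averaging scheme, the binary mask, and the `lo`/`hi` bounds are all assumptions chosen for illustration, and a single grayscale channel stands in for the color range. The attack optimization itself is not shown.

```python
def constrain_texture(original, texture, mask, block=2, lo=0.2, hi=0.8):
    """Project a candidate adversarial texture onto three illustrative constraints.

    original, texture: 2D lists of grayscale values in [0, 1], same shape.
    mask: 2D list of 0/1 flags marking where modification is allowed.
    All parameter names and defaults are hypothetical.
    """
    h, w = len(texture), len(texture[0])
    out = [row[:] for row in original]  # start from the unmodified image
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            # Constraint 1 (lower resolution): one value per block.
            mean = sum(texture[y][x] for y in ys for x in xs) / (len(ys) * len(xs))
            # Constraint 2 (restricted range): clamp to the allowed interval.
            value = min(max(mean, lo), hi)
            for y in ys:
                for x in xs:
                    # Constraint 3 (limited area): only write inside the mask.
                    if mask[y][x]:
                        out[y][x] = value
    return out
```

Pixels outside the mask keep their original values, so the function can be applied after each update step of whatever optimizer produces `texture`.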

SM.pdf (141)

Categories: Image, Video, and Multidimensional Signal Processing

18 Views

Supplementary Material for Rebuttal

Supplementary material for rebuttal

ICIP_2025__supplementary.pdf (144)

Categories: Image, Video, and Multidimensional Signal Processing

18 Views

ICIP2025_supp

Supplementary materials

supp_main.pdf (174)

Categories: Image, Video, and Multidimensional Signal Processing

23 Views

ICIP2025_supp

We address the challenge of local feature matching under large scale and rotation changes by focusing on keypoint positions.
First, we propose a novel module called similarity normalization (SN).
This module normalizes keypoint positions to remove translation, rotation, and scale differences between an image pair.
By performing positional encoding on these normalized positions, a network equipped with SN can avoid encoding largely different positions into the descriptors of the two images.
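
The abstract does not say how SN computes its normalization, so the following is only a minimal sketch of one way to cancel a similarity transform on keypoint positions. It assumes translation is removed by subtracting the centroid, scale by dividing by the RMS distance from the centroid, and rotation by undoing an orientation angle that is supplied from outside; the function names and the `angle` parameter are hypothetical.

```python
import math


def similarity_normalize(points, angle=0.0):
    """Normalize 2D keypoints for translation, scale, and a given rotation.

    points: list of (x, y) tuples.
    angle: assumed image orientation in radians (hypothetical input;
           how the angle is obtained is outside this sketch).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]  # remove translation
    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if rms == 0.0:
        raise ValueError("all keypoints coincide; scale is undefined")
    c, s = math.cos(-angle), math.sin(-angle)  # rotate back by -angle
    return [((c * x - s * y) / rms, (s * x + c * y) / rms) for x, y in centered]


def apply_similarity(points, angle, scale, tx, ty):
    """Apply a rotation, uniform scale, and translation (used for checking)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]
```

With the correct angle, keypoints from a rotated, scaled, and shifted copy of an image map to the same normalized coordinates as the originals, so a positional encoding applied afterwards sees matching positions.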

supp_main.pdf (167)

Categories: Image, Video, and Multidimensional Signal Processing

12 Views

RetCLIP Case Studies

In this work, we propose a retrieval-based method for improving open vocabulary panoptic segmentation.

ICIP_OVPS_Supplementary.pdf (147)

Categories: Image, Video, and Multidimensional Signal Processing

6 Views

Supplementary Material

Although many deepfake detection methods have been proposed to fight against severe misuse of generative AI, none provide detailed human-interpretable explanations beyond simple real/fake responses. This limitation makes it challenging for humans to assess the accuracy of detection results, especially when the models encounter unseen deepfakes. To address this issue, we propose a novel deepfake detector based on a large Vision-Language Model (VLM), capable of explaining manipulated facial regions.

ICIP25 Paper (1).pdf (157)

Categories: Image/Video Processing
Image, Video, and Multidimensional Signal Processing

19 Views

Intriguing Equivalence Structures of the Embedding Space of Vision Transformers

Pre-trained large foundation models play a central role in the recent surge of artificial intelligence, resulting in fine-tuned models with remarkable abilities when measured on benchmark datasets, standard exams, and applications. Due to their inherent complexity, these models are not well understood; in particular, the structures of the representation space are not well characterized despite their fundamental importance. In this paper,