Image, Video, and Multidimensional Signal Processing

Immersive Optical-See-Through Augmented Reality (Keynote Talk)

Immersive Optical-See-Through Augmented Reality. Augmented Reality has been in preparation for the last 20 years and is finally becoming real, powered by progress in enabling technologies such as graphics, vision, sensors, and displays. In this talk I’ll provide a personal retrospective on my journey working on all of those enablers in preparation for the coming AR revolution. At Meta, we are working on an immersive optical-see-through AR headset, as well as the full software stack. We’ll discuss the differences of optical vs.

ICIP_2017_Meta_AR_small.pdf (1144)

Categories:: Image, Video, and Multidimensional Signal Processing

147 Views

Texture- and Shape-based Adversarial Attacks for Overhead Image Vehicle Detection

Detecting vehicles in aerial images is difficult due to complex backgrounds, small object sizes, shadows, and occlusions. Although recent deep learning advancements have improved object detection, these models remain susceptible to adversarial attacks (AAs), which calls their reliability into question. Traditional AA strategies often ignore practical implementation constraints. Our work proposes realistic and practical constraints on texture (lowering the resolution and limiting the modified areas and color ranges) and analyzes the impact of shape modifications on attack performance.
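The three texture constraints named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: the function name `constrain_texture`, the block-averaging scheme, the binary mask, and the grayscale value range are all assumptions made for illustration only.

```python
def constrain_texture(original, perturbed, mask, block=2, lo=0.0, hi=1.0):
    """Project a perturbed grayscale texture onto three illustrative constraints.

    Hypothetical sketch, not the paper's implementation:
      1. lower resolution: each block x block tile shares one averaged value,
      2. limited area: only pixels where mask is truthy may change,
      3. limited color range: values are clamped to [lo, hi].
    All arguments are lists of rows (same shape), pure Python for clarity.
    """
    height, width = len(original), len(original[0])
    out = [row[:] for row in original]  # start from the unmodified texture
    for by in range(0, height, block):
        for bx in range(0, width, block):
            cells = [
                (y, x)
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            ]
            # One value per tile emulates a low-resolution adversarial texture.
            mean = sum(perturbed[y][x] for y, x in cells) / len(cells)
            value = min(hi, max(lo, mean))  # clamp to the allowed color range
            for y, x in cells:
                if mask[y][x]:  # leave pixels outside the allowed area untouched
                    out[y][x] = value
    return out
```

In this sketch, an attack loop would call `constrain_texture` after each update step so that the texture it optimizes always stays within the stated limits.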

SM.pdf (256)

Categories:: Image, Video, and Multidimensional Signal Processing

30 Views

ICIP2025_supp

Supplementary materials

supp_main.pdf (254)

Categories:: Image, Video, and Multidimensional Signal Processing

29 Views

ICIP2025_supp

We address the challenge of local feature matching under large scale and rotation changes by focusing on keypoint positions.
First, we propose a novel module called similarity normalization (SN).
This module normalizes keypoint positions to remove translation, rotation, and scale differences between an image pair.
By performing positional encoding on these normalized positions, a network that incorporates SN can effectively avoid encoding largely different positions into the descriptors of the two images.
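The idea of removing a similarity transform from keypoint positions can be sketched as follows. The abstract does not say how SN computes its normalization, so everything here is an assumption: the helper names, the use of the centroid for translation, the RMS distance for scale, and the principal axis of the point cloud for rotation.

```python
import math


def similarity_normalize(points):
    """Map 2D keypoints to a canonical frame (hypothetical sketch, not the paper's SN).

    Translation is removed with the centroid, scale with the RMS distance to
    the centroid, and rotation by aligning the principal axis with the x-axis.
    The principal axis is only defined up to 180 degrees, so a real system
    would need an extra rule to resolve that ambiguity.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]

    rms = math.sqrt(sum(x * x + y * y for x, y in centered) / n)
    if rms == 0.0:
        raise ValueError("all keypoints coincide; scale is undefined")
    scaled = [(x / rms, y / rms) for x, y in centered]

    # Principal-axis angle from the second-order moments of the point cloud.
    sxx = sum(x * x for x, _ in scaled)
    syy = sum(y * y for _, y in scaled)
    sxy = sum(x * y for x, y in scaled)
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)

    # Rotate by -theta so the principal axis lies along the x-axis.
    c, s = math.cos(-theta), math.sin(-theta)
    return [(c * x - s * y, s * x + c * y) for x, y in scaled]


def apply_similarity(points, scale, angle, tx, ty):
    """Apply a scale, rotation (radians), and translation to 2D points."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y) + tx, scale * (s * x + c * y) + ty)
            for x, y in points]
```

Under these assumptions, a keypoint set and a scaled, rotated, and shifted copy of it land on approximately the same normalized coordinates, so a positional encoding computed afterwards sees similar inputs for both images, provided the 180-degree ambiguity does not flip one of them.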

supp_main.pdf (274)

Categories:: Image, Video, and Multidimensional Signal Processing

26 Views

RetCLIP Case Studies

In this work, we propose a retrieval-based method for improving open vocabulary panoptic segmentation.

ICIP_OVPS_Supplementary.pdf (242)

Categories:: Image, Video, and Multidimensional Signal Processing

11 Views

Supplementary Material

Although many deepfake detection methods have been proposed to fight against severe misuse of generative AI, none provide detailed human-interpretable explanations beyond simple real/fake responses. This limitation makes it challenging for humans to assess the accuracy of detection results, especially when the models encounter unseen deepfakes. To address this issue, we propose a novel deepfake detector based on a large Vision-Language Model (VLM), capable of explaining manipulated facial regions.

ICIP25 Paper (1).pdf (252)

Categories:: Image/Video Processing
Image, Video, and Multidimensional Signal Processing

20 Views

Intriguing Equivalence Structures of the Embedding Space of Vision Transformers

Pre-trained large foundation models play a central role in the recent surge of artificial intelligence, resulting in fine-tuned models with remarkable abilities when measured on benchmark datasets, standard exams, and applications. Due to their inherent complexity, these models are not well understood; in particular, the structures of the representation space are not well characterized despite their fundamental importance. In this paper,

codes-and-additional-results-icip25.zip (252)

Categories:: Image, Video, and Multidimensional Signal Processing

17 Views

supplementary material

supplementary material

supplementary_material.pdf (284)

Categories:: Image, Video, and Multidimensional Signal Processing

25 Views

ROLLOUT-GUIDED TOKEN PRUNING FOR EFFICIENT VIDEO UNDERSTANDING

Supplementary material for our ICIP submission: ROLLOUT-GUIDED TOKEN PRUNING FOR EFFICIENT VIDEO UNDERSTANDING

RGTP_supp.pdf (261)

Categories:: Image, Video, and Multidimensional Signal Processing

132 Views

Appendix for "Perceptual Classifiers for Detecting Generative Images"

This 2-page document provides supplementary material for the paper titled "Perceptual Classifiers for Detecting Generative Images". It provides details on the datasets used and their composition. We also include the real and fake detection accuracies for each class to help readers better understand the strengths and drawbacks of the proposed approach. Finally, we provide t-SNE visualizations to understand the effectiveness of the chosen feature extractors.

appendix.pdf (158)

Categories:: Image, Video, and Multidimensional Signal Processing

39 Views
