IEEE ICIP 2025

IEEE ICIP 2025 - The International Conference on Image Processing (ICIP), sponsored by the IEEE Signal Processing Society, is the premier forum for the presentation of technological advances and research results in the fields of theoretical, experimental, and applied image and video processing. Held annually since 1994, ICIP brings together leading engineers and scientists in image and video processing from around the world.

UTAL-GNN: Unsupervised Temporal Action Localization using Graph Neural Networks

Fine-grained action localization in untrimmed sports videos presents a significant challenge due to rapid and subtle motion transitions over short durations. Existing supervised and weakly supervised solutions often rely on extensive annotated datasets and high-capacity models, making them computationally intensive and less adaptable to real-world scenarios. In this work, we introduce a lightweight and unsupervised skeleton-based action localization pipeline that leverages spatio-temporal graph neural representations.

ICIP(SW)_SUPPLEMENTARY.pdf

Additional Ablation Study and Performance Evaluation Results (33)

Categories: Other

19 Views

Robust Estimation of Bump Height for Wafer-Level Packaging Using Optical Triangulation

Paper Abstraction:

ICIP2025_Supplementary_Materials_Robust_estimation_of_bump_height_for_wafer_level_packaging_using_optical_triangulation.pdf

ICIP2025_Supplementary_Materials_Robust_estimation_of_bump_height_for_wafer_level_packaging_using_optical_triangulation.pdf (132)

Categories: Image/Video Processing

22 Views

GIVE: A Multi-Agent Framework for Generating Immersive Multi-Modal Virtual Environments for 3D Games - Supplementary Material

In this work, we present a novel multi-agent framework for generating immersive 3D virtual environments from high-level semantic inputs, powered by large language and vision-language models (LLMs/VLMs). Unlike prior work that focuses primarily on visual output, data-intensive training pipelines, and code generation, our system coordinates a team of specialized agents, each assigned a role such as manager, planner, or expert in visual, audio, or spatial domains, to decompose and execute environment construction tasks within a game engine.

Generative_Immersive_Virtual_Environment_Supplementary_Material.pdf

Supplementary Material (30)

Categories: Other

59 Views

Texture- and Shape-based Adversarial Attacks for Overhead Image Vehicle Detection

Detecting vehicles in aerial images is difficult due to complex backgrounds, small object sizes, shadows, and occlusions. Although recent deep learning advancements have improved object detection, these models remain susceptible to adversarial attacks (AAs), challenging their reliability. Traditional AA strategies often ignore practical implementation constraints. Our work proposes realistic and practical constraints on texture (lowering resolution, limiting modified areas, and color ranges) and analyzes the impact of shape modifications on attack performance.
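
The three texture constraints named above (lower resolution, limited modified area, restricted color range) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `constrain_texture`, its parameters, and the default values are hypothetical, and a grayscale patch stored as nested lists stands in for a real image.

```python
# Hypothetical sketch: constrain an adversarial texture on a grayscale patch.
# All names and default values are illustrative assumptions, not the paper's.

def constrain_texture(original, perturbed, mask, block=2, lo=0.2, hi=0.8):
    """Return a constrained copy of `perturbed`.

    original, perturbed: 2D lists of floats in [0, 1] with equal shape.
    mask: 2D list of 0/1 values; 1 marks pixels the attack may modify.
    block: side length of the square cells used to lower texture resolution.
    lo, hi: allowed intensity range for modified pixels.
    """
    height, width = len(original), len(original[0])
    result = [row[:] for row in original]
    for top in range(0, height, block):
        for left in range(0, width, block):
            cells = [(r, c)
                     for r in range(top, min(top + block, height))
                     for c in range(left, min(left + block, width))]
            # Lower resolution: every pixel in a block shares the block mean.
            mean = sum(perturbed[r][c] for r, c in cells) / len(cells)
            # Limit color range: clamp the shared value to [lo, hi].
            value = max(lo, min(hi, mean))
            for r, c in cells:
                # Limit modified area: only masked pixels are changed.
                if mask[r][c]:
                    result[r][c] = value
    return result


original = [[0.5] * 4 for _ in range(4)]
perturbed = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]
mask = [[1, 1, 0, 0] for _ in range(4)]
constrained = constrain_texture(original, perturbed, mask)
print(constrained[0])  # [0.8, 0.8, 0.5, 0.5]
```

In the usage above, the left half of the patch is clamped to the upper bound of the allowed range, while the unmasked right half keeps its original values.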

SM.pdf

SM.pdf (137)

Categories: Image, Video, and Multidimensional Signal Processing

18 Views

Investigating Robustness of Unsupervised StyleGAN Image Restoration

Recently, generative priors have shown significant improvement for unsupervised image restoration. This study explores the incorporation of multiple loss functions that capture various perceptual and structural aspects of image quality. Our proposed method improves robustness across multiple tasks, including denoising, upsampling, inpainting, and deartifacting, by utilizing a comprehensive loss function based on Learned Perceptual Image Patch Similarity (LPIPS), Multi-Scale Structural Similarity Index Measure (MS-SSIM), consistency, feature, and gradient losses.
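
One common way to combine several loss terms is a weighted sum. The abstract does not state how the five terms are combined, so the form below is an assumption, and the weights \lambda_1 through \lambda_5 are hypothetical symbols rather than values from the paper.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{1}\,\mathcal{L}_{\text{LPIPS}}
  + \lambda_{2}\,\mathcal{L}_{\text{MS-SSIM}}
  + \lambda_{3}\,\mathcal{L}_{\text{consistency}}
  + \lambda_{4}\,\mathcal{L}_{\text{feature}}
  + \lambda_{5}\,\mathcal{L}_{\text{gradient}},
  \qquad \lambda_{i} \ge 0
```

Under this assumed form, setting a weight to zero removes the corresponding term, which is how the contribution of each loss could be examined in isolation.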

Ali_supplementary.pdf

Ali_supplementary.pdf (138)

Categories: Image/Video Storage, Retrieval

39 Views

(Appendix) Rethinking the Backbone in Class Imbalanced Federated Source Free Domain Adaptation: The Utility of Vision Foundation Models

Appendix of our paper: "Rethinking the Backbone in Class Imbalanced Federated Source Free Domain Adaptation: The Utility of Vision Foundation Models" accepted at IEEE ICIP 2025 workshop: Edge Intelligence: Smart, Efficient, and Scalable Solutions for IoT, Wearables, and Embedded Devices (SEEDS)

appendix.pdf

v1_appendix_submitted (115)

appendix.pdf

v2_appendix_revised (29)

Categories: Other

65 Views

ICIP 2025 Supplementary

This supplementary material accompanies our paper titled "Texturing Endoscopic 3D Stomach via Neural Radiance Field under Uneven Lighting."

ICIP_2025_supplementary.pdf

ICIP_2025_supplementary.pdf (147)

Categories: Other

36 Views

(Appendix) MultiMAE Meets Earth Observation: Pre-training Multi-modal Multi-task Masked Autoencoders for Earth Observation Tasks

MultiMAE Meets Earth Observation: Pre-training Multi-modal Multi-task Masked Autoencoders for Earth Observation Tasks (Appendix)

appendix_multimae_meets_eo.pdf

Appendix (144)

Categories: Other

30 Views

Supplementary for ICIP rebuttal

This paper presents FaceLiVT, a lightweight yet powerful face recognition model that combines a hybrid CNN-Transformer architecture with an innovative and lightweight Multi-Head Linear Attention (MHLA) mechanism. By incorporating MHLA alongside a reparameterized token mixer, FaceLiVT effectively reduces computational complexity while preserving high accuracy. Extensive evaluations on challenging benchmarks—including LFW, CFP-FP, AgeDB-30, IJB-B, and IJB-C—highlight its superior performance compared to state-of-the-art lightweight models.