Image/Video Storage, Retrieval

INVESTIGATING ROBUSTNESS OF UNSUPERVISED STYLEGAN IMAGE RESTORATION

Generative priors have recently brought significant improvements to unsupervised image restoration. This study explores the incorporation of multiple loss functions that capture various perceptual and structural aspects of image quality. Our proposed method improves robustness across multiple tasks, including denoising, upsampling, inpainting, and deartifacting, by utilizing a comprehensive loss function based on Learned Perceptual Image Patch Similarity (LPIPS), Multi-Scale Structural Similarity Index Measure (MS-SSIM), Consistency, Feature, and Gradient losses.
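One common way to combine several loss terms like these is a weighted sum. The abstract does not say how the terms are combined, so the form below and the weights $\lambda_1, \dots, \lambda_5$ are assumptions for illustration only, not the authors' formulation:

```latex
\mathcal{L}_{\text{total}}
  = \lambda_1 \mathcal{L}_{\text{LPIPS}}
  + \lambda_2 \mathcal{L}_{\text{MS-SSIM}}
  + \lambda_3 \mathcal{L}_{\text{consistency}}
  + \lambda_4 \mathcal{L}_{\text{feature}}
  + \lambda_5 \mathcal{L}_{\text{gradient}},
\qquad \lambda_i \ge 0
```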

Ali_supplementary.pdf (266)

Categories: Image/Video Storage, Retrieval

56 Views

Supplementary Material

HOW SHOULD WE EVALUATE DATA DELETION IN GRAPH-BASED ANN INDEXES?

supplementary_material.pdf (210)

Categories: Image/Video Storage, Retrieval

52 Views

Supplementary - Towards Image Copy Detection at E-commerce Scale

A copy detection system aims to identify whether a query image is an edited or manipulated copy of an image from a large reference database with millions of images. While global image descriptors can retrieve visually similar images, they struggle to differentiate near-duplicates from semantically similar instances. We propose a dual-triplet metric learning (DTML) technique to learn global image features that group near-duplicates closer than visually similar images while maintaining the semantic structure of the embedding space.
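The idea of ranking near-duplicates ahead of visually similar images, while still keeping similar images ahead of unrelated ones, can be sketched with two hinge-style triplet terms. This is a minimal illustration and not the DTML formulation from the paper: the pairing of the two triplets, the function names, and the margin values are all assumptions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin):
    """Standard hinge triplet loss: the positive should be closer than the negative by `margin`."""
    return max(0.0, l2_distance(anchor, positive) - l2_distance(anchor, negative) + margin)


def dual_triplet_loss(anchor, near_dup, similar, unrelated,
                      margin_dup=0.2, margin_sem=0.5):
    """Hypothetical two-term loss (assumed structure, not taken from the paper).

    Term 1 pulls the near-duplicate closer to the anchor than the visually similar image.
    Term 2 keeps the visually similar image closer than an unrelated one,
    which is one way to preserve semantic structure in the embedding space.
    """
    dup_term = triplet_loss(anchor, near_dup, similar, margin_dup)
    sem_term = triplet_loss(anchor, similar, unrelated, margin_sem)
    return dup_term + sem_term
```

With a well-ordered embedding (near-duplicate closest, then similar, then unrelated, each separated by more than its margin) both terms are zero. The loss becomes positive as soon as a visually similar image sits closer to the anchor than the near-duplicate.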

DID_ICIP_2025_short_suppl.pdf (257)

Categories: Image/Video Storage, Retrieval

28 Views

ICIP2025_1674_supplementary-material_Multi_Res_3DGS

ICIP2025_1674_camera_ready_supplementary-material_Multi_Res_3DGS.pdf

Multi-Res-3DGS (184)

Categories: Image/Video Storage, Retrieval

36 Views

TEMPORAL TRANSFORMER ENCODER FOR VIDEO CLASS INCREMENTAL LEARNING

Current video classification approaches suffer from catastrophic forgetting when they are retrained on new databases. Continual learning aims to enable a classification system to learn from a succession of tasks without forgetting. In this paper, we propose a transformer-based video class incremental learning model. During a succession of learning steps, at each training time, the transformer is used to extract characteristic spatio-temporal features from videos

TEMPORAL TRANSFORMER ENCODER FOR VIDEO CLASS INCREMENTAL LEARNING POSTER.pdf (259)

Categories: Image/Video Storage, Retrieval

24 Views

ICASSP2024-Paper ID 3371-IVMSP-L2.4: VT-REID: LEARNING DISCRIMINATIVE VISUAL-TEXT REPRESENTATION FOR POLYP RE-IDENTIFICATION

Presentation slides at ICASSP2024 for IVMSP-L2.4: VT-REID: LEARNING DISCRIMINATIVE VISUAL-TEXT REPRESENTATION FOR POLYP RE-IDENTIFICATION

ICASSP2024-Paper ID 3371.pptx

Presentation Slides for ICASSP2024-VT-REID: LEARNING DISCRIMINATIVE VISUAL-TEXT REPRESENTATION FOR POLYP RE-IDENTIFICATION (262)

Categories: Image/Video Storage, Retrieval

37 Views

PART REPRESENTATION LEARNING WITH TEACHER-STUDENT DECODER FOR OCCLUDED PERSON RE-IDENTIFICATION

Occluded person re-identification (ReID) is a very challenging task because of occlusion disturbance and incomplete target information. Leveraging external cues such as human pose or parsing to locate and align part features has proven very effective in occluded person ReID. Meanwhile, recent Transformer structures have a strong capacity for long-range modeling. Considering these facts, we propose a Teacher-Student Decoder (TSD) framework for occluded person ReID, which utilizes the Transformer decoder with the help of human parsing.

poster.pdf (274)

Part_Representation_Learning_with_Teacher-Student_Decoder_for_Occluded_Person_Re-Identification.pdf (266)

Categories: Image/Video Storage, Retrieval

100 Views

MultiWay-Adapter: Adapting Multimodal Large Language Models for scalable image-text retrieval

As Multimodal Large Language Models (MLLMs) grow in size, adapting them to specialized tasks becomes increasingly challenging due to high computational and memory demands. While efficient adaptation methods exist, in practice they suffer from shallow inter-modal alignment, which severely hurts model effectiveness. To tackle these challenges, we introduce the MultiWay-Adapter (MWA), which deepens inter-modal alignment, enabling high transferability with minimal tuning effort.

ICASSP2024_multiway_poster_final.pdf (257)

Categories: Image/Video Storage, Retrieval

29 Views

Cross-modal Multiscale Difference-aware Network for Joint Moment Retrieval and Highlight Detection

Since the goals of both Moment Retrieval (MR) and Highlight Detection (HD) are to quickly obtain the required content from the video according to user needs, several works have attempted to take advantage of the commonality between both tasks to design transformer-based networks for joint MR and HD. Although these methods achieve impressive performance, they still face some problems: a) Semantic gaps across different modalities. b) Various durations of different query-relevant moments and highlights. c) Smooth transitions among diverse events.

CMDNet_04_13.pdf

Poster (271)

Categories: Image/Video Processing; Image/Video Storage, Retrieval

48 Views

AEGIS-Net: Attention-Guided Multi-Level Feature Aggregation for Indoor Place Recognition

We present AEGIS-Net, a novel indoor place recognition model that takes in RGB point clouds and generates global place descriptors by aggregating lower-level color and geometry features with higher-level implicit semantic features. Rather than simply concatenating features, it employs self-attention modules to select the most important local features that best describe an indoor place. AEGIS-Net is made of a semantic encoder, a semantic decoder and an attention-guided feature embedding.

Poster.pdf (545)

Categories: Image/Video Storage, Retrieval

28 Views
