Image/Video Storage, Retrieval

ICASSP2024-Paper ID 3371-IVMSP-L2.4: VT-REID: LEARNING DISCRIMINATIVE VISUAL-TEXT REPRESENTATION FOR POLYP RE-IDENTIFICATION

Presentation slides at ICASSP 2024 for IVMSP-L2.4: VT-REID: LEARNING DISCRIMINATIVE VISUAL-TEXT REPRESENTATION FOR POLYP RE-IDENTIFICATION

ICASSP2024-Paper ID 3371.pptx

Presentation Slides for ICASSP2024-VT-REID: LEARNING DISCRIMINATIVE VISUAL-TEXT REPRESENTATION FOR POLYP RE-IDENTIFICATION

Categories: Image/Video Storage, Retrieval

PART REPRESENTATION LEARNING WITH TEACHER-STUDENT DECODER FOR OCCLUDED PERSON RE-IDENTIFICATION

Occluded person re-identification (ReID) is a very challenging task due to occlusion disturbance and incomplete target information. Leveraging external cues such as human pose or parsing to locate and align part features has proven very effective for occluded person ReID. Meanwhile, recent Transformer architectures have a strong capacity for long-range modeling. Building on these observations, we propose a Teacher-Student Decoder (TSD) framework for occluded person ReID, which uses the Transformer decoder with the help of human parsing.

poster.pdf

Part_Representation_Learning_with_Teacher-Student_Decoder_for_Occluded_Person_Re-Identification.pdf

Categories: Image/Video Storage, Retrieval

MultiWay-Adapter: Adapting Multimodal Large Language Models for scalable image-text retrieval

As Multimodal Large Language Models (MLLMs) grow in size, adapting them to specialized tasks becomes increasingly challenging due to high computational and memory demands. While efficient adaptation methods exist, in practice they suffer from shallow inter-modal alignment, which severely hurts model effectiveness. To tackle these challenges, we introduce the MultiWay-Adapter (MWA), which deepens inter-modal alignment, enabling high transferability with minimal tuning effort.

ICASSP2024_multiway_poster_final.pdf

Categories: Image/Video Storage, Retrieval

Cross-modal Multiscale Difference-aware Network for Joint Moment Retrieval and Highlight Detection

Because Moment Retrieval (MR) and Highlight Detection (HD) share the goal of quickly locating the video content a user asks for, several works have exploited the commonality between the two tasks to design transformer-based networks for joint MR and HD. Although these methods achieve impressive performance, they still face three problems: (a) semantic gaps across modalities; (b) the varying durations of query-relevant moments and highlights; and (c) smooth transitions among diverse events.

CMDNet_04_13.pdf

Poster

Categories: Image/Video Processing; Image/Video Storage, Retrieval

AEGIS-Net: Attention-Guided Multi-Level Feature Aggregation for Indoor Place Recognition

We present AEGIS-Net, a novel indoor place recognition model that takes RGB point clouds as input and generates global place descriptors by aggregating lower-level color and geometry features with higher-level implicit semantic features. Rather than simply concatenating these features, it employs self-attention modules to select the local features that best describe an indoor place. AEGIS-Net consists of a semantic encoder, a semantic decoder, and an attention-guided feature embedding.
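As an illustration of attention-guided aggregation in general (not the AEGIS-Net architecture itself), the minimal sketch below assumes a place is already represented as N local feature vectors. It applies one single-head self-attention layer with hypothetical projection matrices `Wq`, `Wk`, `Wv`, then mean-pools the refined features into an L2-normalized global descriptor. The function names are illustrative only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_aggregate(local_feats, Wq, Wk, Wv):
    """Aggregate N local features of shape (N, d) into one global descriptor.

    Wq, Wk, Wv are hypothetical (d, h) projection matrices; in a trained
    model they would be learned, here they are just inputs.
    """
    Q = local_feats @ Wq
    K = local_feats @ Wk
    V = local_feats @ Wv
    # (N, N) attention weights: how strongly each local feature attends to the others
    attn = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)
    refined = attn @ V           # context-aware local features, shape (N, h)
    desc = refined.mean(axis=0)  # pool into a single (h,) vector
    return desc / np.linalg.norm(desc)
```

Because self-attention followed by mean pooling is permutation-invariant, the descriptor does not depend on the order in which the local features are listed, which suits unordered point-cloud input.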

Poster.pdf

Categories: Image/Video Storage, Retrieval

EFFICIENT FUSION OF DEPTH INFORMATION FOR DEFOCUS DEBLURRING

Defocus deblurring is a classic image restoration problem, and the formation of defocus blur depends on scene depth. Recently, dual-pixel sensors, designed around the relationship between depth and disparity, have brought great improvements to defocus deblurring. However, dual-pixel images are difficult to acquire in real time, which hinders algorithm deployment. This motivates us to remove defocus blur from a single image using depth information.

icassp2024-Efficient_Fusion_of_Depth_Information_for_Defocus_Deblurring.pdf

Paper: EFFICIENT FUSION OF DEPTH INFORMATION FOR DEFOCUS DEBLURRING

Categories: Image/Video Storage, Retrieval

SiamCLIM: Text-Based Pedestrian Search via Multi-modal Siamese Contrastive Learning

Text-based pedestrian search (TBPS) aims to retrieve target persons from an image gallery using descriptive text queries. Despite remarkable progress in recent state-of-the-art approaches, existing methods still struggle to efficiently extract discriminative features from multi-modal data. To address fine-grained cross-modal text-to-image matching, we propose a novel Siamese Contrastive Language-Image Model (SiamCLIM).
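For readers unfamiliar with contrastive language-image training, the sketch below shows a generic CLIP-style symmetric InfoNCE loss over a batch of matched image/text embeddings. It is an assumption-laden illustration of the contrastive objective family, not SiamCLIM's Siamese formulation; the function names and the default temperature of 0.07 are illustrative choices.

```python
import numpy as np


def _log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of B matched image/text pairs.

    Row i of img_emb and row i of txt_emb are assumed to describe the same person;
    every other row in the batch acts as a negative.
    """
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B) scaled cosine similarities
    diag = np.arange(logits.shape[0])
    # Image-to-text: each image row should pick its own caption
    loss_i2t = -_log_softmax(logits, axis=1)[diag, diag].mean()
    # Text-to-image: each caption column should pick its own image
    loss_t2i = -_log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (loss_i2t + loss_t2i)
```

Minimizing this loss pulls matched text and image embeddings together and pushes mismatched pairs apart, so at query time a gallery can be ranked by the cosine similarity to the text embedding.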

SiamCLIM.pptx

Categories: Image/Video Storage, Retrieval

Adaptive Anchor Label Propagation for Transductive Few-Shot Learning

Few-shot learning addresses the problem of classifying images with limited labeled data. Exploiting unlabeled data through transductive inference methods such as label propagation has been shown to significantly improve few-shot performance. Label propagation infers pseudo-labels for unlabeled data using a constructed graph that captures the underlying manifold structure of the data.
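The label propagation step described above can be sketched with the classic closed-form solution of Zhou et al. (2004), F = (I − αS)⁻¹Y, where S is the symmetrically normalized k-nearest-neighbour affinity matrix and Y holds one-hot labels for the support samples. This is the standard baseline, not the adaptive-anchor variant proposed in the paper; the Gaussian kernel width `sigma`, the neighbourhood size `k`, and `alpha` are illustrative assumptions.

```python
import numpy as np


def label_propagation(features, labels, n_classes, alpha=0.99, sigma=1.0, k=5):
    """Classic transductive label propagation (sketch).

    features: (n, d) embeddings of support and query samples together.
    labels:   length-n int array; -1 marks an unlabeled (query) sample.
    Returns a predicted class for every sample.
    """
    n = features.shape[0]
    # Gaussian affinities from pairwise squared Euclidean distances
    sq = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # Keep only the k strongest neighbours per node, then symmetrize
    if k < n - 1:
        drop = np.argsort(-W, axis=1)[:, k:]
        np.put_along_axis(W, drop, 0.0, axis=1)
    W = np.maximum(W, W.T)
    # Symmetric normalization S = D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    S = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # One-hot seed matrix; rows of unlabeled samples stay all-zero
    Y = np.zeros((n, n_classes))
    mask = labels >= 0
    Y[np.arange(n)[mask], labels[mask]] = 1.0
    # Closed-form propagation F = (I - alpha * S)^{-1} Y
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)
```

With two well-separated clusters and one labeled sample in each, every unlabeled sample inherits the label of its own cluster, because the kNN graph only connects samples along the data manifold.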

1291_slides.pdf

presentation slides

Categories: Image/Video Storage, Retrieval

MABNet: Master Assistant Buddy Network with Hybrid Learning for Image Retrieval

Image retrieval has attracted growing interest in recent years. Current approaches are either supervised or self-supervised, so they do not exploit the benefits of hybrid learning that combines supervision and self-supervision. We present a novel Master Assistant Buddy Network (MABNet) for image retrieval that incorporates both learning mechanisms. MABNet consists of a master block and an assistant block, which learn independently through supervision and collectively via self-supervision.

MABNET_ICASSP (6).pdf

MABNet

Categories: Image/Video Storage, Retrieval

Invert-and-project (IVP)-A Lossless Compression Method of Multi-scale JPEG Images via DCT Coefficients Prediction

JPEG is a versatile and widely used image format. Built on an elegant design that combines a basis transformation (gross-scale decorrelation) with entropy coding (fine-scale coding), a JPEG image can preserve virtually all visible features of the original while reducing its size to about one tenth of the raw data.
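The "basis transformation" half of this design can be illustrated with the 8×8 DCT used by JPEG. The sketch below is an illustration of that standard transform only, not of the IVP method: it builds the orthonormal DCT-II matrix and shows, on a smooth block, that the energy concentrates in very few coefficients (decorrelation) and that the inverse transform recovers the block exactly. In JPEG, information is discarded mainly by the quantization step that follows, not by the DCT itself. JPEG's level shift (subtracting 128) is omitted for simplicity, and the function names are hypothetical.

```python
import numpy as np


def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC row uses the smaller normalization factor
    return C


def block_dct(block):
    # Separable 2-D transform: DCT along rows, then along columns
    C = dct_matrix(block.shape[0])
    return C @ block @ C.T


def block_idct(coeffs):
    # C is orthonormal, so its transpose is its inverse
    C = dct_matrix(coeffs.shape[0])
    return C.T @ coeffs @ C
```

For a linear intensity ramp, only the first row and first column of coefficients are non-zero, and the DC coefficient alone holds about 99% of the block's energy; this sparsity is what the subsequent entropy coder exploits.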

Invert-and-project (IVP)-A Lossless Compression Method of Multi-scale JPEG Images via DCT Coefficients Prediction ppt.pdf

Categories: Image/Video Storage, Retrieval
