MULTI-LEVEL CONTRASTIVE LEARNING FOR HYBRID CROSS-MODAL RETRIEVAL

Submitted by: Yiming Zhao
Last updated: 11 April 2024 - 10:54am
Document Type: Poster

Hybrid image retrieval is an important task for a wide range of applications. In this setting, the query consists of a reference image and a text modifier: the reference image provides essential visual context and conveys semantic details, while the text modifier specifies how the reference image should be modified. To address this hybrid cross-modal retrieval task, we propose a multi-level contrastive learning (MLCL) method that combines the hybrid query features into a single fused feature via cross-modal contrastive learning with multi-level semantic alignment. In addition, we employ self-supervised contrastive learning to strengthen the semantic correlation among features at different levels of the combiner network. Extensive experiments on three public datasets (FashionIQ, Shoes, and CIRR) demonstrate that the proposed MLCL significantly outperforms state-of-the-art methods in the hybrid cross-modal retrieval setting.
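
The abstract gives no implementation details, so the following is only a minimal PyTorch sketch of the core idea: fuse the reference-image and text-modifier features into one query embedding and train it with an in-batch, InfoNCE-style contrastive loss against target-image features. The Combiner architecture, the feature dimension, and the temperature value below are illustrative assumptions, not the authors' MLCL implementation, which additionally aligns features at multiple levels of the combiner network and adds a self-supervised contrastive term.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class Combiner(nn.Module):
        # Hypothetical fusion module: concatenates the reference-image and
        # text-modifier features and projects them to a query embedding.
        def __init__(self, dim: int = 512):
            super().__init__()
            self.fuse = nn.Sequential(
                nn.Linear(2 * dim, dim),
                nn.ReLU(),
                nn.Linear(dim, dim),
            )

        def forward(self, img_feat: torch.Tensor, txt_feat: torch.Tensor) -> torch.Tensor:
            fused = self.fuse(torch.cat([img_feat, txt_feat], dim=-1))
            return F.normalize(fused, dim=-1)  # unit norm for cosine similarity

    def contrastive_loss(query: torch.Tensor, target: torch.Tensor,
                         tau: float = 0.07) -> torch.Tensor:
        # Symmetric InfoNCE over a batch: the matched (query, target-image)
        # pair is the positive; every other in-batch pair is a negative.
        logits = query @ target.t() / tau  # (B, B) cosine-similarity logits
        labels = torch.arange(query.size(0), device=query.device)
        return 0.5 * (F.cross_entropy(logits, labels)
                      + F.cross_entropy(logits.t(), labels))

    # Toy usage with random stand-ins for encoder outputs.
    B, D = 32, 512
    img = F.normalize(torch.randn(B, D), dim=-1)  # reference-image features
    txt = F.normalize(torch.randn(B, D), dim=-1)  # text-modifier features
    tgt = F.normalize(torch.randn(B, D), dim=-1)  # target-image features
    loss = contrastive_loss(Combiner(D)(img, txt), tgt)

Per the abstract, the multi-level variant would apply losses of this form at several levels of the combiner network rather than only at its output.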
