MULTI-LEVEL CONTRASTIVE LEARNING FOR HYBRID CROSS-MODAL RETRIEVAL
- DOI: 10.60864/3j14-6y84
- Submitted by: Yiming Zhao
- Last updated: 6 June 2024 - 10:50am
- Document Type: Poster
Hybrid image retrieval is a significant task for a wide range of applications. In this scenario, the hybrid query for searching images consists of a reference image and a text modifier. The reference image provides vital visual context and conveys semantic details, while the text modifier specifies the modifications to be applied to the reference image. To address such hybrid cross-modal retrieval, we propose a multi-level contrastive learning (MLCL) method that combines the hybrid query features into a fused feature through cross-modal contrastive learning with multi-level semantic alignment. In addition, we employ self-supervised contrastive learning to strengthen the semantic correlation of the features at different levels of the combiner network. Extensive experiments on three public datasets (i.e., FashionIQ, Shoes, and CIRR) demonstrate that our proposed MLCL significantly outperforms state-of-the-art methods under the hybrid cross-modal retrieval setting.
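To make the cross-modal contrastive objective concrete, below is a minimal PyTorch-style sketch of a batch-wise InfoNCE loss that aligns a fused query feature (reference image + text modifier) with the target image feature. All names (`contrastive_loss`, `fused_query`, `target_image`, `temperature`) are hypothetical illustrations, not the authors' MLCL implementation, which additionally applies this alignment at multiple levels of the combiner network together with a self-supervised term.

```python
# Illustrative sketch only: a single-level cross-modal contrastive (InfoNCE) loss
# between fused query features and target image features. This is an assumption
# about one plausible form of the objective, not the authors' released code.
import torch
import torch.nn.functional as F

def contrastive_loss(fused_query: torch.Tensor,
                     target_image: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """fused_query, target_image: (batch, dim) feature tensors; matching pairs share a row index."""
    q = F.normalize(fused_query, dim=-1)
    t = F.normalize(target_image, dim=-1)
    logits = q @ t.t() / temperature                     # (batch, batch) similarity matrix
    labels = torch.arange(q.size(0), device=q.device)    # positives lie on the diagonal
    # Symmetric loss: query-to-target and target-to-query directions
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```

In a multi-level setting, a loss of this form could be computed on intermediate and final outputs of the combiner network and summed, which is one way to realize the multi-level semantic alignment described above.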