On Negative Sampling for Contrastive Audio-Text Retrieval
- Submitted by: Huang Xie
- Last updated: 26 May 2023 - 4:00am
- Document Type: Presentation Slides
- Document Year: 2023
- Presenters: Huang Xie
- Paper Code: AASP-L3.2
This paper investigates negative sampling for contrastive learning in audio-text retrieval. A negative sampling strategy selects negatives (either audio clips or textual descriptions) from a pool of candidates for a given positive audio-text pair. We explore sampling strategies based on model-estimated within-modality and cross-modality relevance scores for audio and text samples. Keeping the training setting of the retrieval system from [1] fixed, we study eight sampling strategies, including hard and semi-hard negative sampling. Experimental results show that retrieval performance varies dramatically across strategies. In particular, selecting semi-hard negatives with cross-modality scores improves the retrieval system's performance in both text-to-audio and audio-to-text retrieval. We also show that feature collapse occurs when hard negatives are sampled with cross-modality scores.
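To make the difference between hard and semi-hard selection concrete, here is a minimal sketch of picking a negative text for an audio anchor from cross-modality scores. It assumes cosine similarity between embeddings as the relevance score and uses a FaceNet-style definition of "semi-hard" (the highest-scoring negative that still scores below the positive pair). Both choices are assumptions for illustration, not necessarily the paper's exact definitions. The helper names (`score_matrix`, `hard_negative`, `semi_hard_negative`) are hypothetical, and the encoders and training setup of the system from [1] are not reproduced.

```python
import math
from typing import List, Optional

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def score_matrix(queries: List[Vector], candidates: List[Vector]) -> List[List[float]]:
    """scores[i][j] = relevance of candidate j to query i.

    Assumption: row i of `queries` and row i of `candidates` form the positive pair.
    """
    return [[cosine(q, c) for c in candidates] for q in queries]


def hard_negative(scores: List[List[float]], i: int) -> int:
    """Hypothetical hard sampling: the highest-scoring candidate other than the positive i."""
    return max((j for j in range(len(scores[i])) if j != i), key=lambda j: scores[i][j])


def semi_hard_negative(scores: List[List[float]], i: int) -> Optional[int]:
    """Hypothetical semi-hard sampling: the highest-scoring candidate that still
    scores below the positive pair. Returns None if every negative outscores it."""
    positive = scores[i][i]
    easier = [j for j in range(len(scores[i])) if j != i and scores[i][j] < positive]
    if not easier:
        return None
    return max(easier, key=lambda j: scores[i][j])


# Toy example: three paired audio/text embeddings (row i of each list is a positive pair).
audio = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
text = [[0.8, 0.6], [1.0, 0.0], [0.6, 0.8]]
scores = score_matrix(audio, text)  # cross-modality scores, audio anchors vs. text candidates
print(hard_negative(scores, 0), semi_hard_negative(scores, 0))  # → 1 2
```

For audio anchor 0, text 1 scores higher than the true caption, so hard sampling picks it, while semi-hard sampling skips it and picks text 2. For text-to-audio retrieval the same functions apply to `score_matrix(text, audio)`. A within-modality variant would score candidates against the anchor's own modality (for example, text against text), which changes how the threshold for "semi-hard" should be defined. That part is not covered by this sketch.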