MTA: A Lightweight Multilingual Text Alignment Model for Cross-language Visual Word Sense Disambiguation
Visual Word Sense Disambiguation (Visual-WSD), a subtask of fine-grained image-text retrieval, requires a high level of language-vision understanding to capture and exploit the nuanced relationships between textual and visual features. However, the cross-lingual setting, combined with limited contextual information, is considered the most significant challenge for this task.