
SHOW, TRANSLATE AND TELL

Citation Author(s):
Dheeraj Peri, Shagan Sah, Raymond Ptucha
Submitted by:
Raymond Ptucha
Last updated:
20 September 2019 - 7:51pm
Document Type:
Poster
Document Year:
2019
Event:
Presenters:
Ray Ptucha
Paper Code:
2914

Humans have an incredible ability to process and understand
information from multiple sources such as images, video, text,
and speech. The recent success of deep neural networks has
enabled us to develop algorithms that give machines the ability
to understand and interpret this information. There is a need
both to broaden their applicability and to develop methods that
correlate visual information with semantic content. We propose
a unified model that jointly trains on images and captions, and
learns to generate new captions given either an image or a
caption query. We evaluate our model on three different tasks,
namely cross-modal retrieval, image captioning, and sentence
paraphrasing. Our model gains insight into cross-modal vector
embeddings, generalizes well across multiple tasks, and is
competitive with state-of-the-art methods on retrieval.
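To make the idea concrete, below is a minimal PyTorch sketch of the kind of joint embedding model the abstract describes: an image encoder and a caption encoder map into a shared embedding space, and a single decoder generates captions from a vector in that space regardless of which modality produced it. All module choices, names, and dimensions here (the GRU encoder and decoder, 2048-d image features, 512-d embeddings) are illustrative assumptions, not the authors' published architecture.

    # Sketch of a joint image-caption embedding model with a shared decoder.
    # Module choices and sizes are assumptions for illustration only.
    import torch
    import torch.nn as nn

    class JointEmbeddingModel(nn.Module):
        def __init__(self, vocab_size, img_feat_dim=2048, embed_dim=512):
            super().__init__()
            # Project pre-extracted CNN image features into the shared space.
            self.img_proj = nn.Linear(img_feat_dim, embed_dim)
            # Encode a caption (token ids) into the same shared space.
            self.word_embed = nn.Embedding(vocab_size, embed_dim)
            self.txt_enc = nn.GRU(embed_dim, embed_dim, batch_first=True)
            # One decoder serves both captioning and paraphrasing: it is
            # conditioned on a shared-space vector, whatever its origin.
            self.decoder = nn.GRU(embed_dim, embed_dim, batch_first=True)
            self.out = nn.Linear(embed_dim, vocab_size)

        def embed_image(self, img_feats):
            return nn.functional.normalize(self.img_proj(img_feats), dim=-1)

        def embed_text(self, tokens):
            _, h = self.txt_enc(self.word_embed(tokens))
            return nn.functional.normalize(h[-1], dim=-1)

        def decode(self, query_vec, tokens):
            # Teacher-forced decoding conditioned on the shared-space
            # vector, used as the decoder's initial hidden state.
            h0 = query_vec.unsqueeze(0)
            out, _ = self.decoder(self.word_embed(tokens), h0)
            return self.out(out)

    # Usage on dummy data: caption an image, then paraphrase a sentence.
    model = JointEmbeddingModel(vocab_size=1000)
    img = torch.randn(4, 2048)             # batch of CNN image features
    cap = torch.randint(0, 1000, (4, 12))  # batch of token-id captions
    logits_from_image = model.decode(model.embed_image(img), cap)
    logits_from_text = model.decode(model.embed_text(cap), cap)
    print(logits_from_image.shape, logits_from_text.shape)  # (4, 12, 1000)

Because both embed_image and embed_text land in the same normalized space, the same decode call covers image captioning (image query) and sentence paraphrasing (caption query), and nearest-neighbor search in that space supports cross-modal retrieval.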
