
LANGUAGE AND VISUAL RELATIONS ENCODING FOR VISUAL QUESTION ANSWERING

Abstract: 

Visual Question Answering (VQA) involves complex relations within two modalities: relations between words and relations between image regions. Encoding these relations is therefore important for accurate VQA. In this paper, we propose two modules to encode the two types of relations respectively. The language relation encoding module encodes multi-scale relations between words via a novel masked self-attention. The visual relation encoding module encodes the relations between image regions. It computes the response at a position as a weighted sum of the features at other positions in the feature maps. Extensive experiments demonstrate the effectiveness of each module. Our model achieves state-of-the-art performance on the VQA 1.0 dataset.
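The two modules described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity function, the mask shape, or how scales are combined. The sketch assumes scaled dot-product similarity with a softmax, a local window mask `abs(i - j) <= window` for the masked self-attention, concatenation across window sizes for "multi-scale", and that a position may also attend to itself. All function names (`attend`, `visual_relation`, `language_relation`, `multi_scale`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(features, allowed):
    """Response at position i = weighted sum of features[j] over allowed(i, j).

    Weights are a softmax over scaled dot-product similarities (an assumption;
    the abstract only says "weighted sum").
    """
    n, dim = len(features), len(features[0])
    scale = math.sqrt(dim)
    out = []
    for i in range(n):
        scores = [
            dot(features[i], features[j]) / scale if allowed(i, j) else -math.inf
            for j in range(n)
        ]
        weights = softmax(scores)
        out.append(
            [sum(w * features[j][k] for j, w in enumerate(weights)) for k in range(dim)]
        )
    return out


def visual_relation(region_features):
    """Unmasked case: every image region attends to every region."""
    return attend(region_features, lambda i, j: True)


def language_relation(word_features, window):
    """Masked case: word i attends only to words within `window` positions."""
    return attend(word_features, lambda i, j: abs(i - j) <= window)


def multi_scale(word_features, windows):
    """Concatenate the masked-attention outputs for several window sizes."""
    per_scale = [language_relation(word_features, w) for w in windows]
    return [
        [value for scale_out in per_scale for value in scale_out[i]]
        for i in range(len(word_features))
    ]


words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
encoded_words = multi_scale(words, [0, 1])   # two scales, so 4 values per word
encoded_regions = visual_relation(words)     # same toy features reused as "regions"
```

With `window=0` each word attends only to itself, so that scale returns the input unchanged; larger windows mix in progressively more context, which is one plausible reading of "multi-scale relations between words".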


Paper Details

Authors:
Jing Liu, Zhiwei Fang, Hanqing Lu
Submitted On:
19 September 2019 - 11:05am
Short Link:
Type:
Poster
Event:
Presenter's Name:
Fei Liu
Document Year:
2019
Cite

Document Files

poster.pdf


[1] Jing Liu, Zhiwei Fang, Hanqing Lu, "LANGUAGE AND VISUAL RELATIONS ENCODING FOR VISUAL QUESTION ANSWERING", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4740. Accessed: Oct. 20, 2019.