

AUDIO-VISUAL FUSION AND CONDITIONING WITH NEURAL NETWORKS FOR EVENT RECOGNITION

Abstract: 

Video event recognition based on audio and visual modalities is an open research problem. The mainstream literature on video event recognition focuses on the visual modality and neglects the relevant information present in the audio modality. We study several fusion architectures for audio-visual recognition of video events. We first build classical fusion architectures using concatenation, addition, or Multimodal Compact Bilinear (MCB) pooling. We then propose to create connections between visual and audio processing with Feature-wise Linear Modulation (FiLM) layers. For instance, the information present in the audio modality is exploited to change the visual classification behaviour. We find that multimodal event classification performance is always better than unimodal performance, whatever the fusion or conditioning method used. Classification accuracy based on one modality also improves when it is modulated by the other modality through FiLM layers.
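As a rough illustration of the ideas named in the abstract, the following NumPy sketch shows concatenation and addition fusion of two feature vectors, and a FiLM-style layer in which audio features predict a per-channel scale (gamma) and shift (beta) applied to visual feature maps. The tensor shapes, the weight names (`w_gamma`, `w_beta`, `w_proj`), and the single linear projection used to predict gamma and beta are assumptions made for this example; the abstract does not specify the authors' architecture, and MCB pooling is not shown.

```python
import numpy as np


def film(visual, audio, w_gamma, b_gamma, w_beta, b_beta):
    """FiLM-style modulation: scale and shift each visual channel.

    visual: (batch, channels, height, width) visual feature maps
    audio:  (batch, audio_dim) audio embedding used as conditioning input
    The linear projections below are an illustrative assumption.
    """
    gamma = audio @ w_gamma + b_gamma  # (batch, channels)
    beta = audio @ w_beta + b_beta     # (batch, channels)
    # Broadcast the per-channel parameters over the spatial dimensions.
    return gamma[:, :, None, None] * visual + beta[:, :, None, None]


def fuse_concat(v, a):
    """Concatenation fusion of two (batch, dim) feature vectors."""
    return np.concatenate([v, a], axis=-1)


def fuse_add(v, a):
    """Addition fusion; both inputs must share the same shape."""
    return v + a


# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
batch, channels, height, width, audio_dim = 2, 4, 3, 3, 5

visual = rng.standard_normal((batch, channels, height, width))
audio = rng.standard_normal((batch, audio_dim))

w_gamma = 0.1 * rng.standard_normal((audio_dim, channels))
b_gamma = np.ones(channels)    # bias of 1 keeps the scale near identity
w_beta = 0.1 * rng.standard_normal((audio_dim, channels))
b_beta = np.zeros(channels)

# Audio conditions the visual stream.
modulated = film(visual, audio, w_gamma, b_gamma, w_beta, b_beta)
print(modulated.shape)  # (2, 4, 3, 3)

# Classical fusion on pooled vectors of matching size.
w_proj = 0.1 * rng.standard_normal((audio_dim, channels))
v_vec = modulated.mean(axis=(2, 3))  # global average pooling -> (2, 4)
a_vec = audio @ w_proj               # project audio to (2, 4)
print(fuse_concat(v_vec, a_vec).shape)  # (2, 8)
print(fuse_add(v_vec, a_vec).shape)     # (2, 4)
```

Swapping the roles of `visual` and `audio` would give the opposite conditioning direction, in which visual features modulate the audio stream.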


Paper Details

Authors: Jean Rouat, Stéphane Dupont
Submitted On: 14 October 2019 - 8:52pm
Short Link:
Type: Presentation Slides
Event:
Presenter's Name: Mathilde Brousmiche
Paper Code: 60
Document Year: 2019

Document Files

MLSP_presentation.pdf



[1] Jean Rouat, Stéphane Dupont, "AUDIO-VISUAL FUSION AND CONDITIONING WITH NEURAL NETWORKS FOR EVENT RECOGNITION", IEEE SigPort, 2019. [Online]. Available: http://sigport.org/4873. Accessed: Nov. 15, 2019.