LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION

Citation Author(s):
Chen-Geng Liu, Yang Liu, Jing Liu, Xiao-Guang Zhu, Xin-Hua Zeng
Submitted by:
Donglai Wei
Last updated:
5 May 2022 - 9:42am
Document Type:
Poster
Document Year:
2022
Donglai Wei
Paper Code:
IVMSP-13.3
 

Violence detection is an essential and challenging problem in the computer vision community. Most existing works focus on single-modality data analysis, which is ineffective when multiple modalities are available. We therefore propose a two-stage multi-modal information fusion method for violence detection: 1) the first stage adopts a multiple instance learning strategy to refine video-level hard labels into clip-level soft labels, and 2) the second stage fuses the modalities with a multi-modal attention module, trained under the supervision of the soft labels generated in the first stage. Extensive experiments on the XD-Violence dataset show that our method outperforms state-of-the-art methods.
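The two stages described in the abstract can be sketched in code. The snippet below is a hypothetical illustration, not the authors' implementation: stage 1 uses a simple top-k multiple instance learning rule to turn a video-level hard label into clip-level soft labels, and stage 2 fuses two modality features (e.g. RGB and audio) with a basic cross-modal scaled-dot-product attention. All function names, the top-k rule, and the residual fusion are assumptions made for this sketch.

```python
import numpy as np

def mil_soft_labels(clip_scores, video_label, k=3):
    """Stage 1 (hypothetical sketch): refine a video-level hard label
    into clip-level soft labels via top-k multiple instance learning.
    A positive video keeps its k highest clip scores as soft labels;
    a negative video forces all clip-level labels to zero."""
    soft = np.zeros_like(clip_scores)
    if video_label == 0:
        return soft
    topk = np.argsort(clip_scores)[-k:]   # indices of the k highest scores
    soft[topk] = clip_scores[topk]
    return soft

def fuse_attention(rgb_feat, audio_feat):
    """Stage 2 (hypothetical sketch): fuse per-clip RGB and audio features
    (shape (T, d) each) with scaled-dot-product cross-modal attention."""
    d = rgb_feat.shape[-1]
    attn = rgb_feat @ audio_feat.T / np.sqrt(d)       # (T, T) cross-modal scores
    weights = np.exp(attn - attn.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over audio clips
    return rgb_feat + weights @ audio_feat            # residual fusion
```

In a training loop, the stage-1 soft labels would serve as per-clip supervision targets for a classifier applied to the fused features from stage 2.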
