LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION

Citation Author(s): Donglai Wei, Chen-Geng Liu, Yang Liu, Jing Liu, Xiao-Guang Zhu, Xin-Hua Zeng
Submitted by: Donglai Wei
Last updated: 5 May 2022 - 9:42am
Document Type: Poster
Document Year: 2022
Event: ICASSP 2022
Presenters: Donglai Wei
Paper Code: IVMSP-13.3

Categories: Image/Video Processing

Violence detection is an essential and challenging problem in the computer vision community. Most existing works analyze a single modality, which is less effective when multi-modal data is available. Therefore, we propose a two-stage multi-modal information fusion method for violence detection: 1) the first stage adopts multiple instance learning strategies to refine video-level hard labels into clip-level soft labels, and 2) the second stage uses an attention module to fuse the multi-modal information and is trained under supervision with the soft labels generated in the first stage. Extensive experiments on the XD-Violence dataset show that our method outperforms state-of-the-art methods.
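The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the pooling rule, the soft-label rule, or the attention form, so the top-k mean pooling, the min-max normalization, and the softmax weighting over two modalities used here are all assumptions, and every function name and parameter (`mil_video_score`, `soft_clip_labels`, `attention_fuse`, `w_visual`, `w_audio`) is hypothetical.

```python
import math


def mil_video_score(clip_scores, k=3):
    """Aggregate clip scores into one video score by top-k mean pooling.

    Assumption: top-k mean is one common multiple instance learning
    aggregation; the poster may use a different one.
    """
    top = sorted(clip_scores, reverse=True)[:k]
    return sum(top) / len(top)


def soft_clip_labels(clip_scores, video_label):
    """Refine a hard video-level label (0 or 1) into clip-level soft labels.

    Assumption: clips of a normal video all get 0.0; clips of a violent
    video get their scores min-max normalized to [0, 1].
    """
    if video_label == 0:
        return [0.0] * len(clip_scores)
    lo, hi = min(clip_scores), max(clip_scores)
    if hi == lo:
        # All clips scored the same, so every clip inherits the video label.
        return [1.0] * len(clip_scores)
    return [(s - lo) / (hi - lo) for s in clip_scores]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(visual, audio, w_visual, w_audio):
    """Fuse two same-length feature vectors with attention over modalities.

    Each modality gets a scalar logit (dot product with a hypothetical
    learned weight vector); a softmax over the two logits gives the
    mixing weights for the fused feature.
    """
    logit_v = sum(x * w for x, w in zip(visual, w_visual))
    logit_a = sum(x * w for x, w in zip(audio, w_audio))
    alpha_v, alpha_a = softmax([logit_v, logit_a])
    fused = [alpha_v * v + alpha_a * a for v, a in zip(visual, audio)]
    return fused, (alpha_v, alpha_a)


if __name__ == "__main__":
    scores = [0.1, 0.5, 0.9]  # made-up clip scores for one violent video
    print(mil_video_score(scores, k=2))
    print(soft_clip_labels(scores, video_label=1))
    print(attention_fuse([1.0, 3.0], [3.0, 1.0], [0.0, 0.0], [0.0, 0.0]))
```

In a full pipeline, the soft labels from `soft_clip_labels` would serve as per-clip regression or classification targets for the second-stage model, whose fused features would come from something like `attention_fuse`.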

ICASSP2022_Poster .pdf
