LOOK, LISTEN AND PAY MORE ATTENTION: FUSING MULTI-MODAL INFORMATION FOR VIDEO VIOLENCE DETECTION
- Submitted by: Donglai Wei
- Last updated: 5 May 2022 - 9:42am
- Document Type: Poster
- Document Year: 2022
- Presenters: Donglai Wei
- Paper Code: IVMSP-13.3
Violence detection is an essential and challenging problem in the computer vision community. Most existing works focus on single-modal data analysis, which is not effective when multi-modal data is available. Therefore, we propose a two-stage multi-modal information fusion method for violence detection: 1) the first stage adopts multiple instance learning strategies to refine video-level hard labels into clip-level soft labels, and 2) the second stage uses a multi-modal information fusion attention module to fuse the modalities, with supervised learning carried out using the soft labels generated in the first stage. Extensive experiments on the XD-Violence dataset show that our method outperforms state-of-the-art methods.
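The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the min-max normalisation used to produce soft labels, the softmax-weighted sum used for fusion, and all function names (`soft_labels`, `fuse`, `soft_bce`) are assumptions chosen only to make the idea concrete.

```python
import math


def soft_labels(clip_scores, video_label):
    """Stage 1 (assumed form): refine one video-level hard label
    into clip-level soft labels.

    clip_scores are hypothetical per-clip scores from a scorer trained
    with multiple instance learning. Clips of a non-violent video all
    get 0; clips of a violent video get min-max normalised scores.
    """
    if video_label == 0:
        return [0.0] * len(clip_scores)
    lo, hi = min(clip_scores), max(clip_scores)
    if hi == lo:
        # All clips scored equally, so all inherit the video label.
        return [1.0] * len(clip_scores)
    return [(s - lo) / (hi - lo) for s in clip_scores]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(features, relevance):
    """Stage 2 (assumed form): attention-weighted sum of per-modality
    feature vectors, e.g. one visual and one audio vector per clip.

    relevance holds one hypothetical attention logit per modality.
    """
    weights = softmax(relevance)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features))
            for i in range(dim)]


def soft_bce(pred, target, eps=1e-7):
    """Binary cross-entropy against a soft target in [0, 1]."""
    p = min(max(pred, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))
```

In this sketch, a clip's fused feature would be passed to a classifier whose prediction is trained with `soft_bce` against the soft label from `soft_labels`, which is how the second stage can use clip-level supervision even though only video-level labels were annotated.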