AUDIO FEATURE GENERATION FOR MISSING MODALITY PROBLEM IN VIDEO ACTION RECOGNITION
- Submitted by:
- Hu-Cheng LEE
- Last updated:
- 14 May 2019 - 5:08am
- Document Type:
- Presentation Slides
- Presenters:
- HU-CHENG LEE
- Paper Code:
- ICASSP19005
Despite the recent success of multi-modal action recognition in videos, in practice we often face situations where some modalities are unavailable beforehand. For example, while both vision and audio data are required for multi-modal action recognition, audio tracks in videos are easily lost due to corrupted files or device limitations. To cope with this missing-sound problem, we present an approach to simulating deep audio features from spatial-temporal vision data alone. We demonstrate that adding the simulated sound features significantly assists the multi-modal action recognition task. Evaluating our method on the Moments in Time (MIT) dataset, we show that our proposed method performs favorably against the two-stream architecture, enabling a richer understanding of multi-modal action recognition in video.
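The core idea — regressing audio features from visual features so the audio branch can still be filled in when a soundtrack is missing — can be sketched as follows. This is a minimal NumPy illustration, not the authors' architecture: the feature dimensions, the synthetic data, and the use of a least-squares linear map in place of the paper's deep feature generator are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions): visual and audio feature sizes, sample count.
D_VIS, D_AUD, N = 64, 32, 500

# Stand-ins for features from pretrained visual/audio backbones,
# extracted from training videos that DO have soundtracks.
true_map = rng.normal(size=(D_VIS, D_AUD))
vis_feats = rng.normal(size=(N, D_VIS))
aud_feats = vis_feats @ true_map + 0.1 * rng.normal(size=(N, D_AUD))

# Fit a linear "audio simulation" map W: visual -> audio.
# (Least squares stands in for training a deep generator.)
W, *_ = np.linalg.lstsq(vis_feats, aud_feats, rcond=None)

def simulate_audio(visual_feature):
    """Simulate the deep audio feature for a video whose track is missing."""
    return visual_feature @ W

# At test time, a video with a lost audio track still gets a
# two-stream (visual + simulated audio) representation via concatenation.
test_vis = rng.normal(size=(1, D_VIS))
fused = np.concatenate([test_vis, simulate_audio(test_vis)], axis=1)

print(fused.shape)  # (1, 96): visual (64) + simulated audio (32)
err = float(np.mean((vis_feats @ W - aud_feats) ** 2))
print(err < 0.05)   # reconstruction error stays near the noise floor
```

In the paper's setting the fused representation would then feed the action classifier, so videos with and without audio share one recognition pipeline.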