AUDIO FEATURE GENERATION FOR MISSING MODALITY PROBLEM IN VIDEO ACTION RECOGNITION
- Submitted by:
- Hu-Cheng LEE
- Last updated:
- 14 May 2019 - 5:08am
- Document Type:
- Presentation Slides
- Presenters:
- HU-CHENG LEE
- Paper Code:
- ICASSP19005
Despite the recent success of multi-modal action recognition in videos, in practice we often face situations where some modalities are unavailable beforehand. For example, while both vision and audio data are required for multi-modal action recognition, audio tracks in videos are easily lost due to corrupted files or device limitations. To cope with this missing-sound problem, we present an approach to simulating deep audio features from spatial-temporal vision data alone. We demonstrate that adding the simulated sound features significantly assists the multi-modal action recognition task. Evaluating our method on the Moments in Time (MIT) dataset, we show that our proposed method performs favorably against the two-stream architecture, enabling a richer understanding of multi-modal action recognition in video.
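The core idea — regressing audio features from visual features so the audio branch can still be filled in when a soundtrack is missing — can be sketched as follows. This is a minimal NumPy illustration, not the authors' architecture: the feature dimensions, the synthetic data, and the use of a least-squares linear map in place of the paper's deep feature generator are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions): visual and audio feature sizes, sample count.
D_VIS, D_AUD, N = 64, 32, 500

# Stand-ins for features from pretrained visual/audio backbones,
# extracted from training videos that DO have soundtracks.
true_map = rng.normal(size=(D_VIS, D_AUD))
vis_feats = rng.normal(size=(N, D_VIS))
aud_feats = vis_feats @ true_map + 0.1 * rng.normal(size=(N, D_AUD))

# Fit a linear "audio simulation" map W: visual -> audio.
# (Least squares stands in for training a deep generator.)
W, *_ = np.linalg.lstsq(vis_feats, aud_feats, rcond=None)

def simulate_audio(visual_feature):
    """Simulate the deep audio feature for a video whose track is missing."""
    return visual_feature @ W

# At test time, a video with a lost audio track still gets a
# two-stream (visual + simulated audio) representation via concatenation.
test_vis = rng.normal(size=(1, D_VIS))
fused = np.concatenate([test_vis, simulate_audio(test_vis)], axis=1)

print(fused.shape)  # (1, 96): visual (64) + simulated audio (32)
err = float(np.mean((vis_feats @ W - aud_feats) ** 2))
print(err < 0.05)   # reconstruction error stays near the noise floor
```

In the paper's setting the fused representation would then feed the action classifier, so videos with and without audio share one recognition pipeline.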