
AUDIO FEATURE GENERATION FOR MISSING MODALITY PROBLEM IN VIDEO ACTION RECOGNITION

Citation Author(s):
Hu-Cheng Lee, Chih-Yu Lin, Pin-Chun Hsu, Winston H. Hsu
Submitted by:
Hu-Cheng Lee
Last updated:
14 May 2019 - 5:08am
Document Type:
Presentation Slides
Event:
ICASSP 2019
Presenters:
Hu-Cheng Lee
Paper Code:
ICASSP19005

Despite the recent success of multi-modal action recognition in videos, in practice we often face situations where some modalities are unavailable beforehand. For example, while both vision and audio data are required for multi-modal action recognition, audio tracks are easily lost due to corrupted files or device limitations. To cope with this missing-sound problem, we present an approach to simulating deep audio features from spatial-temporal vision data alone. We demonstrate that adding the simulated sound features significantly assists the multi-modal action recognition task. Evaluating our method on the Moments in Time (MIT) dataset, we show that it performs favorably against the two-stream architecture, enabling a richer understanding of multi-modal action recognition in video.
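To make the idea concrete, here is a minimal sketch of one way such a feature generator could be set up: a regressor that maps pooled visual features to pseudo-audio features, trained with an L2 loss against real audio features on videos that do have sound, then used at inference to fill in the missing modality. The class name, dimensions, and MLP architecture below are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class AudioFeatureGenerator(nn.Module):
    """Maps spatial-temporal visual features to pseudo-audio features.

    A hypothetical MLP regressor; the paper's generator architecture
    and feature dimensions may differ.
    """
    def __init__(self, visual_dim=2048, audio_dim=128, hidden_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(visual_dim, hidden_dim),
            nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, audio_dim),
        )

    def forward(self, visual_feat):
        return self.net(visual_feat)

generator = AudioFeatureGenerator()
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)

# Training step: regress generated features onto real audio features
# extracted from videos whose audio track is intact.
visual_feat = torch.randn(32, 2048)     # e.g. pooled 3D-CNN features
real_audio_feat = torch.randn(32, 128)  # e.g. pooled audio-CNN features

pred_audio_feat = generator(visual_feat)
loss = criterion(pred_audio_feat, real_audio_feat)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference on a video with missing audio: substitute the generated
# features and fuse them with the visual features for classification.
fused = torch.cat([visual_feat, generator(visual_feat)], dim=1)
```

Under this setup, a downstream action classifier can consume the fused vector regardless of whether the audio features were extracted from a real soundtrack or generated from vision, which is what allows the simulated features to stand in when the audio track is lost.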
