Action Recognition In RGB-D Egocentric Videos

Yansong Tang, Yi Tian, Jiwen Lu, Jianjiang Feng, Jie Zhou
Yansong Tang
6 September 2017 - 9:35am
In this paper, we investigate the problem of action recognition in RGB-D egocentric videos. These self-generated, embodied videos provide richer semantic cues for action recognition than conventional videos captured from a third-person view. Moreover, they contain both appearance information and the 3D structure of the scene, from the RGB and depth modalities respectively. Motivated by these advantages, we first collect a video-based RGB-D egocentric dataset (THU-READ) covering diverse types of daily-life actions. We then evaluate several approaches on THU-READ, including hand-crafted features and deep learning methods. To improve performance, we further develop a tri-stream convolutional network (TCNet), which learns to fuse the RGB and depth modalities for action recognition. Experimental results show that our model achieves performance competitive with state-of-the-art methods.

