
CASCADED TEMPORAL SPATIAL FEATURES FOR VIDEO ACTION RECOGNITION

Citation Author(s):
Tingzhao Yu, Huxiang Gu, Lingfeng Wang, Shiming Xiang, Chunhong Pan
Submitted by:
Tingzhao Yu
Last updated:
15 September 2017 - 5:00am
Document Type:
Presentation Slides
Document Year:
2017
Presenters:
Tingzhao Yu
Paper Code:
1338

Extracting spatial-temporal descriptors is a challenging task for video-based human action recognition. We decouple the 3D volume of video frames into a cascaded temporal-spatial domain via a new convolutional architecture. The motivation behind this design is to achieve deep nonlinear feature representations with fewer network parameters. First, a 1D temporal network with shared parameters is constructed to map the video sequences along the time axis into feature maps in the temporal domain. These feature maps are then organized into channels, analogous to those of an RGB image (referred to here as the Motion Image), so as to preserve both temporal and spatial information. Second, the Motion Image is fed as input to the subsequent cascaded 2D spatial network. By combining the 1D temporal network and the 2D spatial network, the total number of network parameters is greatly reduced. Benefiting from the Motion Image, our network is an end-to-end system for action recognition that can be trained with the classical back-propagation algorithm. Extensive comparative experiments on two benchmark datasets demonstrate the effectiveness of the new architecture.
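The cascade described above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the paper's actual network: the filter sizes, frame counts, and the choice of three feature maps as Motion Image channels are all illustrative assumptions, and the learned convolutions are replaced with random kernels purely to show the data flow and shapes.

```python
import numpy as np

def temporal_conv1d(video, kernel):
    # video: (T, H, W) stack of grayscale frames; kernel: (k,) 1D filter
    # shared across all spatial positions (shared parameters along time).
    T, H, W = video.shape
    k = kernel.shape[0]
    out = np.zeros((T - k + 1, H, W))
    for t in range(out.shape[0]):
        # weighted sum over a temporal window, identical at every pixel
        out[t] = np.tensordot(kernel, video[t:t + k], axes=(0, 0))
    return out

def spatial_conv2d(img, kernel):
    # img: (C, H, W) multi-channel "Motion Image"; kernel: (C, kh, kw)
    C, H, W = img.shape
    _, kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[:, i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
video = rng.standard_normal((16, 8, 8))               # 16 frames, 8x8 pixels
feat = temporal_conv1d(video, rng.standard_normal(3)) # -> (14, 8, 8)
motion_image = feat[:3]                               # 3 maps as RGB-like channels
resp = spatial_conv2d(motion_image, rng.standard_normal((3, 3, 3)))
print(feat.shape, motion_image.shape, resp.shape)     # (14, 8, 8) (3, 8, 8) (6, 6)
```

Note how the parameter saving arises: the temporal stage uses only a length-k 1D filter instead of a full 3D kernel, and the spatial stage then operates on a single compact multi-channel image rather than the whole frame volume.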
