This paper presents a novel approach for continuous dynamic hand gesture recognition from RGB video input. Our approach contains two main modules. First, the gesture spotting module pre-segments the video sequence containing continuous gestures into isolated gestures. Second, the gesture classification module classifies the segmented gestures. In the gesture spotting module, hand-palm motion and finger movements are fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network for gesture spotting.
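A minimal PyTorch sketch of such a Bi-LSTM spotter is shown below: per-frame hand-motion features go in, per-frame gesture/non-gesture probabilities come out. The feature size, hidden size, and layer count are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn as nn

class BiLSTMSpotter(nn.Module):
    def __init__(self, feat_dim=48, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # gesture vs. non-gesture per frame

    def forward(self, x):            # x: (batch, frames, feat_dim)
        h, _ = self.lstm(x)          # (batch, frames, 2 * hidden)
        return self.head(h)          # per-frame logits

# Example: a 30-frame clip of palm-motion and finger-movement features.
clip = torch.randn(1, 30, 48)
logits = BiLSTMSpotter()(clip)
boundaries = logits.argmax(-1)       # frame-wise labels used to cut out isolated gestures
```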

Due to the large number and huge diversity of attributes, pedestrian attribute recognition in video surveillance scenarios is a challenging task in the field of computer vision. Unlike most previous works, which focus only on the extremely imbalanced attribute distribution, we put forward a multi-task convolutional neural network (MTCNN) based on a new grouping of attributes, which exploits the spatial correlations among attributes while preserving some independence of each attribute.
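As an illustration of the grouping idea, the sketch below uses a shared backbone feeding one head per attribute group; the backbone choice, group names, and attribute counts are placeholders rather than the paper's configuration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GroupedMTCNN(nn.Module):
    def __init__(self, groups):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # shared, globally pooled features
        self.heads = nn.ModuleDict(
            {name: nn.Linear(512, n_attrs) for name, n_attrs in groups.items()})

    def forward(self, x):
        f = self.backbone(x).flatten(1)
        return {name: head(f) for name, head in self.heads.items()}

# Hypothetical grouping by body region; each group gets its own multi-label head.
model = GroupedMTCNN({'head': 4, 'upper_body': 10, 'lower_body': 8})
outputs = model(torch.randn(2, 3, 256, 128))   # pedestrian crops
# Each group can be trained with its own (e.g., weighted BCE) loss to handle imbalance.
```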

This paper explores the benefits of 3D face modeling for in-the-wild facial expression recognition (FER). Since in-the-wild 3D FER datasets are limited, we first construct 3D facial data from an available 2D dataset using recent advances in 3D face reconstruction. A 3D facial geometry representation is then extracted with a deep network. In addition, we take advantage of manipulating the 3D face, for example using 2D projected images of the 3D face as additional input for FER. These features are then fused with those of a typical 2D FER network.
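A hedged sketch of this kind of late fusion is given below: features from a standard 2D branch are concatenated with features computed from the reconstructed 3D face (here, its 2D projected image). The branch architectures and dimensions are placeholders, not the paper's networks.

```python
import torch
import torch.nn as nn

class FusedFER(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.branch_2d = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.branch_3d = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU(),
                                       nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, img_2d, img_3d_proj):
        # Concatenate 2D appearance features with features of the projected 3D face.
        f = torch.cat([self.branch_2d(img_2d), self.branch_3d(img_3d_proj)], dim=1)
        return self.classifier(f)

logits = FusedFER()(torch.randn(1, 3, 112, 112), torch.randn(1, 3, 112, 112))
```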

This paper presents a novel shape descriptor to effectively and efficiently characterize local image statistics. The proposed descriptor, termed contour covariance (CC), characterizes covariance features driven by a moving point on the shape contour at multiple scales. To calculate the covariance matrices, three basic features, namely texture, intensity, and the distance map, are extracted from the object image. Based on the coefficients of the obtained covariance matrices, the proposed CC descriptor is compact yet informative, as well as invariant to rotation, translation, and scale.
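The NumPy/SciPy sketch below illustrates a contour-covariance style computation under simplifying assumptions: feature vectors (intensity, a simple gradient-magnitude texture cue, distance-map value) are sampled at contour points, and each local window is summarized by the coefficients of its covariance matrix. Window sizes and the texture feature are illustrative choices, not the paper's.

```python
import numpy as np
from scipy import ndimage

def contour_covariance(image, contour, scales=(8, 16, 32)):
    """image: 2-D grayscale array; contour: (N, 2) integer array of (row, col) points."""
    gy, gx = np.gradient(image.astype(float))
    texture = np.hypot(gx, gy)                       # gradient magnitude as a texture cue
    dist_map = ndimage.distance_transform_edt(image > image.mean())
    rows, cols = contour[:, 0], contour[:, 1]
    feats = np.stack([image[rows, cols],
                      texture[rows, cols],
                      dist_map[rows, cols]], axis=1)  # (N, 3) per-point features
    descriptor = []
    for s in scales:                                  # covariances over windows along the contour
        for i in range(0, len(feats) - s + 1, s):
            cov = np.cov(feats[i:i + s].T)            # 3x3 covariance matrix
            descriptor.extend(cov[np.triu_indices(3)])  # keep upper-triangular coefficients
    return np.asarray(descriptor)
```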

Rain removal aims to remove rain streaks from rainy images. The state-of-the-art methods are mostly based on Convolutional Neural Networks (CNNs). However, since a CNN is not equivariant to object rotation, these methods are unsuitable for dealing with tilted rain streaks. To tackle this problem, we propose the Deep Symmetry Enhanced Network (DSEN), which explicitly extracts rotation-equivariant features from rain images. In addition, we design a self-refining mechanism to remove accumulated rain streaks in a coarse-to-fine manner.
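For intuition, the snippet below is a minimal sketch of one way to obtain rotation-equivariant feature maps, a C4 (90-degree) group convolution built by sharing weights across rotated copies of the input. DSEN may use a different group or implementation; this is only illustrative.

```python
import torch
import torch.nn as nn

class C4EquivariantConv(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)  # weights shared across rotations

    def forward(self, x):
        responses = []
        for k in range(4):                         # rotate input, convolve, rotate back
            xr = torch.rot90(x, k, dims=(2, 3))
            yr = self.conv(xr)
            responses.append(torch.rot90(yr, -k, dims=(2, 3)))
        return torch.stack(responses, dim=1)       # (batch, 4 orientations, out_ch, H, W)

feat = C4EquivariantConv(3, 16)(torch.randn(1, 3, 64, 64))
# Rotating the input by 90 degrees rotates each feature map and cyclically permutes
# the orientation axis, which is the equivariance property.
```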

Shearlet Transform (ST) is one of the most effective methods for Densely-Sampled Light Field (DSLF) reconstruction from a Sparsely-Sampled Light Field (SSLF). However, ST requires a precise disparity estimation of the SSLF. To this end, a state-of-the-art optical flow method, PWC-Net, is employed in this paper to estimate bidirectional disparity maps between neighboring views in the SSLF. Moreover, to take full advantage of optical flow and ST for DSLF reconstruction, a novel learning-based method, referred to as Flow-Assisted Shearlet Transform (FAST), is proposed.
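The sketch below illustrates only the bidirectional disparity step, assuming views of a horizontal-parallax light field and a pretrained optical-flow network exposed as a callable `flow_net` (PWC-Net in the paper; the callable, its interface, and the flow channel order here are hypothetical).

```python
import torch

def bidirectional_disparities(views, flow_net):
    """views: list of (3, H, W) tensors for neighboring SSLF views, ordered left to right."""
    fwd, bwd = [], []
    for left, right in zip(views[:-1], views[1:]):
        flow_lr = flow_net(left.unsqueeze(0), right.unsqueeze(0))   # assumed shape (1, 2, H, W)
        flow_rl = flow_net(right.unsqueeze(0), left.unsqueeze(0))
        # For purely horizontal camera motion, disparity is the horizontal flow component
        # (channel 0 assumed to be horizontal).
        fwd.append(flow_lr[:, 0])
        bwd.append(flow_rl[:, 0])
    return fwd, bwd

# The resulting disparity maps then parameterize the shearlet-based DSLF reconstruction.
```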

Smiling influences emotional state and may hold tremendous potential for clinical remediation of psychiatric disorders. A few researchers in image synthesis have worked on acting on the emotional state of subjects by automatically deforming their faces to synthesize a joyful expression. However, they apply the same deformation to every subject, while each person smiles differently. In this paper, we move towards a personalized synthesis of the joy expression.

Visual tracking is an important and challenging problem in the field of computer vision. In recent years, Siamese networks have been widely used for visual tracking due to their fast tracking speed, but many Siamese-network-based trackers are trained with either a pairwise loss or a triplet loss, which easily leads to over-fitting. In addition, hard samples among the training data are difficult to distinguish. In this paper, we propose a novel global similarity loss to train the network.
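The abstract does not spell out the global similarity loss, so the snippet below is only an illustrative contrast with pairwise/triplet training: instead of sampling pairs or triplets, every exemplar is scored against all candidates in the batch and supervised with a softmax cross-entropy over the full similarity matrix.

```python
import torch
import torch.nn.functional as F

def batch_global_similarity_loss(exemplar_feats, candidate_feats):
    """exemplar_feats, candidate_feats: (B, D); row i of each comes from the same target."""
    sims = F.normalize(exemplar_feats, dim=1) @ F.normalize(candidate_feats, dim=1).T
    labels = torch.arange(sims.size(0), device=sims.device)  # matching pairs lie on the diagonal
    return F.cross_entropy(sims / 0.1, labels)               # 0.1 is an illustrative temperature
```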

Partial occlusions in face images pose a great problem for most face recognition algorithms, because most of them minimize a second-order loss function, e.g., the mean square error (MSE), which magnifies the effect of the occluded parts. In this paper, we propose a kernel non-second-order loss function for sparse representation (KNS-SR) to recognize or restore partially occluded facial images, which takes advantage of both correntropy and non-second-order statistical measurements.
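As a simplified NumPy illustration of why correntropy-style losses help with occlusion, the sketch below re-weights residuals with a Gaussian kernel (the half-quadratic view of the correntropy loss), so occluded pixels with large residuals receive small weights. The sparsity term and the kernelization of the full KNS-SR formulation are omitted here for brevity.

```python
import numpy as np

def correntropy_weighted_fit(D, y, sigma=0.5, n_iter=10):
    """D: (d, k) dictionary of training faces; y: (d,) possibly occluded test face."""
    x = np.linalg.lstsq(D, y, rcond=None)[0]          # ordinary least-squares start
    for _ in range(n_iter):
        r = y - D @ x
        w = np.exp(-(r ** 2) / (2 * sigma ** 2))      # small weight on occluded pixels
        W = np.diag(w)
        x = np.linalg.solve(D.T @ W @ D + 1e-6 * np.eye(D.shape[1]), D.T @ W @ y)
    return x                                          # robust coding coefficients
```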

Multi-scale object recognition and accurate object localization are two major problems for semantic segmentation in high resolution aerial images. To handle these problems, we design a Context Fuse Module to aggregate multi-scale features and propose an Attention Mix Module to combine different level features for higher localization accuracy. We further employ a Residual Convolutional Module to refine features in all levels. Based on these modules, we construct a new end-to-end network for semantic labeling in aerial images.
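The sketch below shows one common way to aggregate multi-scale context, in the spirit of a context-fusion block: parallel branches with different dilation rates are concatenated and merged. The exact design of the paper's Context Fuse Module is not given here, so the rates and channel counts are placeholders.

```python
import torch
import torch.nn as nn

class ContextFuse(nn.Module):
    def __init__(self, ch, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates])
        self.fuse = nn.Conv2d(ch * len(rates), ch, 1)   # merge the multi-scale responses

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

out = ContextFuse(64)(torch.randn(1, 64, 32, 32))   # same spatial size, context-fused features
```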
