In this paper, we introduce an end-to-end machine learning-based system for classifying autism spectrum disorder (ASD) using facial attributes such as expressions, action units, arousal, and valence. Our system classifies ASD using representations of different facial attributes from convolutional neural networks, which are trained on images in the wild. Our experimental results show that different facial attributes used in our system are statistically significant and improve sensitivity, specificity, and F1 score of ASD classification by a large margin.

Product placement, also called advertisement embedding, places specific products in an image or a video to attract consumers to buy them. However, adding advertisement objects to images is difficult, because both where to place the product and how to blend it with the background must be considered. In this paper, to overcome this issue, we present a novel hierarchical framework with a conditional generative adversarial network that adds advertisement objects to all kinds of scene images. The key point of our framework is learning the relation between products and their surroundings.

We address the task of estimating depth from a single intensity image via a novel convolutional neural network (CNN) encoder-decoder architecture, which learns the depth information using example pairs of color images and their corresponding depth maps. The proposed model integrates residual connections within pooling and up-sampling layers, and hourglass networks which operate on the encoded features, thus processing these at various scales. Furthermore, the model is optimized using a perceptual loss together with the mean squared error loss.

This paper presents a novel approach for continuous dynamic hand gesture recognition from RGB video input. Our approach contains two main modules. First, in the gesture spotting module, the video sequence with continuous gestures is pre-segmented into isolated gestures. Second, the gesture classification module classifies the segmented gestures. In the gesture spotting module, the motion of the palm and the finger movements are fed into a Bidirectional Long Short-Term Memory (Bi-LSTM) network for gesture spotting.

Due to the large number and huge diversity of attributes, pedestrian attribute recognition in video surveillance scenarios is a challenging task in the field of computer vision. Unlike most previous works, which focus only on the problem of extremely imbalanced attribute distributions, we put forward a multi-task convolutional neural network (MTCNN) based on a new way of grouping attributes, which exploits the spatial correlations among attributes while preserving some independence for each attribute.

This paper explores the benefits of 3D face modeling for in-the-wild facial expression recognition (FER). Since in-the-wild 3D FER datasets are limited, we first construct 3D facial data from available 2D datasets using recent advances in 3D face reconstruction. The 3D facial geometry representation is then extracted by deep learning techniques. In addition, we take advantage of manipulating the 3D face, such as using 2D projected images of the 3D face as additional input for FER. These features are then fused with those of a typical 2D FER network.

This paper presents a novel shape descriptor to effectively and efficiently characterize local image statistics. The proposed descriptor, termed contour covariance (CC), characterizes covariance features driven by a moving point on the shape contour at multiple scales. To calculate the covariance matrices, three basic features, including texture, intensity and distance map, are extracted from the object image. Based on coefficients of the obtained covariance matrices, the proposed CC descriptor is compact yet informative, as well as invariant to rotation, translation and scale.
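
As a rough illustration of the idea of covariance features around a contour point at several scales, the following sketch computes a covariance matrix of per-pixel feature vectors inside square windows of increasing size. It is not the authors' CC descriptor: the function name `local_covariance`, the square window, the gradient magnitude used as a texture proxy, the brute-force distance map, and the toy image are all assumptions made for the example.

```python
import numpy as np

def local_covariance(features, center, radius):
    """Covariance of per-pixel feature vectors inside a square window.

    features: array of shape (H, W, d), one d-dimensional vector per pixel.
    center:   (row, col) of a contour point.
    radius:   half-width of the window; varying it gives several scales.
    """
    h, w, d = features.shape
    r, c = center
    r0, r1 = max(r - radius, 0), min(r + radius + 1, h)
    c0, c1 = max(c - radius, 0), min(c + radius + 1, w)
    patch = features[r0:r1, c0:c1].reshape(-1, d)
    return np.cov(patch, rowvar=False)  # (d, d) symmetric matrix

# Toy object image: a bright square on a dark background.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0

# Hypothetical stand-ins for the three features named in the abstract.
gy, gx = np.gradient(img)
texture = np.hypot(gx, gy)                 # gradient magnitude as a texture proxy
edge_pixels = np.argwhere(texture > 0)     # pixels next to the square's boundary
rows, cols = np.indices(img.shape)
dist = np.min(                             # brute-force distance to the nearest edge pixel
    np.hypot(rows[..., None] - edge_pixels[:, 0],
             cols[..., None] - edge_pixels[:, 1]),
    axis=-1,
)
features = np.stack([texture, img, dist], axis=-1)

contour_point = (8, 16)                    # a point on the top edge of the square
descriptors = [local_covariance(features, contour_point, s) for s in (2, 4, 8)]
```

Each entry of `descriptors` is a 3×3 matrix, so the representation stays small regardless of window size, which is one way to read the "compact yet informative" claim. How the abstract's method selects coefficients and achieves its invariances is not shown here.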

Rain removal aims to remove rain streaks from rainy images. The state-of-the-art methods are mostly based on Convolutional Neural Networks (CNNs). However, as CNNs are not equivariant to object rotation, these methods are unsuitable for dealing with tilted rain streaks. To tackle this problem, we propose the Deep Symmetry Enhanced Network (DSEN), which is able to explicitly extract rotation-equivariant features from rain images. In addition, we design a self-refining mechanism to remove the accumulated rain streaks in a coarse-to-fine manner.
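
The claim that an ordinary convolution is not rotation equivariant can be checked with a small numerical experiment. The sketch below is only an illustration of that property, not of DSEN itself: the helper `correlate2d_valid`, the Sobel kernel, the random image, and the restriction to a 90° rotation (the simplest case, whereas tilted streaks appear at arbitrary angles) are all assumptions chosen for the example.

```python
import numpy as np

def correlate2d_valid(image, kernel):
    """Plain 'valid' cross-correlation, the operation a CNN layer applies."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))

# A Sobel kernel: it responds to intensity changes in one direction only.
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])

# Equivariance would mean: rotating the input and then filtering gives
# the same result as filtering and then rotating the output.
rotate_then_filter = correlate2d_valid(np.rot90(image), kernel)
filter_then_rotate = np.rot90(correlate2d_valid(image, kernel))
plain_conv_equivariant = np.allclose(rotate_then_filter, filter_then_rotate)

# If the kernel is rotated along with the input, the two paths agree again.
rotated_kernel_matches = np.allclose(
    correlate2d_valid(np.rot90(image), np.rot90(kernel)),
    filter_then_rotate,
)
```

With a fixed kernel the two paths disagree, while rotating the kernel together with the input restores agreement. Sharing responses across rotated copies of a filter is one common route to rotation-equivariant features; whether DSEN uses this particular construction is not stated in the abstract.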

Shearlet Transform (ST) is one of the most effective methods for Densely-Sampled Light Field (DSLF) reconstruction from a Sparsely-Sampled Light Field (SSLF). However, ST requires a precise disparity estimation of the SSLF. To this end, in this paper a state-of-the-art optical flow method, i.e. PWC-Net, is employed to estimate bidirectional disparity maps between neighboring views in the SSLF. Moreover, to take full advantage of optical flow and ST for DSLF reconstruction, a novel learning-based method, referred to as Flow-Assisted Shearlet Transform (FAST), is proposed in this paper.

Smiling has a psychiatric effect on emotional state and may hold tremendous potential for clinical remediation in psychiatric disorders. A few researchers in image synthesis aim to act on the emotional state of subjects by automatically deforming their faces to synthesize a joyful expression. However, to generate these expressions they apply the same deformation to all subjects, although each person smiles differently. In this paper, we move towards a personalized synthesis of the joy expression.
