
We propose a computational framework for ranking images (group photos in particular) taken at the same event within a short time span. The ranking is expected to correspond with human perception of the overall appeal of the images. We hypothesize, and provide evidence through subjective analysis, that the factors determining an image's appeal to humans are its emotional content, aesthetics, and image quality. We propose a network that is an ensemble of three information channels, each predicting a score corresponding to one of the three visual appeal factors.
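The three-channel ensemble described above can be sketched as follows. This is an illustrative fusion scheme only: the per-factor scores, the equal channel weights, and the function names are assumptions, not the paper's actual network.

```python
# Hypothetical sketch: fuse the three per-factor scores (emotion, aesthetics,
# quality), each assumed to lie in [0, 1], into one appeal score, then rank.

def appeal_score(emotion, aesthetics, quality, weights=(1/3, 1/3, 1/3)):
    """Weighted combination of the three visual-appeal factor scores."""
    return sum(w * s for w, s in zip(weights, (emotion, aesthetics, quality)))

def rank_images(factor_scores):
    """Rank image ids by descending combined appeal score.

    factor_scores maps image id -> (emotion, aesthetics, quality).
    """
    return sorted(factor_scores,
                  key=lambda img: appeal_score(*factor_scores[img]),
                  reverse=True)
```

In practice each channel would be a learned sub-network and the fusion weights could themselves be learned; the uniform weights here are just a placeholder.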


Super-Resolution (SR) is a technique that has been studied exhaustively and plays a strategic role in image processing. As quantum computers gradually mature and demonstrate computational advantage over their classical counterparts on otherwise intractable problems, quantum computing emerges as a compelling prospect for exponentially speeding up computationally expensive operations, such as those found in SR imaging.


This paper presents a novel deep Reinforcement Learning (RL) framework for classifying movie scenes by affect, using the face images detected in the video stream as input. Extracting affective information from video is a challenging task that requires modeling complex visual and temporal representations intertwined with the intricacies of human perception and information integration. This complexity also makes it difficult to collect a large annotated corpus, restricting the use of supervised learning methods.


This paper presents a general framework for model-based 3D face reconstruction from a single image, which can incorporate mature face alignment methods and exploit their properties. In the proposed framework, the final model parameters, chiefly pose, identity, and expression, are obtained by alternately estimating and updating the face landmarks and the 3D face model parameters. In addition, we propose the parameter augmented regression method (PARM) as a novel derivation of the framework.
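The alternating scheme above can be sketched in a few lines. Note this is a toy numerical stand-in: `predict_landmarks` and `regress_params` are hypothetical placeholders, not the paper's alignment method or PARM; they merely show the structure of alternating between landmark estimation and parameter regression until both agree.

```python
# Toy alternation sketch: "image" and "params" are plain number lists here,
# standing in for a real image and a pose/identity/expression vector.

def predict_landmarks(image, params):
    # Stand-in aligner: landmarks drift toward the image evidence,
    # conditioned on the current model parameters.
    return [(p + t) / 2 for p, t in zip(params, image)]

def regress_params(image, landmarks, params):
    # Stand-in regressor: parameters move toward the landmark estimate.
    return [(p + l) / 2 for p, l in zip(params, landmarks)]

def reconstruct(image, init_params, n_iters=20):
    """Alternately update landmarks and model parameters."""
    params = init_params
    for _ in range(n_iters):
        landmarks = predict_landmarks(image, params)
        params = regress_params(image, landmarks, params)
    return params
```

In this toy setting each iteration contracts the error by a constant factor, so the parameters converge toward the image evidence; a real implementation would replace both stand-ins with learned models.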


In many multi-Kinect V2 systems, the registration of the Kinect V2 sensors is a critical step that directly affects system precision. Coarse-to-fine methods using calibration objects are an effective way to solve the Kinect V2 registration problem. However, such methods may fail when registering Kinect V2 cameras with large displacements. To this end, we propose a novel Kinect V2 registration method, also based on the coarse-to-fine framework, that exploits camera and scene constraints.
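A minimal coarse-to-fine sketch for point-set registration is shown below, restricted to translation for clarity. This is illustrative only: the paper's camera and scene constraints are not modeled, and a real system would also estimate rotation (e.g., via ICP with a rigid transform).

```python
# Coarse stage: align the centroids of two 3D point sets (e.g., measurements
# of a calibration object seen by two Kinect V2 sensors).
# Fine stage: iteratively refine the translation from nearest-point residuals.

def centroid(pts):
    n = len(pts)
    return tuple(sum(p[i] for p in pts) / n for i in range(3))

def coarse_align(src, dst):
    """Coarse registration: translate src so its centroid matches dst's."""
    cs, cd = centroid(src), centroid(dst)
    t = tuple(cd[i] - cs[i] for i in range(3))
    return [tuple(p[i] + t[i] for i in range(3)) for p in src], t

def fine_align(src, dst, n_iters=10):
    """Fine registration: average nearest-neighbour residuals as the step."""
    total = (0.0, 0.0, 0.0)
    for _ in range(n_iters):
        nearest = [min(dst, key=lambda q: sum((q[i] - p[i]) ** 2
                                              for i in range(3)))
                   for p in src]
        step = tuple(sum(r[i] - p[i] for p, r in zip(src, nearest)) / len(src)
                     for i in range(3))
        src = [tuple(p[i] + step[i] for i in range(3)) for p in src]
        total = tuple(total[i] + step[i] for i in range(3))
    return src, total
```

The coarse stage gets the point sets close enough that the fine stage's nearest-neighbour correspondences are mostly correct, which is exactly where coarse-to-fine schemes break down under large sensor displacements.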


This paper presents a novel method to track the hierarchical structure of Web video groups on the basis of salient keyword matching, including semantic broadness estimation. To the best of our knowledge, this paper is the first work to perform extraction and tracking of the hierarchical structure simultaneously. Specifically, the proposed method first extracts the hierarchical structure of Web video groups and their salient keywords using an improved version of our previously reported method.
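The keyword-matching step that links groups across time can be sketched as below. This is a simplified assumption: Jaccard similarity over salient-keyword sets stands in for the paper's matching scheme, and the semantic broadness estimation is not reproduced.

```python
# Illustrative tracking sketch: link each current video group to its
# best-matching previous group by overlap of salient keywords.

def jaccard(a, b):
    """Jaccard similarity of two keyword collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def track_groups(prev_groups, curr_groups, threshold=0.3):
    """Map each current group id to the previous group id it continues.

    prev_groups / curr_groups map group id -> list of salient keywords.
    Groups whose best match falls below `threshold` are left unlinked
    (treated as newly emerged groups).
    """
    links = {}
    for cid, ckw in curr_groups.items():
        best = max(prev_groups,
                   key=lambda pid: jaccard(prev_groups[pid], ckw),
                   default=None)
        if best is not None and jaccard(prev_groups[best], ckw) >= threshold:
            links[cid] = best
    return links
```
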