In this paper, we propose a novel closed-form transformation estimation method for non-rigid image deformation, based on moving regularized least squares optimization with a thin-plate spline (MRLS-TPS). The method takes user-controlled point-offset vectors as input and, for each pixel, estimates the spatial transformation between the two control point sets. To achieve realistic deformation, we formulate transformation estimation as a vector-field interpolation problem solved by a moving regularized least squares method.
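To make the "moving regularized least squares with TPS" idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' MRLS-TPS formulation. For each query pixel p, it fits a thin-plate spline (kernel U(r) = r^2 log r plus an affine term) to the control-point offsets by weighted, regularized least squares, with moving-least-squares weights w_i = 1 / (|p - x_i|^2 + eps)^alpha, so nearby control points dominate. The function name `moving_regularized_tps` and the parameters `lam`, `alpha`, and `eps` are hypothetical choices for illustration.

```python
import numpy as np


def tps_kernel(r):
    """Thin-plate spline radial basis U(r) = r^2 log r, with U(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        u = r**2 * np.log(r)
    return np.where(r > 0, u, 0.0)


def moving_regularized_tps(ctrl, offsets, query, lam=1e-3, alpha=1.0, eps=1e-8):
    """Hypothetical sketch: deform `query` points using a per-point
    weighted, regularized TPS fit of control-point offset vectors.

    ctrl:    (n, 2) source control points (n >= 3, not collinear)
    offsets: (n, 2) user-specified offset vector at each control point
    query:   (m, 2) pixel positions to deform
    """
    ctrl = np.asarray(ctrl, float)
    offsets = np.asarray(offsets, float)
    query = np.asarray(query, float)
    n = len(ctrl)

    # Kernel matrix between control points and the affine basis [1, x, y].
    K = tps_kernel(np.linalg.norm(ctrl[:, None] - ctrl[None, :], axis=-1))
    P = np.hstack([np.ones((n, 1)), ctrl])

    out = np.empty_like(query)
    for k, p in enumerate(query):
        # Moving weights: control points close to p get large weights.
        d2 = np.sum((ctrl - p) ** 2, axis=1)
        w = 1.0 / (d2 + eps) ** alpha

        # Weighted smoothing-spline system:
        # (K + lam * W^-1) c + P a = offsets,  P^T c = 0
        A = np.zeros((n + 3, n + 3))
        A[:n, :n] = K + lam * np.diag(1.0 / w)
        A[:n, n:] = P
        A[n:, :n] = P.T
        b = np.zeros((n + 3, 2))
        b[:n] = offsets
        sol = np.linalg.solve(A, b)
        c, a = sol[:n], sol[n:]

        # Evaluate the fitted offset field at p and move the pixel.
        out[k] = p + tps_kernel(np.sqrt(d2)) @ c + np.array([1.0, *p]) @ a
    return out
```

Because the affine term reproduces constant fields exactly, a uniform offset at every control point translates every pixel by that offset; with a small `lam`, pixels at the control points land close to their targets. Solving one system per pixel is slow but keeps the "moving" weighting explicit; a practical implementation would vectorize or precompute.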

Visual speech recognition (VSR), also known as lip reading, is the task of recognizing words or phrases from video clips of lip movement. Traditional VSR methods are limited in that they mostly handle frontal views of facial movement. For practical applications, however, VSR should handle lip movement viewed from any angle. In this paper, we propose a pose-invariant network that can recognize words spoken in input captured from an arbitrary view.

Instead of assuming a closed-world environment comprising a fixed number of objects, modern pattern recognition systems need to recognize outliers, identify anomalies, or discover entirely new objects, a setting known as zero-shot object recognition. However, many existing zero-shot learning methods cannot efficiently update themselves incrementally with new samples labeled with either known or novel classes. In this paper, we propose an incremental zero-shot learning framework (IIAP/QR) based on the indirect attribute prediction (IAP) model. Firstly, a fast incremental
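As background for the IAP model that the framework builds on, the following is a minimal sketch of IAP inference in its commonly used form; the incremental QR-based update proposed in the paper is not shown. A classifier over seen classes gives p(y_k | x); attribute probabilities are obtained as p(a_m = 1 | x) = sum_k p(y_k | x) * a_m(y_k); each unseen class z is then scored by sum_m [log p(a_m = a_m(z) | x) - log p(a_m = a_m(z))]. The function name `iap_predict`, estimating the attribute prior as the mean over seen classes, and the `eps` clipping are assumptions for illustration.

```python
import numpy as np


def iap_predict(seen_probs, seen_attrs, unseen_attrs, eps=1e-6):
    """Hypothetical sketch of indirect attribute prediction (IAP).

    seen_probs:   (K,) p(y_k | x) from a classifier over seen classes
    seen_attrs:   (K, M) binary class-attribute matrix for seen classes
    unseen_attrs: (L, M) binary attribute signatures of unseen classes
    Returns the index of the best unseen class and all class scores.
    """
    seen_attrs = np.asarray(seen_attrs, float)
    unseen_attrs = np.asarray(unseen_attrs, float)

    # p(a_m = 1 | x): marginalize attributes over seen-class posteriors.
    p_attr = np.clip(np.asarray(seen_probs, float) @ seen_attrs, eps, 1 - eps)
    # Assumed attribute prior p(a_m = 1): mean over seen classes.
    prior = np.clip(seen_attrs.mean(axis=0), eps, 1 - eps)

    # Log-likelihood of each unseen signature, divided by its prior.
    log_post = unseen_attrs * np.log(p_attr) + (1 - unseen_attrs) * np.log(1 - p_attr)
    log_prior = unseen_attrs * np.log(prior) + (1 - unseen_attrs) * np.log(1 - prior)
    scores = (log_post - log_prior).sum(axis=1)
    return int(np.argmax(scores)), scores
```

For example, if the classifier is confident that an image belongs to a seen class with attributes [1, 0, 1], an unseen class with the same signature will outscore one with the signature [0, 1, 0]. Working in log space avoids underflow when there are many attributes.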

This paper proposes a video summarization method based on novel spatio-temporal features that combine motion magnitude, object class prediction, and saturation. Motion magnitude measures how much motion a video contains. Object class prediction provides information about the objects in a video. Saturation measures the colorfulness of a video. Convolutional neural networks (CNNs) are used for object class prediction. Shots are ranked in descending order of the sum of their normalized features, and the summary is formed from the highest-ranking shots.
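The ranking step described above can be sketched as follows, under assumptions the abstract does not specify: each feature is already aggregated to one value per shot, features are min-max normalized across shots, and the summary keeps the top `k` shots in their original temporal order. The function name `summarize_shots` and the parameter `k` are hypothetical.

```python
import numpy as np


def summarize_shots(motion, obj_score, saturation, k):
    """Hypothetical sketch: rank shots by the sum of normalized features.

    Each argument holds one value per shot; k is the number of shots
    to keep. Returns the selected shot indices (in temporal order)
    and the per-shot total scores.
    """
    feats = np.stack([motion, obj_score, saturation], axis=1).astype(float)

    # Assumed normalization: min-max per feature across shots.
    lo, hi = feats.min(axis=0), feats.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    norm = (feats - lo) / span

    # Sum the normalized features per shot and rank in descending order.
    total = norm.sum(axis=1)
    order = np.argsort(-total, kind="stable")

    # Keep the highest-ranking shots, restored to temporal order.
    return sorted(order[:k].tolist()), total
```

Normalizing before summing keeps a feature with a large numeric range (such as raw motion magnitude) from dominating the ranking.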

A sound way to localize occluded people is to project the foregrounds from multiple camera views onto a reference view using homographies and find the foreground intersections. However, this may produce phantoms, because foregrounds belonging to different people can also intersect. In this paper, each intersection region is warped back to the original camera view and associated with a candidate box of the average pedestrian size at that location. A joint occupancy likelihood is then calculated for each intersection region.
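The homography-based intersection step can be sketched in a few lines of NumPy. This is an illustrative sketch under assumptions, not the paper's implementation: each camera comes with a homography mapping reference-view pixel coordinates (x, y) into that camera's image, masks are boolean foreground images, and pixels are sampled by nearest neighbor. Pulling every reference pixel back into each camera is equivalent to forward-projecting the foreground masks and intersecting them. The names `apply_homography` and `foreground_intersection` are hypothetical, and the warp-back and joint occupancy likelihood steps are not sketched.

```python
import numpy as np


def apply_homography(H, pts):
    """Map (N, 2) points (x, y) through a 3x3 homography H."""
    pts = np.asarray(pts, float)
    hom = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return hom[:, :2] / hom[:, 2:3]


def foreground_intersection(masks, H_ref_to_cam, ref_shape):
    """Hypothetical sketch: reference-view pixels that are foreground
    in every camera view.

    masks:        list of boolean (h_c, w_c) foreground masks
    H_ref_to_cam: list of 3x3 homographies, reference view -> camera c
    ref_shape:    (h, w) of the reference view
    """
    h, w = ref_shape
    ys, xs = np.mgrid[0:h, 0:w]
    ref_pts = np.stack([xs.ravel(), ys.ravel()], axis=1)
    occupied = np.ones(h * w, dtype=bool)

    for mask, H in zip(masks, H_ref_to_cam):
        # Nearest-neighbor lookup of each reference pixel in camera c.
        cam = np.rint(apply_homography(H, ref_pts)).astype(int)
        x, y = cam[:, 0], cam[:, 1]
        inside = (x >= 0) & (x < mask.shape[1]) & (y >= 0) & (y < mask.shape[0])
        fg = np.zeros(h * w, dtype=bool)
        fg[inside] = mask[y[inside], x[inside]]
        occupied &= fg  # keep only pixels that are foreground in all views

    return occupied.reshape(h, w)
```

The resulting intersection regions are exactly where phantoms can arise, since foregrounds from two different people may overlap on the reference plane; this is why the paper then warps each region back to the camera views for verification.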
