- MULTI-VIEW VISUAL SPEECH RECOGNITION BASED ON MULTI TASK LEARNING
Visual speech recognition (VSR), also known as lip reading, is the task of recognizing words or phrases from video clips of lip movement. Traditional VSR methods are limited in that they rely mostly on frontal-view facial movement. For practical applications, however, VSR should handle lip movement from all angles. In this paper, we propose a pose-invariant network that can recognize words spoken from an arbitrary input view.
- LONG-TERM OBJECT TRACKING BASED ON SIAMESE NETWORK

- ICIP2017_Incremental zero-shot learning based on attributes for image classification
Instead of assuming a closed-world environment comprising a fixed number of objects, modern pattern recognition systems need to recognize outliers, identify anomalies, or discover entirely new objects, a task known as zero-shot object recognition. However, many existing zero-shot learning methods are not efficient enough to incrementally update themselves with new samples that carry a mix of known and novel class labels. In this paper, we propose an incremental zero-shot learning framework (IIAP/QR) based on the indirect-attribute-prediction (IAP) model. Firstly, a fast incremental
- ICIP 2017-ROBUST ELLIPSE DETECTION VIA ARC SEGMENTATION AND CLASSIFICATION

- CONTENT ADAPTIVE VIDEO SUMMARIZATION USING SPATIO-TEMPORAL FEATURES
This paper proposes a video summarization method based on novel spatio-temporal features that combine motion magnitude, object class prediction, and saturation. Motion magnitude measures how much motion there is in a video. Object class prediction provides information about the objects in a video. Saturation measures the colorfulness of a video. Convolutional neural networks (CNNs) are used for object class prediction. The normalized features are summed per shot, the shots are ranked by this sum in descending order, and the summary is formed from the highest-ranking shots.
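The ranking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each shot already has one scalar value per feature, that normalization is min-max across shots, and that the three features are weighted equally. The names `min_max_normalize`, `summarize`, `object_score`, and `num_selected` are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def summarize(shots, num_selected):
    """Pick the highest-scoring shots.

    shots: list of dicts with per-shot scalar features (assumed precomputed):
           'motion', 'object_score', 'saturation'.
    Returns (selected shot indices in temporal order, per-shot scores).
    """
    names = ("motion", "object_score", "saturation")
    # Normalize each feature across all shots so the features are comparable.
    normalized = {n: min_max_normalize([s[n] for s in shots]) for n in names}
    # Score of a shot = sum of its normalized features (equal weights assumed).
    scores = [sum(normalized[n][i] for n in names) for i in range(len(shots))]
    # Rank shots by score in descending order and keep the top ones.
    ranked = sorted(range(len(shots)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:num_selected]), scores


if __name__ == "__main__":
    example_shots = [
        {"motion": 0.1, "object_score": 0.2, "saturation": 0.3},
        {"motion": 0.9, "object_score": 0.8, "saturation": 0.7},
        {"motion": 0.5, "object_score": 0.9, "saturation": 0.1},
        {"motion": 0.2, "object_score": 0.1, "saturation": 0.2},
    ]
    selected, scores = summarize(example_shots, num_selected=2)
    print(selected, [round(s, 3) for s in scores])
```

Returning the selected indices in temporal order keeps the summary playable in sequence; the abstract does not say how the summary length is chosen, so `num_selected` is left to the caller.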
- Multiview Pedestrian Localisation via a Prime Candidate Chart
A sound way to localize occluded people is to project the foregrounds from multiple camera views onto a reference view using homographies and find the foreground intersections. However, this may give rise to phantoms, which are intersections of foregrounds that belong to different people. In this paper, each intersection region is warped back to the original camera view and associated with a candidate box of the average pedestrian size at that location. A joint occupancy likelihood is then calculated for each intersection region.
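The projection-and-intersection step that this approach starts from can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's method: foregrounds are given as lists of pixel coordinates, the reference view is discretized into square grid cells, and each homography is a known 3x3 matrix. The candidate boxes and the joint occupancy likelihood are not modeled. The names `apply_homography`, `project_foreground`, `foreground_intersection`, and `cell_size` are hypothetical.

```python
import math


def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography H (nested lists, row-major)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to the image plane.
    return (u / w, v / w)


def project_foreground(H, pixels, cell_size=1.0):
    """Project foreground pixels to the reference view and quantize to grid cells."""
    cells = set()
    for p in pixels:
        u, v = apply_homography(H, p)
        cells.add((math.floor(u / cell_size), math.floor(v / cell_size)))
    return cells


def foreground_intersection(views, cell_size=1.0):
    """Cells of the reference view covered by the foreground of every camera.

    views: list of (H, pixels) pairs, one per camera.
    """
    projected = [project_foreground(H, pixels, cell_size) for H, pixels in views]
    return set.intersection(*projected) if projected else set()


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    shift_x = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]  # translate by one unit in x
    views = [
        (identity, [(0.5, 0.5), (1.5, 0.5)]),
        (shift_x, [(0.5, 0.5)]),
    ]
    print(foreground_intersection(views))
```

Forward-mapping individual pixels can leave holes in the projected region; a practical system would more likely warp whole foreground masks. A cell that survives the intersection may still be a phantom, which is the case the candidate-box association in the abstract is meant to resolve.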
- Hierarchical Bilinear Network for High Performance Face Detection
HBN_lv.pdf

- Learning Optimal Parameters for Binary Sensing Image Reconstruction Algorithms
A novel data-driven reconstruction algorithm for quantum image sensors (QIS) is proposed. Observations are efficiently decoded by modeling the reconstruction structure as a two-layer neural network, where optimal coefficients are obtained via error backpropagation. Our model encapsulates the structure of state-of-the-art algorithms, yet it presents a faster alternative which adapts to input examples without a priori statistical information.
- FSVO: SEMI-DIRECT MONOCULAR VISUAL ODOMETRY USING FIXED MAPS
We propose a fixed-map semi-direct visual odometry (FSVO) algorithm for Micro Aerial Vehicles (MAVs). The proposed approach does not need computationally expensive feature extraction and matching for motion estimation at every frame. Instead, we extract and match Oriented FAST and Rotated BRIEF (ORB) features only between keyframes and assist-frames. We replace the incremental map generation step of traditional algorithms with fixed map generation at keyframes and assist-frames only, resulting in reduced storage memory and higher flexibility for relocalization.