In the wild, facial expression recognition (FER) is often challenged by low-quality data and by imbalanced, ambiguous labels. The field has benefited greatly from CNN-based approaches; however, CNN models are structurally limited in relating facial regions that lie far apart. As a remedy, the Transformer has been introduced to vision with a global receptive field, but it requires resizing the input to the spatial size expected by the pretrained model in order to benefit from the strong inductive bias of the pretrained weights.
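Why the input size matters can be sketched with a ViT-style patch embedding. This is a minimal sketch under assumptions not stated in the abstract: non-overlapping square patches, a hypothetical patch size of 16, a hypothetical pretrained input of 224x224, and a learned position embedding with one entry per patch token (any class token is ignored here). An input of a different size yields a different number of tokens, so it no longer lines up with the pretrained position embedding.

```python
def num_patch_tokens(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patch tokens a ViT-style patch embedding produces."""
    if height % patch or width % patch:
        raise ValueError("input size must be divisible by the patch size")
    return (height // patch) * (width // patch)


# Hypothetical pretrained setting (assumption): 224x224 inputs, 16x16 patches.
PRETRAINED_TOKENS = num_patch_tokens(224, 224)  # 14 * 14 = 196


def matches_pretrained(height: int, width: int, patch: int = 16) -> bool:
    """True if the input yields as many tokens as the pretrained position embedding has."""
    return num_patch_tokens(height, width, patch) == PRETRAINED_TOKENS


if __name__ == "__main__":
    for size in (112, 224, 256):
        # prints: 112 49 False / 224 196 True / 256 256 False
        print(size, num_patch_tokens(size, size), matches_pretrained(size, size))
```

Under these assumptions, a low-resolution face crop such as 112x112 would typically be upsampled to the pretrained size, or the position embeddings interpolated, before the pretrained weights can be reused as-is.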

RGB-D semantic segmentation is attracting wide attention because it outperforms conventional RGB-only methods. However, most RGB-D semantic segmentation methods require real depth information to segment RGB images effectively. It is therefore extremely challenging to take full advantage of RGB-D semantic segmentation methods when segmenting RGB images without depth input.

The video instance segmentation (VIS) task requires classifying, segmenting, and tracking object instances across all frames of a video clip. Recently, VisTR \cite{vistr} was proposed as an end-to-end transformer-based VIS framework and demonstrated state-of-the-art performance. However, VisTR converges slowly during training, requiring around 1000 GPU hours, because of the high computational cost of its transformer attention module.
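The cost of attention over a whole clip can be illustrated with a rough count of attention-map entries. This is a sketch with hypothetical sizes, not VisTR's actual configuration: it assumes every spatio-temporal token (frames x feature height x feature width) attends to every other token, and compares that with attention restricted to tokens of the same frame. The per-frame variant is only a reference point for the arithmetic, not the method the abstract goes on to propose.

```python
def joint_attention_entries(frames: int, feat_h: int, feat_w: int) -> int:
    """Entries in one attention map when every clip token attends to every other."""
    tokens = frames * feat_h * feat_w
    return tokens * tokens


def per_frame_attention_entries(frames: int, feat_h: int, feat_w: int) -> int:
    """Entries if each token only attends to tokens in its own frame."""
    tokens_per_frame = feat_h * feat_w
    return frames * tokens_per_frame * tokens_per_frame


if __name__ == "__main__":
    # Hypothetical clip: 8 frames, 10x12 feature map per frame -> 960 tokens.
    joint = joint_attention_entries(8, 10, 12)
    local = per_frame_attention_entries(8, 10, 12)
    print(joint, local, joint // local)  # 921600 115200 8
```

Joint attention grows quadratically in the total token count, so it costs T times more than per-frame attention for a clip of T frames; longer clips or larger feature maps make this term dominate quickly.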

Deep object detectors suffer from gradient contribution imbalance during training. In this paper, we point out that this imbalance can be ascribed to imbalance in example attributes, e.g., difficulty and degree of shape variation. We further propose example attribute based prediction modulation (EAPM) to address it. In EAPM, the attribute of an example is first defined by the prediction and the corresponding ground truth. Then, a modulating factor with respect to the example attribute is introduced to modulate the prediction error.
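The two steps described above (derive an attribute from prediction and ground truth, then scale the prediction error by a factor of that attribute) can be sketched as follows. This is not EAPM's actual formulation, which the abstract does not give: the attribute here is a hypothetical difficulty score `1 - IoU(pred, gt)`, the modulating factor is a hypothetical power `difficulty ** gamma`, and the error is a plain L1 distance between box coordinates.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def modulated_l1(pred, gt, gamma=1.0):
    """L1 box error scaled by a hypothetical difficulty-based modulating factor."""
    difficulty = 1.0 - iou(pred, gt)   # attribute from prediction and ground truth
    factor = difficulty ** gamma       # hypothetical modulating factor
    error = sum(abs(p - g) for p, g in zip(pred, gt))
    return factor * error


if __name__ == "__main__":
    gt = (5, 0, 15, 10)
    print(modulated_l1((5, 0, 15, 10), gt))          # perfect prediction -> 0.0
    print(modulated_l1((0, 0, 10, 10), gt))          # IoU 1/3, error 10 -> ~6.667
    print(modulated_l1((0, 0, 10, 10), gt, gamma=2)) # ~4.444
```

With `gamma > 1`, low-difficulty (high-IoU) examples are down-weighted more strongly relative to hard ones; whether gradients should flow through the factor itself is a design choice this sketch does not address.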

Spatial-temporal local binary pattern (STLBP) has been widely used in dynamic texture recognition. STLBP often suffers from high dimensionality because its dimension grows exponentially with the number of sampled neighbors, so it can only use a small neighborhood. To tackle this problem, we propose a method for dynamic texture recognition using PDV hashing and dictionary learning on multi-scale volume local binary pattern (PHD-MVLBP).
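The exponential growth can be made concrete with a plain local binary pattern: each neighbor contributes one bit (neighbor >= center), so n neighbors give 2**n possible codes and thus 2**n histogram bins. The sketch below assumes a hypothetical full 3x3x3 spatio-temporal cube (26 neighbors) on a nested-list volume indexed as `volume[t][y][x]`; it illustrates the dimensionality problem only and is not the PHD-MVLBP method or PDV hashing.

```python
def lbp_code(center, neighbors):
    """Binary pattern: bit i is set when neighbor i >= center."""
    code = 0
    for i, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << i
    return code


def histogram_dim(num_neighbors):
    """Number of distinct codes (histogram bins) for a plain LBP."""
    return 2 ** num_neighbors


def cube_neighbors(volume, t, y, x):
    """The 26 voxels of the 3x3x3 cube around (t, y, x), excluding the center."""
    return [
        volume[t + dt][y + dy][x + dx]
        for dt in (-1, 0, 1)
        for dy in (-1, 0, 1)
        for dx in (-1, 0, 1)
        if (dt, dy, dx) != (0, 0, 0)
    ]


if __name__ == "__main__":
    print(histogram_dim(8))   # 2D LBP with 8 neighbors: 256 bins
    print(histogram_dim(26))  # full 3x3x3 cube: 67108864 bins
    # Toy 3-frame volume with values 0..26 in scan order.
    vol = [[[t * 9 + y * 3 + x for x in range(3)] for y in range(3)] for t in range(3)]
    print(lbp_code(vol[1][1][1], cube_neighbors(vol, 1, 1, 1)))
```

Going from 8 spatial neighbors to a full 26-voxel spatio-temporal neighborhood inflates the histogram from 256 to over 67 million bins, which is why such descriptors are usually restricted to small or sparsely sampled neighborhoods.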
