
Video post-processing is a method for improving the quality of reconstructed frames at the
decoder side. Although existing post-processing algorithms based on deep learning
achieve significant quality improvements over traditional methods, they require
substantial computational resources, which makes them difficult to deploy
on mobile devices. To tackle this problem, a low-complexity neural network based on
max-pooling and depth-wise separable convolution is proposed in this work for compressed
video post-processing.
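To see why depth-wise separable convolution lowers complexity, compare parameter counts: a standard k×k convolution with C_in input and C_out output channels uses k·k·C_in·C_out weights, whereas the separable version uses k·k·C_in (depth-wise) plus C_in·C_out (point-wise) weights. A minimal sketch; the layer sizes are illustrative, not taken from the paper:

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depth-wise k x k conv (one filter per channel) + 1x1 point-wise conv."""
    return k * k * c_in + c_in * c_out

# Illustrative layer: 3x3 kernel, 64 -> 64 channels
standard = conv_params(3, 64, 64)              # 36864 weights
separable = separable_conv_params(3, 64, 64)   # 4672 weights
print(standard, separable, round(standard / separable, 1))
```

For this layer the separable variant uses roughly 8× fewer weights, which is the source of the complexity reduction the abstract refers to.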


Image retargeting changes the aspect ratio of images while aiming to preserve content and minimise noticeable distortion. Fast and high-quality methods are particularly relevant at present, due to the large variety of image and display aspect ratios. We propose a retargeting method that quantifies and limits warping distortions with the use of content-aware cropping. The pipeline of the proposed approach consists of the following steps. First, an importance map of a source image is generated using deep semantic segmentation and saliency detection models.
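The first step, fusing saliency and segmentation into an importance map and using it to place a content-aware crop, can be sketched as follows. The fusion rule (element-wise maximum) and the sliding-window crop search are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def importance_map(saliency, segmentation):
    """Fuse per-pixel saliency and semantic-object masks (both in [0, 1])
    into one importance map; element-wise max is an illustrative choice."""
    return np.maximum(saliency, segmentation)

def best_crop_x(imp, target_w):
    """Slide a window of target_w columns over the image and return the
    left edge that keeps the most total importance."""
    col_sums = imp.sum(axis=0)
    window = np.convolve(col_sums, np.ones(target_w), mode="valid")
    return int(np.argmax(window))

sal = np.full((4, 8), 0.1)                  # weak background saliency
seg = np.zeros((4, 8)); seg[:, 5:7] = 1.0   # a detected object on the right
imp = importance_map(sal, seg)
x0 = best_crop_x(imp, target_w=4)
print(x0)  # -> 3: the crop window covers columns 3..6, including the object
```

In the full method this crop bound would then cap how much warping distortion the retargeting step is allowed to introduce.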


Action recognition in top-view 360° videos is an emerging research topic in computer vision. Existing work utilizes a global projection method to transform 360° video frames into panorama frames for further processing. However, this unwrapping suffers from geometric distortion: people near the centre of a 360° video frame appear highly stretched and distorted in the corresponding panorama frame (observed in 37.5% of the panorama frames in the 360Action dataset).
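The distortion has a simple geometric explanation under a polar-to-panorama unwrapping (an assumed mapping, since the abstract does not specify the projection): a circle of radius r pixels in the top-view frame is stretched to the full panorama width W, so the horizontal magnification is roughly W / (2πr) and diverges as r approaches the centre. A quick numeric check:

```python
import math

def horizontal_stretch(r, panorama_width):
    """Magnification when a circle of radius r (pixels) in the top-view
    frame is unwrapped onto a row of panorama_width pixels."""
    return panorama_width / (2 * math.pi * r)

W = 2048  # illustrative panorama width
for r in (10, 100, 300):
    print(r, round(horizontal_stretch(r, W), 2))
```

A person standing near the centre (small r) is therefore magnified far more than one near the rim, which matches the stretching the abstract reports.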


High dynamic range (HDR) image formation from low dynamic range (LDR) images of different exposures has been a well-researched topic over the past two decades.
However, most existing techniques consider differently exposed LDR images acquired from the same camera viewpoint, which assumes that the scene stays static long enough to capture multiple images.
In this paper, we propose to address the problem of HDR imaging from differently exposed LDR stereo images using an encoder-decoder based convolutional neural network (CNN).
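For context, the classical multi-exposure merge that CNN-based approaches build on is a weighted average of linearized exposures, in the style of Debevec and Malik; the hat-shaped weighting function below is an illustrative choice:

```python
import numpy as np

def merge_hdr(images, exposure_times):
    """Weighted average of linearized exposures.

    images: float arrays in [0, 1] (camera response already inverted);
    exposure_times: matching shutter times. Well-exposed pixels (near 0.5)
    get high weight; near-clipped pixels get weight ~0.
    """
    num = np.zeros_like(images[0])
    den = np.zeros_like(images[0])
    for z, t in zip(images, exposure_times):
        w = 1.0 - np.abs(2.0 * z - 1.0)   # hat weight, peaks at z = 0.5
        num += w * z / t                  # per-exposure radiance estimate z / t
        den += w
    return num / np.maximum(den, 1e-8)

# Two exposures of a pixel with true radiance 0.3, shot at t = 1 and t = 2
short = np.array([0.3])   # z = L * t with t = 1
long_ = np.array([0.6])   # z = L * t with t = 2
hdr = merge_hdr([short, long_], [1.0, 2.0])
print(hdr)  # recovers the radiance 0.3
```

The stereo setting in the paper breaks the same-viewpoint assumption behind this merge, which is what motivates learning the fusion with an encoder-decoder CNN instead.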


In this paper we address the problem of jointly retrieving a 3D dynamic shape, camera motion, and deformation grouping from partial 2D point trajectories in a monocular video. To this end, we introduce a union of piecewise Bézier subspaces, with continuity enforced between segments, to model 3D motion. We show that formulating the problem in terms of piecewise curves allows for a better physical interpretation of the resulting priors and a more accurate representation of the motion.
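The idea of a piecewise Bézier curve with enforced continuity can be sketched in one dimension; the cubic degree and the C1 construction below are illustrative assumptions, not the paper's exact parameterization:

```python
import numpy as np

def bezier3(p, t):
    """Evaluate a cubic Bézier curve with control points p[0..3] at t in [0, 1]."""
    u = 1.0 - t
    return (u**3 * p[0] + 3 * u**2 * t * p[1]
            + 3 * u * t**2 * p[2] + t**3 * p[3])

def c1_join(prev):
    """First two control points of the next segment so the join is C1:
    share the endpoint (C0) and mirror the last control leg (C1)."""
    p0 = prev[3]                  # C0: curves meet at the join
    p1 = 2 * prev[3] - prev[2]    # C1: equal derivative at the join
    return p0, p1

seg_a = np.array([0.0, 1.0, 2.0, 3.0])
p0, p1 = c1_join(seg_a)
seg_b = np.array([p0, p1, 5.0, 6.0])

# A cubic Bézier's derivative at its ends is 3*(p1 - p0) and 3*(p3 - p2)
end_slope_a = 3 * (seg_a[3] - seg_a[2])
start_slope_b = 3 * (seg_b[1] - seg_b[0])
print(bezier3(seg_a, 1.0), bezier3(seg_b, 0.0), end_slope_a, start_slope_b)
```

Constraining consecutive segments this way is what makes the piecewise representation behave like one smooth trajectory, which underlies the physical interpretability claimed in the abstract.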


Deep convolutional neural networks (CNNs) are renowned for their consistent performance, and practitioners widely understand that the stability of learning depends on how the model parameters in each layer are initialized. Kaiming initialization, the de facto standard, is derived from a much simpler CNN model consisting of only convolution and fully connected layers. Unlike current CNN models, the base model underlying Kaiming initialization includes neither max pooling nor global average pooling layers.
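Kaiming initialization draws weights from N(0, 2/fan_in) so that a ReLU layer preserves the scale of the signal from layer to layer. A minimal numpy sketch of that claim for a fully connected layer (the layer sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out, batch = 1024, 1024, 512

# Kaiming initialization for a ReLU layer: std = sqrt(2 / fan_in)
W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))

x = rng.normal(0.0, 1.0, size=(batch, fan_in))   # unit-variance input
y = np.maximum(x @ W, 0.0)                       # linear layer + ReLU

# The ReLU halves the pre-activation second moment of 2, so
# E[y^2] should come out close to E[x^2] = 1
print(round(float(np.mean(y**2)), 2))
```

The abstract's point is that this derivation ignores max pooling and global average pooling, both of which change the activation statistics that the sqrt(2/fan_in) factor was tuned to preserve.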