
Dense photorealistic point clouds can depict real-world dynamic objects in high resolution and at a high frame rate. Frame interpolation of such dynamic point clouds would enable the distribution, processing, and compression of such content. In this work, we propose the first point cloud interpolation framework for photorealistic dynamic point clouds. Given two consecutive dynamic point cloud frames, our framework aims to generate intermediate frame(s) between them.
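To make the interpolation task concrete, the following is a minimal naive baseline, not the framework proposed in the abstract: each point in the first frame is matched to its nearest neighbor in the second frame and the positions are linearly blended. The function name and brute-force matching are illustrative assumptions.

```python
import numpy as np

def interpolate_frames(pc0, pc1, t):
    """Naive baseline: match each point of pc0 to its nearest
    neighbor in pc1, then linearly blend the matched positions.
    pc0: (N, 3) array, pc1: (M, 3) array, t in [0, 1]."""
    # Pairwise squared distances between the two frames (brute force)
    d2 = ((pc0[:, None, :] - pc1[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)               # index of nearest neighbor in pc1
    return (1.0 - t) * pc0 + t * pc1[nn]
```

Real dynamic point clouds have no point-to-point correspondence across frames, which is exactly why a learned interpolation framework is needed; this sketch only illustrates the input/output contract.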


This paper presents a novel 3DoF+ system that allows users to navigate, i.e., change position, in scene-based spatial audio content beyond the sweet spot of a Higher Order Ambisonics recording. It is one of the first such systems based on sound capture at a single spatial position. The system uses a parametric decomposition of the recorded sound field. For the synthesis, only coarse distance information about the sources is required as side information, not their exact number.
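A common building block of such parametric sound-field decompositions is estimating a direction of arrival from the recorded signals. As a hedged illustration only (a DirAC-style first-order sketch, not the paper's actual higher-order method), the direction can be read off the time-averaged acoustic intensity vector of B-format signals:

```python
import numpy as np

def intensity_doa(w, x, y, z):
    """Estimate a direction of arrival from first-order B-format
    signals via the time-averaged acoustic intensity vector
    I ∝ E[w * (x, y, z)]. Returns (azimuth, elevation) in radians."""
    i = np.array([np.mean(w * x), np.mean(w * y), np.mean(w * z)])
    az = np.arctan2(i[1], i[0])
    el = np.arctan2(i[2], np.hypot(i[0], i[1]))
    return az, el
```

In practice this analysis is done per time-frequency tile; here it is shown broadband for brevity.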


Low-latency video streaming of volumetric content is an emerging technology for enabling immersive media experiences on mobile devices. Unlike 3DoF scenarios, where users are restricted to changing their head orientation at a single position, volumetric content allows users to move freely within the scene in 6DoF. Although the processing power of mobile devices has increased considerably, streaming volumetric content directly to such devices remains challenging: high-quality volumetric content requires a high data rate and substantial network bandwidth.


Modelling human visual attention is of great importance in the field of computer vision and has been widely explored for 3D imaging. Yet, in the absence of ground truth data, it is unclear whether such predictions align with actual human viewing behavior in virtual reality environments. In this study, we work towards solving this problem by conducting an eye-tracking experiment in an immersive 3D scene that offers 6 degrees of freedom. Human subjects inspect a wide range of static point cloud models while their gaze is captured in real time.
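Turning captured gaze into per-point ground truth typically means projecting each gaze ray onto the point cloud. A minimal sketch of that aggregation step is shown below; the function name and the nearest-point-to-ray voting rule are illustrative assumptions, not the study's described procedure:

```python
import numpy as np

def fixation_counts(points, origins, dirs):
    """Accumulate an empirical attention map on a point cloud: for each
    gaze ray (origin o, unit direction d), vote for the cloud point
    closest to the ray in front of the viewer."""
    counts = np.zeros(len(points), dtype=int)
    for o, d in zip(origins, dirs):
        v = points - o                     # vectors from eye to points
        t = v @ d                          # projection onto the ray
        perp = v - np.outer(t, d)          # perpendicular offsets from the ray
        dist = np.linalg.norm(perp, axis=1)
        dist[t < 0] = np.inf               # ignore points behind the viewer
        counts[dist.argmin()] += 1
    return counts
```

Normalizing such counts over many subjects yields the kind of fixation density map against which saliency predictions can be validated.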


This paper proposes BODYFITR, a fully automatic method for fitting a human body model to static 3D scans with complex poses. Automatic and reliable 3D human body fitting is necessary for many applications related to healthcare, digital ergonomics, avatar creation, and security, especially in industrial contexts for large-scale product design. Existing works either make prior assumptions about the pose, require manual annotation of the data, or have difficulty handling complex poses.


The Shearlet Transform (ST) is one of the most effective methods for Densely-Sampled Light Field (DSLF) reconstruction from a Sparsely-Sampled Light Field (SSLF). However, ST requires a precise disparity estimation of the SSLF. To this end, a state-of-the-art optical flow method, PWC-Net, is employed to estimate bidirectional disparity maps between neighboring views of the SSLF. Moreover, to take full advantage of optical flow and ST for DSLF reconstruction, a novel learning-based method, referred to as Flow-Assisted Shearlet Transform (FAST), is proposed.
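The role of the estimated disparity can be illustrated with a toy warping step: given a per-pixel horizontal disparity, a view can be forward-warped toward a virtual viewpoint at a fractional baseline position. This is a stand-in sketch for intuition only, not the FAST pipeline itself:

```python
import numpy as np

def warp_view(view, disparity, alpha):
    """Forward-warp one view toward a virtual view at fractional
    baseline position alpha in [0, 1], using a per-pixel horizontal
    disparity (in pixels). Target columns are rounded; on collisions
    the last write wins, and disoccluded pixels stay zero."""
    h, w = view.shape
    out = np.zeros_like(view)
    cols = np.arange(w)
    for r in range(h):
        tgt = np.clip(np.round(cols + alpha * disparity[r]).astype(int), 0, w - 1)
        out[r, tgt] = view[r]
    return out
```

The holes and collisions visible even in this 1D toy are precisely the artifacts that motivate combining flow-based warping with the shearlet-domain reconstruction.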


Point cloud segmentation is a key problem in 3D multimedia signal processing. Existing methods usually use a single network trained with a per-point loss. Such methods mainly focus on geometric similarity between the prediction and the ground truth, ignoring perceptual differences. In this paper, we present a segmentation adversarial network to overcome these drawbacks. A discriminator is introduced to provide a perceptual loss that judges the plausibility of the prediction and guides the further optimization of the segmentation network.
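The training objective described above can be sketched as a per-point cross-entropy plus an adversarial term from the discriminator's score of the predicted segmentation. The weighting `lam` and the non-saturating form of the adversarial term are assumptions for illustration, not the paper's exact formulation:

```python
import numpy as np

def adversarial_seg_loss(probs, labels, d_real_score, lam=0.1):
    """Segmentation-network loss combining a per-point cross-entropy
    with an adversarial (perceptual) term: the network is rewarded
    when the discriminator scores its prediction as 'real'.
    probs: (N, C) class probabilities, labels: (N,) indices,
    d_real_score: discriminator output in (0, 1]."""
    n = len(labels)
    ce = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))
    adv = -np.log(d_real_score + 1e-12)   # non-saturating GAN generator loss
    return ce + lam * adv
```

The discriminator itself would be trained with the opposite objective, alternating with the segmentation network as in standard adversarial training.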


Recently, there has been increasing interest in the processing of dynamic scenes captured by 3D scanners, which are ideally suited for challenging applications such as immersive tele-presence systems and gaming. Although the resolution and accuracy of modern 3D scanners are constantly improving, the captured 3D point clouds are usually noisy, with a noticeable percentage of outliers, stressing the need for an approach with low computational requirements that can automatically remove these outliers.
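As a point of reference for such filtering, a standard lightweight technique (not necessarily the approach this abstract refers to) is statistical outlier removal: points whose mean distance to their k nearest neighbors is far above the global average are discarded.

```python
import numpy as np

def remove_outliers(points, k=4, std_ratio=1.0):
    """Statistical outlier removal: drop points whose mean distance to
    their k nearest neighbors exceeds (global mean + std_ratio * std)
    of that statistic. Brute-force O(N^2) for clarity; a k-d tree
    would be used for large clouds."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)   # skip self-distance at index 0
    thresh = mean_knn.mean() + std_ratio * mean_knn.std()
    return points[mean_knn <= thresh]
```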


Eye trackers are found on various electronic devices. In this paper, we propose to exploit the gaze information acquired by an eye tracker for depth estimation. The data collected from the eye tracker during a fixation interval are used to estimate the depth of a gazed object. The proposed method can be used to construct a sparse depth map of an augmented reality space. The resulting depth map can be used, for example, to control the visual information displayed to the viewer.
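One textbook way to recover depth from binocular gaze data, offered here as a hedged sketch rather than the paper's actual estimator, is vergence-based triangulation: find the closest points between the two eyes' gaze rays and take their midpoint as the fixated 3D point.

```python
import numpy as np

def gaze_depth(o_l, d_l, o_r, d_r):
    """Triangulate the gazed point from the two eyes' rays: compute the
    closest points between the left ray (o_l + s*d_l) and the right ray
    (o_r + t*d_r) and return their midpoint."""
    d_l, d_r = d_l / np.linalg.norm(d_l), d_r / np.linalg.norm(d_r)
    w = o_l - o_r
    a, b, c = d_l @ d_l, d_l @ d_r, d_r @ d_r
    d, e = d_l @ w, d_r @ w
    denom = a * c - b * b                 # zero if the rays are parallel
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    return 0.5 * ((o_l + s * d_l) + (o_r + t * d_r))
```

Averaging such estimates over a fixation interval, as the abstract suggests, would damp the considerable noise in per-sample gaze directions.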