
Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting, and zooming a video conferencing camera: users subjectively rate an expert video cinematographer’s video significantly higher than unedited video. We describe a new automated ASD and VC system that performs within 0.3 mean opinion score (MOS) of an expert cinematographer, based on subjective ratings on a 1-5 scale.


Reconstructing a signal corrupted by impulsive noise is important in several applications, including impulsive noise removal from images, audio, and video, and separating text from images. In this paper we propose a new method to reconstruct a noise-corrupted signal where both the signal and the noise are sparse, but in different domains. We apply our algorithm to impulsive noise (Salt-and-Pepper Noise (SPN) and Random-Valued Impulsive Noise (RVIN)) removal from images and compare our results with other notable algorithms in the literature.
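
To make "impulsive noise is sparse in the sample domain" concrete, here is a minimal 1-D sketch. It is not the method proposed in the paper: the restoration step is a plain sliding-window median filter, a classic baseline, and the helper names (`add_impulses`, `median_filter`), the toy ramp signal, and the deterministic corruption pattern are all assumptions chosen for illustration.

```python
from statistics import median

def add_impulses(signal, period=7, offset=3, low=0, high=255):
    """Toy salt-and-pepper corruption (hypothetical helper): every
    `period`-th sample is replaced by an extreme value, alternating
    low ("pepper") and high ("salt")."""
    noisy = list(signal)
    for k, i in enumerate(range(offset, len(noisy), period)):
        noisy[i] = low if k % 2 == 0 else high
    return noisy

def median_filter(signal, radius=1):
    """Sliding-window median; windows are clipped at the boundaries."""
    out = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(median(window))
    return out

# Slowly varying toy signal (assumed), values between 100 and 124.
clean = [100 + i // 4 for i in range(100)]
noisy = add_impulses(clean)
restored = median_filter(noisy)

# Only a small fraction of samples is corrupted: the noise is sparse.
num_corrupted = sum(c != n for c, n in zip(clean, noisy))
max_error = max(abs(c - r) for c, r in zip(clean, restored))
```

In this toy setting each filter window contains at most one impulse, so the median always lands on a clean neighbouring value and the restored signal stays within one gray level of the original. Real SPN and RVIN at high densities break that assumption, which is where more elaborate sparse-reconstruction methods come in.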


In this work we explore an overcomplete representation of
multiview imagery for the purpose of compression. We
present a rate-distortion (R-D) driven approach to decompose
multiview datasets into two additive parts which can
be interpreted as being the diffuse and specular components.
We apply different transforms to each component such that
the compressibility of input data is improved. We describe
a framework which performs the R-D optimized separation
in a registered domain to avoid the complexity of warping


To predict where humans look in a 3D immersive environment, saliency can be computed using either 3D saliency models or view-based approaches (2D projection). Building a complete 3D model is still a challenging task that has not been investigated enough in the research field, while 2D imaging approaches have been extensively studied and have shown solid performance.


Analysis of hand skeleton data can be used to understand patterns in manipulation and assembly tasks. This paper introduces a graph-based representation of hand skeleton data and proposes a method to perform unsupervised temporal segmentation of a sequence of subtasks in order to evaluate the efficiency of an assembly task. We explore the properties of different choices of hand graphs and their spectral decomposition. A performance comparison of these graphs is presented in the context of complex activity segmentation.
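
As a rough illustration of what a graph-based representation of a hand skeleton can look like, the sketch below builds the combinatorial Laplacian L = D - A of a toy hand graph. The 21-node layout (one wrist node plus five chains of four joints) and the function names are assumptions, not the graphs studied in the paper. The eigenvectors of L, which any numerical linear algebra library can compute, give the spectral decomposition; here only the quadratic form x^T L x is evaluated, which measures how smoothly a per-joint signal varies along the edges.

```python
def hand_edges(num_fingers=5, joints_per_finger=4):
    """Edges of a toy hand graph (assumed layout): node 0 is the wrist and
    each finger is a chain of joints whose base is attached to the wrist."""
    edges = []
    for f in range(num_fingers):
        base = 1 + f * joints_per_finger
        edges.append((0, base))
        for j in range(joints_per_finger - 1):
            edges.append((base + j, base + j + 1))
    return edges

def laplacian(num_nodes, edges):
    """Combinatorial graph Laplacian L = D - A for an unweighted graph."""
    L = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        L[i][i] += 1
        L[j][j] += 1
        L[i][j] -= 1
        L[j][i] -= 1
    return L

def smoothness(L, x):
    """Quadratic form x^T L x, equal to the sum over edges of (x_i - x_j)^2."""
    n = len(x)
    return sum(x[i] * L[i][j] * x[j] for i in range(n) for j in range(n))

edges = hand_edges()
L = laplacian(21, edges)

constant = [1.0] * 21                       # identical value at every joint
wiggle = [float(i % 2) for i in range(21)]  # value alternates with node index

smooth_constant = smoothness(L, constant)   # zero: no variation across edges
smooth_wiggle = smoothness(L, wiggle)       # positive: signal varies on edges
```

A signal that is constant over the hand has zero Laplacian energy, while one that changes across edges has positive energy. Different choices of edges (for example, adding links between fingertips) change L and therefore its spectrum.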