
We present a solution to the problem of discovering all periodic segments of a video and estimating their periods in a completely unsupervised manner. These segments may be located anywhere in the video; they may differ in duration, speed, and period, and may depict previously unseen motion patterns of any type of object (e.g., humans, animals, machines). The proposed method capitalizes on earlier research on the problem of detecting common actions in videos, also known as commonality detection or video co-segmentation.


Action quality assessment is crucial in areas such as sports, surgery, and assembly-line work, where action skills can be evaluated. In this paper, we propose S3D, a segment-based P3D-fused network built upon ED-TCN, and improve performance on the UNLV-Dive dataset by a significant margin. We verify that segment-aware training outperforms full-video training, which turns out to focus on the water spray. We also show that temporal segmentation can be embedded with little effort.


Deep neural networks have led to dramatic improvements in performance for many machine learning tasks, yet the mathematical reasons for this success remain largely unclear. In this talk we present recent developments in the mathematical framework of convolutional neural networks (CNNs). In particular, we discuss the scattering network of Mallat and how it relates to another problem in harmonic analysis, namely the phase retrieval problem. We then discuss the general convolutional neural network from a theoretician's point of view.
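As a brief aside (this framing is ours, not part of the original talk abstract): the phase retrieval problem mentioned above can be stated as recovering a signal, up to a global phase, from the magnitudes of its frame coefficients.

```latex
% Phase retrieval: given measurements b_\lambda, recover f from the
% moduli of its inner products with a family \{\varphi_\lambda\}.
\[
  \text{find } f \quad \text{such that} \quad
  \lvert \langle f, \varphi_\lambda \rangle \rvert = b_\lambda
  \quad \text{for all } \lambda \in \Lambda .
\]
% Recovery is at best up to a global phase: for any constant c with
% |c| = 1, |\langle c f, \varphi_\lambda \rangle| = |\langle f, \varphi_\lambda \rangle|.
```

The modulus nonlinearity $\lvert x \ast \psi_\lambda \rvert$ applied in each scattering layer discards exactly this phase information, which is one way the two problems are connected.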


IEEE Signal Processing Society welcoming-remarks slides from IEEE SPS President Rabab Ward at ICIP 2017, 18 September 2017, Beijing, China.

The accompanying video can be found in the Resource Center as well as on the IEEE SPS YouTube channel:


Extracting spatio-temporal descriptors is a challenging task for video-based human action recognition. We decouple the 3D volume of video frames directly into a cascaded temporal-spatial domain via a new convolutional architecture. The motivation behind this design is to achieve deep nonlinear feature representations with fewer network parameters. First, a 1D temporal network with shared parameters is constructed to map the video sequences along the time axis into temporal-domain feature maps.
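A minimal NumPy sketch of the cascaded temporal-then-spatial factorization described above; kernel sizes, weights, and the valid-padding choice are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def temporal_conv1d(video, kernel):
    # video: (T, H, W); kernel: (k,), shared across all spatial positions.
    T, H, W = video.shape
    k = kernel.shape[0]
    out = np.zeros((T - k + 1, H, W))
    for t in range(T - k + 1):
        # Weighted sum over a temporal window, same weights at every pixel.
        out[t] = np.tensordot(kernel, video[t:t + k], axes=(0, 0))
    return out

def spatial_conv2d(frame, kernel):
    # frame: (H, W); kernel: (kh, kw); valid cross-correlation (no flip).
    H, W = frame.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return out

# Cascaded factorization: 1D temporal stage first, then a 2D spatial
# stage applied to each temporal feature map.
video = np.random.rand(8, 16, 16)                                  # (T, H, W)
t_maps = temporal_conv1d(video, np.array([0.25, 0.5, 0.25]))       # (6, 16, 16)
s_maps = np.stack([spatial_conv2d(m, np.ones((3, 3)) / 9) for m in t_maps])
print(s_maps.shape)  # (6, 14, 14)
```

The point of the factorization is parameter count: a k_t x k_h x k_w 3D kernel is replaced by a k_t temporal kernel plus a k_h x k_w spatial kernel applied in sequence.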


In this paper, we investigate the problem of action recognition in RGB-D egocentric videos. These self-generated, embodied videos provide richer semantic cues for action recognition than conventional videos captured from the third-person view. Moreover, they contain both appearance information and the 3D structure of the scene, from the RGB and depth modalities respectively.


Dear SigPort users, do you want to use audio and video to demonstrate your research results? SigPort has a solution: SigPort now supports PDF documents with audio/video attachments. Please see our sample files (Sample 1, Sample 2) to learn how to add audio and video to your presentation slides and posters.