
Multi-modality based adult video detection is an effective approach to filtering pornography. However, existing methods lack accurate representations of multi-modality semantics. To address this issue, we propose a novel adult video detection method based on bimodal codebooks. First, the audio codebook is created by periodicity analysis of the labeled audio segments. Second, the visual codebook is generated by detecting regions of interest (ROI) on the basis of saliency analysis.
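A codebook of the kind described above is commonly built by clustering feature vectors and then quantizing new features to their nearest codeword. The sketch below uses plain k-means as the clustering step; the paper's periodicity analysis (audio) and saliency-based ROI selection (visual) are assumed to have already produced the input feature vectors, and are not reproduced here.

```python
import numpy as np

def build_codebook(features, k, iters=20):
    """Build a k-word codebook by k-means over feature vectors.

    This is a generic vector-quantization sketch, not the paper's exact
    procedure. Initialization uses evenly spaced samples for determinism.
    """
    idx = np.linspace(0, len(features) - 1, k).astype(int)
    centers = features[idx].astype(float).copy()
    for _ in range(iters):
        # assign each feature vector to its nearest codeword
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # move each codeword to the mean of its assigned features
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return centers

def quantize(features, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.argmin(axis=1)
```

The quantized indices can then be histogrammed per video segment to form a bag-of-words descriptor for the detector.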


Sentiment analysis is attracting increasing attention and has become a very active research topic due to its potential applications in personalized recommendation, opinion mining, etc. Most existing methods are based on either textual or visual data alone and cannot achieve satisfactory results, as it is very hard to extract sufficient information from a single modality.


(1) We improved the soft attention model by introducing convolutional operations inside the LSTM cell and the attention map generation process to capture the spatial layout. (2) We built a hierarchical two-layer LSTM model for action recognition. (3) We tested our model on three widely used datasets, the UCF sports dataset, the Olympic dataset, and the HMDB51 dataset, with improved results over other published work.
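The core of contribution (1) is generating the attention map with a convolution rather than a fully connected projection, so spatial layout is preserved. A minimal, non-recurrent sketch of that idea follows; the paper places the convolution inside an LSTM cell, which is omitted here, and the 3x3 kernel is an illustrative assumption.

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2-D convolution (cross-correlation form, as in
    deep learning usage); stands in for a learned convolutional layer."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

def spatial_soft_attention(feature_map, kernel):
    """Attention map via convolution + spatial softmax, followed by an
    attention-weighted sum over locations (simplified sketch)."""
    scores = conv2d_same(feature_map, kernel)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax over all locations
    context = np.sum(weights * feature_map)  # attended spatial summary
    return weights, context
```

Because the scores come from a local convolution, nearby locations share evidence, which is what lets the map capture spatial layout.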


The inherent dependencies among multiple physiological signals are crucial for multimodal emotion recognition, but have not been thoroughly exploited yet. This paper proposes to use a restricted Boltzmann machine (RBM) to model such dependencies. Specifically, the visible nodes of the RBM represent EEG and peripheral physiological signals, so the connections between visible nodes and hidden nodes capture the intrinsic relations among multiple physiological signals. The RBM also generates a new representation from multiple physiological signals.
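The "new representation" an RBM produces is the posterior activation of its hidden units given the visible layer. A one-line sketch under assumed shapes: the visible vector concatenates EEG and peripheral features, and because the weight matrix couples every visible unit to every hidden unit, each hidden activation mixes evidence across signals.

```python
import numpy as np

def rbm_hidden_representation(v, W, b_hidden):
    """Hidden-unit activation probabilities of an RBM: sigmoid(vW + b).

    v        : visible vector, here assumed to concatenate EEG and
               peripheral physiological features (illustrative shapes).
    W        : visible-to-hidden weight matrix, shape (n_visible, n_hidden).
    b_hidden : hidden biases, shape (n_hidden,).
    """
    return 1.0 / (1.0 + np.exp(-(v @ W + b_hidden)))
```

Training W (e.g. by contrastive divergence) is omitted; this only shows how the cross-signal representation is read out once the model is fit.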


The inherent dependencies among video content, personal characteristics, and perceptual emotion are crucial for personalized video emotion tagging, but have not been thoroughly exploited. To address this, we propose a novel topic model to capture such inherent dependencies. We assume that there are several potential human factors, or “topics,” that affect the personal characteristics and the personalized emotion responses to videos.


Natural and affective handshakes between two participants define the course of a dyadic interaction. The participants' affective states are expected to correlate with the nature of the dyadic interaction. In this paper, we extract two classes of dyadic interaction based on temporal clustering of affective states. We use k-means temporal clustering to define the interaction classes, and a support vector machine based classifier to estimate the interaction class from multimodal (speech and motion) features.
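Temporal clustering and classification both operate on per-window summaries of the affective-state signal. The helper below computes a simple mean/std descriptor per fixed-length window; this is a hypothetical feature choice for illustration, not necessarily the paper's exact features, and the k-means and SVM stages themselves are omitted.

```python
import numpy as np

def window_features(signal, win):
    """Per-window (mean, std) statistics over a 1-D signal.

    Splits the signal into non-overlapping windows of length `win`
    (dropping any trailing remainder) and returns an (n_windows, 2)
    feature matrix suitable for clustering or an SVM.
    """
    n = len(signal) // win
    chunks = signal[:n * win].reshape(n, win)
    return np.column_stack([chunks.mean(axis=1), chunks.std(axis=1)])
```

The resulting rows would be clustered (e.g. k=2 for the two interaction classes) and later used as SVM inputs.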


Sound event detection is the task of detecting the type, starting time, and ending time of sound events in audio streams. Recently, recurrent neural networks (RNNs) have become the mainstream solution for sound event detection. Because RNNs make a prediction at every frame, it is necessary to provide exact starting and ending times of the sound events in the training data, making data annotation an extremely time-consuming process.
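Since the RNN emits one prediction per frame, turning its output into events with starting and ending times is a post-processing step: threshold the per-frame probabilities and merge consecutive active frames. A minimal sketch, with the threshold and frame hop as assumed parameters:

```python
import numpy as np

def frames_to_events(frame_probs, threshold, hop_seconds):
    """Convert per-frame probabilities into (onset, offset) times in
    seconds by thresholding and merging runs of active frames."""
    active = frame_probs >= threshold
    events, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t                     # event onset frame
        elif not a and start is not None:
            events.append((start * hop_seconds, t * hop_seconds))
            start = None                  # event ended at frame t
    if start is not None:                 # event still active at stream end
        events.append((start * hop_seconds, len(active) * hop_seconds))
    return events
```

This also makes the annotation burden concrete: training such a frame-wise model requires ground-truth onsets and offsets at the same frame resolution.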


Online multi-object tracking-by-detection faces major challenges caused by frequent occlusions, false alarms, missed detections, and other factors. In this paper, we propose an improved fast online multi-object tracking method that synthesizes the results of multiple single-object trackers and the detections. To address the fixed-scale problem of the conventional kernelized correlation filter used in each single-object tracker, trackers are associated with detections based on position and size, and an adaptive tracker mechanism is then established.
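Associating trackers with detections "based on position and size" is naturally expressed as bounding-box overlap. The sketch below uses greedy intersection-over-union matching as a simplified stand-in for the paper's association scheme; the 0.3 overlap gate is an assumed parameter.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix1, iy1 = max(ax, bx), max(ay, by)
    ix2, iy2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def associate(trackers, detections, min_iou=0.3):
    """Greedily pair each tracker box with its best unused detection,
    keeping only pairs whose overlap exceeds min_iou (simplified sketch)."""
    pairs, used = [], set()
    for ti, tbox in enumerate(trackers):
        scores = [(iou(tbox, d), di)
                  for di, d in enumerate(detections) if di not in used]
        if scores:
            best, di = max(scores)
            if best >= min_iou:
                pairs.append((ti, di))
                used.add(di)
    return pairs
```

A matched detection can then drive the adaptive step, e.g. re-scaling the correlation filter's template to the detection's size.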


This paper presents a novel method to track the hierarchical structure of Web video groups on the basis of salient keyword matching, including semantic broadness estimation. To the best of our knowledge, this paper is the first work to perform extraction and tracking of the hierarchical structure simultaneously. Specifically, the proposed method first extracts the hierarchical structure of Web video groups and their salient keywords on the basis of an improved scheme of our previously reported method.