Sorry, you need to enable JavaScript to visit this website.

Several computer vision applications such as person search or online fashion rely on human description. The use of instance-level human parsing (HP) is therefore relevant since it localizes semantic attributes and body parts within a person. But how to characterize these attributes? To our knowledge, only some single-HP datasets describe attributes with some color, size and/or pattern characteristics. There is a lack of dataset for multi-HP in the wild with such characteristics.

Categories:
8 Views

Unsupervised learning of disentangled representations is a core task for discovering interpretable factors of variation in an image dataset. We propose a novel method that can learn disentangled representations with semantic explanations on natural image datasets. In our method, we guide the representation learning of a variational autoencoder (VAE) via reconstruction in a visual-semantic embedding (VSE) space to leverage the semantic information of image data and explain the learned latent representations in an unsupervised manner.

Categories:
16 Views

With the emergence of social media, voluminous video clips are uploaded every day, and retrieving the most relevant visual content with a language query becomes critical. Most approaches aim to learn a joint embedding space for plain textual and visual contents without adequately exploiting their intra-modality structures and inter-modality correlations.

Categories:
3 Views

Video classification can be performed by summarizing image contents of individual frames into one class by deep neural networks, e.g., CNN and LSTM. Human interpretation of video content is influenced by the attention mechanism. In other words, video class can be more attentively decided by certain information than others. In this paper, we propose to integrate the attention mechanism into deep networks for video classification.

Categories:
24 Views

Pages