
Several computer vision applications, such as person search and online fashion, rely on human description. Instance-level human parsing (HP) is therefore relevant, since it localizes semantic attributes and body parts within a person. But how should these attributes be characterized? To our knowledge, only some single-HP datasets describe attributes with color, size, and/or pattern characteristics. Datasets for multi-HP in the wild with such characteristics are lacking.


Unsupervised learning of disentangled representations is a core task for discovering interpretable factors of variation in an image dataset. We propose a novel method that can learn disentangled representations with semantic explanations on natural image datasets. In our method, we guide the representation learning of a variational autoencoder (VAE) via reconstruction in a visual-semantic embedding (VSE) space to leverage the semantic information of image data and explain the learned latent representations in an unsupervised manner.


With the emergence of social media, a large volume of video clips is uploaded every day, and retrieving the most relevant visual content with a language query has become critical. Most approaches aim to learn a joint embedding space for plain textual and visual content without adequately exploiting their intra-modality structures and inter-modality correlations.

