
- Read more about Interpretable representation learning on natural image datasets via reconstruction in visual-semantic embedding space
Unsupervised learning of disentangled representations is a core task for discovering interpretable factors of variation in an image dataset. We propose a novel method that can learn disentangled representations with semantic explanations on natural image datasets. In our method, we guide the representation learning of a variational autoencoder (VAE) via reconstruction in a visual-semantic embedding (VSE) space to leverage the semantic information of image data and explain the learned latent representations in an unsupervised manner.

- Read more about Semantic Role Aware Correlation Transformer For Text To Video Retrieval
With the emergence of social media, large volumes of video clips are uploaded every day, and retrieving the most relevant visual content with a language query has become critical. Most approaches aim to learn a joint embedding space for plain textual and visual content without adequately exploiting their intra-modality structures and inter-modality correlations.

- Read more about Semantic-Preserving Metric Learning for Video-Text Retrieval (Poster)

- Read more about CHANNEL SHUFFLE RECONSTRUCTION NETWORK FOR IMAGE COMPRESSIVE SENSING

- Read more about LIGHTWEIGHT IMAGE SUPER-RESOLUTION RECONSTRUCTION WITH HIERARCHICAL FEATURE-DRIVEN NETWORK

- Read more about DEEP SMOOTHED PROJECTED LANDWEBER NETWORK FOR BLOCK-BASED IMAGE COMPRESSIVE SENSING

Video classification can be performed by using deep neural networks, e.g., CNN and LSTM, to summarize the image contents of individual frames into one class. Human interpretation of video content is influenced by the attention mechanism: some information is more decisive for the video class than other information. In this paper, we propose to integrate the attention mechanism into deep networks for video classification.


High-dimensional data structures, known as tensors, are fundamental in many applications, including multispectral imaging and color video processing. Compressing such large amounts of multidimensional data collected over time is of paramount importance and requires quantizing the measurements into discrete values. Furthermore, noise and issues in signal acquisition and transmission frequently lead to unobserved, lost, or corrupted measurements.