
Interpretable representation learning on natural image datasets via reconstruction in visual-semantic embedding space

Citation Author(s):
Nao Nakagawa, Ren Togo, Takahiro Ogawa, Miki Haseyama
Submitted by:
Nao Nakagawa
Last updated:
27 September 2021 - 11:29pm


Unsupervised learning of disentangled representations is a core task for discovering interpretable factors of variation in an image dataset. We propose a novel method that learns disentangled representations with semantic explanations on natural image datasets. In our method, we guide the representation learning of a variational autoencoder (VAE) via reconstruction in a visual-semantic embedding (VSE) space. This lets the model leverage the semantic information of image data and explain the learned latent representations in an unsupervised manner. We introduce a semantic sub-encoder and a linear semantic sub-decoder that learn word vectors corresponding to the latent variables, so that factors of variation can be explained in linguistic form. Each basis vector (column) of the linear semantic sub-decoder corresponds to one latent variable, and we can interpret these basis vectors as word vectors indicating the meanings of the latent representations. With the sub-encoder and the sub-decoder, our model learns latent representations that are not only disentangled but also interpretable. Compared with other state-of-the-art unsupervised disentangled representation learning methods, we observe significant improvements in both the disentanglement and the transferability of the latent representations.
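The idea that each column of a linear decoder can be read as a word vector can be illustrated with a small NumPy sketch. It is not the authors' implementation: the sizes, the bias-free linear map, the squared-error reconstruction loss, the cosine nearest-neighbour lookup, the toy vocabulary, and all function names (`semantic_decode`, `semantic_recon_loss`, `explain_latents`) are hypothetical choices made for illustration, and the VAE image encoder and the semantic sub-encoder are omitted entirely.

```python
import numpy as np

# Hypothetical sizes; the abstract does not state them.
LATENT_DIM = 4   # number of VAE latent variables
EMBED_DIM = 8    # dimensionality of the VSE space

rng = np.random.default_rng(0)

# Assumed linear semantic sub-decoder: a weight matrix of shape
# (EMBED_DIM, LATENT_DIM). Column k is the basis vector for latent variable k.
W = rng.standard_normal((EMBED_DIM, LATENT_DIM))


def semantic_decode(z: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Map latent codes z, shape (batch, LATENT_DIM), into the VSE space.

    Because the map is linear (bias-free here, an assumption), each output
    is a weighted sum of W's columns, with the weights given by z.
    """
    return z @ W.T


def semantic_recon_loss(z: np.ndarray, v: np.ndarray, W: np.ndarray) -> float:
    """Assumed squared-error reconstruction loss against VSE embeddings v."""
    diff = semantic_decode(z, W) - v
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def explain_latents(W, vocab, word_vecs, top_k=3):
    """Label each latent variable with the words whose vectors have the
    highest cosine similarity to the corresponding column of W."""
    cols = W / np.linalg.norm(W, axis=0, keepdims=True)
    words = word_vecs / np.linalg.norm(word_vecs, axis=1, keepdims=True)
    sims = words @ cols  # shape (len(vocab), LATENT_DIM)
    return [[vocab[i] for i in np.argsort(-sims[:, k])[:top_k]]
            for k in range(W.shape[1])]


# Toy usage with a made-up vocabulary: the first four word vectors are
# slightly perturbed copies of W's columns; the last two are unrelated.
vocab = ["red", "round", "large", "striped", "noise_a", "noise_b"]
word_vecs = np.vstack([W.T + 0.01 * rng.standard_normal(W.T.shape),
                       rng.standard_normal((2, EMBED_DIM))])
print(explain_latents(W, vocab, word_vecs, top_k=1))
```

Feeding a one-hot latent code through the linear decoder returns exactly one column of `W`, which is why a column can be read as the semantic "meaning" of a single latent variable. In a trained model, `word_vecs` would come from a real word-embedding vocabulary rather than from the synthetic vectors used here.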
