MultiGAP: Multi-Pooled Inception Network with Text Augmentation for Aesthetic Prediction of Photographs
- Submitted by:
- Magzhan Kairanbay
- Last updated:
- 15 September 2017 - 4:38am
- Document Type:
- Presentation Slides
- Document Year:
- 2017
- Presenters:
- Magzhan Kairanbay
- Paper Code:
- 3306
With the advent of deep learning, convolutional neural networks have solved many imaging problems to a large extent. However, it remains to be seen if the image “bottleneck” can be unplugged by harnessing complementary sources of data. In this paper, we present a new approach to image aesthetic evaluation that learns both visual and textual features simultaneously. Our network extracts visual features by appending global average pooling blocks on multiple inception modules (MultiGAP), while textual features from associated user comments are learned from a recurrent neural network. Experimental results show that the proposed method is capable of achieving state-of-the-art performance on the AVA / AVA Comments datasets. We also demonstrate the capability of our approach in visualizing aesthetic activations.
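The sketch below illustrates (it is not the authors' code) how the described architecture could be assembled: global average pooling applied to several inception modules of a backbone for the visual branch, and a recurrent network over tokenized user comments for the textual branch, with the two branches concatenated for aesthetic classification. The InceptionV3 backbone, the choice of "mixed" layers, and all hyperparameters (vocabulary size, embedding size, sequence length) are illustrative assumptions.

```python
# Minimal sketch of a MultiGAP-style model with text augmentation.
# Assumptions: Keras InceptionV3 backbone; layer names "mixed5"-"mixed10"
# and all hyperparameters are illustrative, not taken from the paper.
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import InceptionV3

# Visual branch: global average pooling on multiple inception ("mixed") modules.
backbone = InceptionV3(weights="imagenet", include_top=False,
                       input_shape=(299, 299, 3))
gap_features = [
    layers.GlobalAveragePooling2D()(backbone.get_layer(name).output)
    for name in ("mixed5", "mixed6", "mixed7", "mixed8", "mixed9", "mixed10")
]
visual = layers.Concatenate()(gap_features)  # multi-pooled GAP feature vector

# Textual branch: a recurrent network over tokenized user comments.
comment_input = layers.Input(shape=(100,), dtype="int32")        # 100 tokens (assumed)
embedded = layers.Embedding(input_dim=20000, output_dim=128)(comment_input)
textual = layers.LSTM(128)(embedded)

# Joint prediction: high- vs. low-aesthetic classes, as in the AVA setup.
joint = layers.Concatenate()([visual, textual])
joint = layers.Dense(256, activation="relu")(joint)
output = layers.Dense(2, activation="softmax")(joint)

model = Model(inputs=[backbone.input, comment_input], outputs=output)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

At inference time such a model takes an image tensor and a padded comment-token sequence and returns class probabilities; the per-module GAP vectors also give a natural hook for visualizing which inception modules respond to aesthetic content.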