- Image/Video Storage, Retrieval
- Image/Video Processing
- Image/Video Coding
- Image Scanning, Display, and Printing
- Image Formation
- Leveraging Local Temporal Information For Multimodal Scene Classification
Robust video scene classification models should capture the spatial (pixel-wise) and temporal (frame-wise) characteristics of a video effectively. Transformer models with self-attention, which are designed to produce contextualized representations for individual tokens given a sequence of tokens, are becoming increasingly popular in many computer vision tasks. However, the use of Transformer-based models for video understanding is still relatively unexplored.
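To make "contextualized representations" concrete, the following is a minimal sketch of scaled dot-product self-attention in plain Python. It assumes the simplest possible setting: queries, keys, and values are the token vectors themselves, with no learned projections, multiple heads, or positional encoding. The function names `softmax` and `self_attention` are illustrative and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Return one contextualized vector per input token.

    Assumption: queries, keys, and values are the raw token vectors
    (no learned projection matrices), which keeps the sketch short.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        # Similarity of this token to every token in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all token vectors.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, tokens))
            for d in range(dim)
        ])
    return outputs


# Hypothetical per-frame feature vectors for a three-frame clip.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextualized = self_attention(frames)
```

Each output vector mixes information from every frame in the sequence, which is what allows such a model to capture temporal context.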
- Towards Ultra Low Bit-Rate Digital Human Character Communication via Compact 3D Face Descriptors
- Semantic-based Sentence Recognition in Images Using Bimodal Deep Learning
- Semi-Supervised Object Detection with Sparsely Annotated Dataset
When training an anchor-based object detector on a sparsely annotated dataset, the effort required to locate positive examples can cause performance degradation. Because anchor-based object detection models select positive examples by the IoU between anchors and ground-truth bounding boxes, objects left unannotated in a sparsely annotated image can be assigned as negative examples, that is, treated as background.
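The effect described above can be sketched with a simplified IoU-based labeling rule. The helper names `iou` and `label_anchors`, the single threshold of 0.5, and the box coordinates are assumptions made for illustration; practical detectors typically use separate positive and negative thresholds, and this is not the paper's proposed method.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.5):
    """Label an anchor 1 if its best IoU with any annotated box
    reaches pos_thresh, otherwise 0 (negative / background)."""
    labels = []
    for anchor in anchors:
        best = max((iou(anchor, gt) for gt in gt_boxes), default=0.0)
        labels.append(1 if best >= pos_thresh else 0)
    return labels


# Two anchors, each sitting exactly on a real object.
anchors = [(0, 0, 10, 10), (50, 50, 60, 60)]
full_annotation = [(0, 0, 10, 10), (50, 50, 60, 60)]
sparse_annotation = [(0, 0, 10, 10)]  # second object left unannotated

labels_full = label_anchors(anchors, full_annotation)      # [1, 1]
labels_sparse = label_anchors(anchors, sparse_annotation)  # [1, 0]
```

With the sparse annotation, the second anchor covers a real object yet receives a negative label, which is the false-negative training signal the abstract refers to.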
- Inverse Halftone Colorization: Making Halftone Prints Color Photos
- M3VSNet: Unsupervised Multi-metric Multi-view Stereo Network
Current multi-view stereo (MVS) methods based on supervised learning perform impressively compared with traditional MVS methods. However, the ground-truth depth maps needed for training are hard to obtain and cover only a limited range of scenarios. In this paper, we propose a novel unsupervised multi-metric MVS network, named M^3VSNet, for dense point cloud reconstruction without any supervision.
- SOLVING FOURIER PHASE RETRIEVAL WITH A REFERENCE IMAGE AS A SEQUENCE OF LINEAR INVERSE PROBLEMS
- Adversarial Unsupervised Video Summarization Augmented with Dictionary Loss
Automated unsupervised video summarization by key-frame extraction consists in identifying representative video frames, best abridging a complete input sequence, and temporally ordering them to form a video summary, without relying on manually constructed ground-truth key-frame sets. State-of-the-art unsupervised deep neural approaches consider the desired summary to be a subset of the original sequence, composed of video frames that are sufficient to visually reconstruct the entire input.
- INTEGRATED GRAD-CAM: SENSITIVITY-AWARE VISUAL EXPLANATION OF DEEP CONVOLUTIONAL NETWORKS VIA INTEGRATED GRADIENT-BASED SCORING
Visualizing the features captured by Convolutional Neural Networks (CNNs) is one of the conventional approaches to interpreting the predictions made by these models in numerous image recognition applications. Grad-CAM is a popular solution that provides such a visualization by combining the activation maps obtained from the model. However, the average gradient-based terms deployed in this method underestimate the contribution of the representations discovered by the model to its predictions.
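For reference, the averaging step in standard Grad-CAM can be sketched as follows: each channel's weight is the spatial mean of the gradient of the class score with respect to that channel, and the map is the ReLU of the weighted sum of activation maps. The activation and gradient values below are made up for illustration, and the function name `grad_cam` is hypothetical. This shows only the baseline Grad-CAM computation, not the integrated scoring proposed in the paper.

```python
def grad_cam(activations, gradients):
    """Baseline Grad-CAM on nested lists.

    activations, gradients: K channels, each an H x W nested list.
    Returns the per-channel weights and the H x W class activation map.
    """
    height = len(activations[0])
    width = len(activations[0][0])
    # One weight per channel: the spatial average of its gradients.
    weights = [
        sum(sum(row) for row in grad) / (height * width)
        for grad in gradients
    ]
    # Weighted sum of activation maps, followed by ReLU.
    cam = [
        [
            max(0.0, sum(w * act[i][j] for w, act in zip(weights, activations)))
            for j in range(width)
        ]
        for i in range(height)
    ]
    return weights, cam


# Hypothetical 2-channel, 2x2 feature maps and their gradients.
activations = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[4.0, 3.0], [2.0, 1.0]],
]
gradients = [
    [[0.5, 0.5], [0.5, 0.5]],    # uniform gradient: average is 0.5
    [[1.0, -1.0], [1.0, -1.0]],  # mixed signs: average is 0.0
]
weights, cam = grad_cam(activations, gradients)
# weights == [0.5, 0.0]; cam == [[0.5, 1.0], [1.5, 2.0]]
```

In this toy case the second channel has non-zero gradients everywhere, yet its averaged weight is zero and it drops out of the map entirely. This is one simple way a plain average can understate a channel's contribution; the paper's own analysis may rest on a different argument.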
- ADA-SISE: ADAPTIVE SEMANTIC INPUT SAMPLING FOR EFFICIENT EXPLANATION OF CONVOLUTIONAL NEURAL NETWORKS
Explainable AI (XAI) is an active research area that aims to interpret a neural network’s decisions, ensuring transparency and trust in task-specific learned models. Recently, perturbation-based model analysis has shown better interpretation, but back-propagation techniques still prevail because of their computational efficiency. In this work, we combine both approaches into a hybrid visual explanation algorithm and propose an efficient interpretation method for convolutional neural networks.