- Image/Video Storage, Retrieval
- Image/Video Processing
- Image/Video Coding
- Image Scanning, Display, and Printing
- Image Formation
Immersive Optical-See-Through Augmented Reality (Keynote Talk)
Augmented Reality has been in the making for the last 20 years and is finally becoming real, powered by progress in enabling technologies such as graphics, vision, sensors, and displays. In this talk I’ll provide a personal retrospective on my journey working on all of those enablers in preparation for the coming AR revolution. At Meta, we are working on an immersive optical-see-through AR headset as well as the full software stack. We’ll discuss the differences of optical vs.
Style-Driven Multi-Resolution Human Motion Synthesis from Limited Data
We present a generative model that learns to synthesize human motion from limited training sequences. In contrast to existing methods, our framework provides stylistic control across multiple temporal resolutions. The model captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture. Our framework contains a set of generative and adversarial networks, along with style embedding modules, each tailored to generating motions at a specific frame rate while controlling their style.
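The abstract does not say how the per-frame-rate training targets are obtained. The sketch below assumes a simple temporal pyramid in which each coarser level halves the frame rate by averaging consecutive frame pairs; the `Frame` representation (flattened joint coordinates) and the function names are hypothetical, and the pooling choice is an illustration rather than the authors' method.

```python
from typing import List

# Hypothetical representation: one frame = flattened joint coordinates.
Frame = List[float]


def downsample(motion: List[Frame]) -> List[Frame]:
    """Halve the frame rate by averaging consecutive frame pairs.

    A trailing frame is dropped when the sequence length is odd.
    """
    return [
        [(a + b) / 2.0 for a, b in zip(motion[i], motion[i + 1])]
        for i in range(0, len(motion) - 1, 2)
    ]


def temporal_pyramid(motion: List[Frame], levels: int) -> List[List[Frame]]:
    """Return the levels ordered coarsest first, finest (original) last."""
    pyramid = [motion]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid[::-1]


if __name__ == "__main__":
    # 8 frames with 2 values each; 3 levels -> 2, 4, and 8 frames.
    motion = [[float(t), 2.0 * t] for t in range(8)]
    for level in temporal_pyramid(motion, 3):
        print(len(level), level[0])
```

In a coarse-to-fine setup of this kind, one generator/discriminator pair would be trained per level, each working at that level's frame rate.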
These are the supplementary materials for the BMT-BENCH dataset for video generation. The submission includes links to the dataset and the baseline system.
Supplementary Material for A REAL-WORLD SATELLITE VIDEO SUBJECTIVE QOE DATABASE
The LIVE-Viasat Real-World Satellite QoE Database is an innovative and comprehensive resource designed to address the critical challenges faced by Internet Service Providers (ISPs), particularly in the domain of satellite streaming services.
Supplementary Materials
To evaluate the generalization of referring image segmentation (RIS) in the context of human-robot interaction, we generate referring expressions for a subset of images from GraspNet using Shikra.
Appendix
To evaluate the generalization of RIS in the context of human-robot interaction, we generate referring expressions for a subset of images from GraspNet using Shikra.
QVRF: A QUANTIZATION-ERROR-AWARE VARIABLE RATE FRAMEWORK FOR LEARNED IMAGE COMPRESSION
Learned image compression has exhibited promising compression performance, but supporting variable bitrates over a wide range remains a challenge. State-of-the-art variable rate methods sacrifice model performance and require numerous additional parameters. In this paper, we present a Quantization-error-aware Variable Rate Framework (QVRF) that uses a univariate quantization regulator, a, to achieve wide-range variable rates within a single model.
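The abstract does not give QVRF's exact formulation. As a rough illustration of how a single scalar can trade rate for distortion, the sketch assumes the regulator acts as a quantization step size, quantizing each latent as a·round(y/a). Larger a yields fewer distinct symbols (a lower rate) and a larger worst-case error of a/2. The function names are hypothetical, and the symbol count is only a crude proxy for bitrate.

```python
from typing import List


def quantize(y: float, a: float) -> float:
    """Scaled uniform quantization with step size a.

    Assumption: a acts as a quantization step; this is a stand-in,
    not the QVRF paper's exact formulation.
    """
    if a <= 0:
        raise ValueError("regulator a must be positive")
    return round(y / a) * a


def quantize_latents(latents: List[float], a: float) -> List[float]:
    return [quantize(y, a) for y in latents]


def max_abs_error(latents: List[float], a: float) -> float:
    """Worst-case quantization error; bounded by a / 2."""
    return max(abs(y - q) for y, q in zip(latents, quantize_latents(latents, a)))


def num_symbols(latents: List[float], a: float) -> int:
    """Distinct integer indices after quantization (a crude rate proxy)."""
    return len({round(y / a) for y in latents})


if __name__ == "__main__":
    latents = [i * 0.37 - 3.0 for i in range(20)]
    for a in (0.25, 0.5, 1.0, 2.0):
        print(a, num_symbols(latents, a), round(max_abs_error(latents, a), 3))
```

A single trained model can sweep a at inference time to move along the rate-distortion curve, which is the property the abstract attributes to QVRF.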
SEMANTIC-EMBEDDED KNOWLEDGE ACQUISITION AND REASONING FOR IMAGE SEGMENTATION
Image segmentation is challenging because of complex object appearance and diverse object categories. Traditional methods use visual features directly for segmentation but ignore the correlations between objects. We introduce a knowledge reasoning module (KRM) for external knowledge aggregation, which uses a graph neural network to aggregate knowledge features that are then concatenated with visual features for semantic segmentation. To this end, we use word embeddings of category names as semantic features and establish relationships between categories.
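The following sketch shows the general pattern described above: category word embeddings become graph nodes, edges connect semantically similar categories, one graph aggregation step mixes node features, and the result is concatenated with a visual feature. Several parts are assumptions not taken from the paper: edges come from thresholded cosine similarity, aggregation is a row-normalized weighted mean (learnable weights and nonlinearities are omitted), and the knowledge vector is mean-pooled over categories before concatenation. All names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def cosine(u: Vec, v: Vec) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_adjacency(embeddings: List[Vec], threshold: float) -> List[List[float]]:
    """Category graph: self-loops of weight 1, other edges = cosine similarity
    when it reaches the threshold (negative similarities are always dropped)."""
    n = len(embeddings)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                adj[i][j] = 1.0
            else:
                s = cosine(embeddings[i], embeddings[j])
                adj[i][j] = s if s >= max(threshold, 0.0) else 0.0
    return adj


def propagate(adj: List[List[float]], feats: List[Vec]) -> List[Vec]:
    """One aggregation step: each node takes the edge-weighted mean of its
    neighbours' features (row sums are >= 1 because of the self-loops)."""
    dim = len(feats[0])
    out = []
    for row in adj:
        total = sum(row)
        out.append([sum(w * f[k] for w, f in zip(row, feats)) / total for k in range(dim)])
    return out


def fuse(visual: Vec, node_feats: List[Vec]) -> Vec:
    """Concatenate a visual feature with the mean-pooled knowledge feature."""
    n = len(node_feats)
    pooled = [sum(f[k] for f in node_feats) / n for k in range(len(node_feats[0]))]
    return visual + pooled
```

With real embeddings (e.g. from a pretrained word-embedding model), `propagate` would be stacked with learned projections to form a graph-convolution layer.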
IMAGE SEGMENTATION FOR IMPROVED LOSSLESS SCREEN CONTENT COMPRESSION
In recent years, it has been shown that screen content images (SCI) can be compressed effectively using appropriate probability modelling and suitable entropy coding methods such as arithmetic coding. The key objective is to determine the best probability distribution for each pixel position. This strategy works particularly well for images with synthetic (textual) content. However, screen content images usually contain not only synthetic but also pictorial (natural) regions. Such mixed images require different probability models for different regions to be compressed optimally.
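To see why mixed content benefits from separate models, consider the ideal code length an entropy coder approaches under a static empirical model: n·H bits for n symbols with empirical entropy H. The sketch below compares one model for the whole image with one model per segmented region. It is a simplification, not the paper's coder. It ignores the cost of transmitting the models and the segmentation map, and it uses hypothetical names and toy pixel values. Under this idealization, segmenting never increases the total, because the entropy of a mixture is at least the weighted average of its components' entropies. Real gains therefore depend on how much side information the segmentation costs.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def static_code_length(symbols: Sequence[int]) -> float:
    """Ideal code length in bits under the symbols' own empirical distribution
    (model/side-information cost ignored)."""
    n = len(symbols)
    if n == 0:
        return 0.0
    counts = Counter(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())


def segmented_code_length(symbols: Sequence[int], labels: Sequence[Hashable]) -> float:
    """Code each region (pixels sharing a label) with its own model and sum the bits."""
    regions: Dict[Hashable, List[int]] = {}
    for s, lab in zip(symbols, labels):
        regions.setdefault(lab, []).append(s)
    return sum(static_code_length(r) for r in regions.values())


if __name__ == "__main__":
    # Toy image row: a binary "text" region followed by a smooth "natural" ramp.
    text = [0, 255, 255, 0, 0, 255, 0, 255] * 8
    natural = list(range(100, 164))
    pixels = text + natural
    labels = ["synthetic"] * len(text) + ["pictorial"] * len(natural)
    print(static_code_length(pixels), segmented_code_length(pixels, labels))
```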
Image Generation is MAY All You Need for VQA
Visual Question Answering (VQA) stands to benefit from increasingly sophisticated Pretrained Language Models (PLMs) and computer vision models. In particular, many studies on the language modality have used the knowledge grounded in PLMs for data augmentation through image captioning or question generation. However, image generation for VQA has been used only in a limited way, modifying just certain parts of the original image in order to control quality and uncertainty.