- Image/Video Storage, Retrieval
- Image/Video Processing
- Image/Video Coding
- Image Scanning, Display, and Printing
- Image Formation

- Read more about DOMAIN-WISE INVARIANT LEARNING FOR PANOPTIC SCENE GRAPH GENERATION
Panoptic Scene Graph Generation (PSG) involves the detection of objects and the prediction of their corresponding relationships (predicates). However, the presence of biased predicate annotations poses a significant challenge for PSG models, as it hinders their ability to establish a clear decision boundary among different predicates. This issue substantially impedes the practical utility and real-world applicability of PSG models.

- Read more about AEAM3D: ADVERSE ENVIRONMENT-ADAPTIVE MONOCULAR 3D OBJECT DETECTION VIA FEATURE EXTRACTION REGULARIZATION
3D object detection plays a crucial role in intelligent vision systems. Detection in the open world inevitably encounters adverse scenes, and most existing methods fail in them. To address this issue, this paper proposes a monocular 3D detection model, termed AEAM3D, which mitigates the degradation of detection performance in harsh environments. Additionally, we assemble a new adverse 3D object detection dataset encompassing challenging scenes, including rainy, foggy, and low light

- Read more about M3SUM: A Novel Unsupervised Language-guided Video Summarization
Language-guided video summarization lets users condense lengthy videos with natural language queries into concise, relevant summaries tailored to their information needs, making the content easier to access and digest. However, most previous works rely on large amounts of expensive annotated video and on complex designs to align different modalities at the feature level.

- Read more about Style-Driven Multi-Resolution Human Motion Synthesis from Limited Data
We present a generative model that learns to synthesize human motion from limited training sequences. In contrast to existing methods, our framework provides stylistic control across multiple temporal resolutions. The model captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture. Our framework contains a set of generative and adversarial networks, along with style embedding modules, each tailored for generating motions at specific frame rates while exerting control over their style.

This is the supplementary material for the BMT-BENCH dataset for video generation. The submission includes links to the dataset and the baseline system.

- Read more about Supplementary Material for A REAL-WORLD SATELLITE VIDEO SUBJECTIVE QOE DATABASE
The LIVE-Viasat Real-World Satellite QoE Database is an innovative and comprehensive resource designed to address the critical challenges faced by Internet Service Providers (ISPs), particularly in the domain of satellite streaming services.

- Read more about Supplementary Materials
To evaluate the generalization of referring image segmentation (RIS) in the context of human-robot interaction, we generate referring expressions for a subset of images from GraspNet using Shikra.

- Read more about Appendix
To evaluate the generalization of RIS in the context of human-robot interaction, we generate referring expressions for a subset of images from GraspNet using Shikra.
Appendix.pdf


- Read more about QVRF: A QUANTIZATION-ERROR-AWARE VARIABLE RATE FRAMEWORK FOR LEARNED IMAGE COMPRESSION
Learned image compression has exhibited promising compression performance, but supporting variable bitrates over a wide range remains a challenge. State-of-the-art variable rate methods sacrifice model performance and require numerous additional parameters. In this paper, we present a Quantization-error-aware Variable Rate Framework (QVRF) that uses a univariate quantization regulator, denoted a, to achieve wide-range variable rates within a single model.

- Read more about SEMANTIC-EMBEDDED KNOWLEDGE ACQUISITION AND REASONING FOR IMAGE SEGMENTATION
Image segmentation is challenging because of complex object appearance and diverse object categories. Traditional methods use visual features directly for segmentation but ignore the correlation between objects. We introduce a knowledge reasoning module (KRM) for external knowledge aggregation and leverage a graph neural network to aggregate the knowledge features, which are concatenated with visual features for semantic segmentation. To this end, we use word embeddings of category names as semantic features and establish the relationships between categories.