Image, Video, and Multidimensional Signal Processing

AEAM3D: ADVERSE ENVIRONMENT-ADAPTIVE MONOCULAR 3D OBJECT DETECTION VIA FEATURE EXTRACTION REGULARIZATION

3D object detection plays a crucial role in intelligent vision systems. Detection in the open world inevitably encounters adverse scenes, yet most existing methods fail in them. To address this issue, this paper proposes a monocular 3D detection model, termed AEAM3D, which mitigates the degradation of detection performance in harsh environments. Additionally, we assemble a new adverse 3D object detection dataset encompassing challenging scenes, including rainy, foggy, and low light

poster_icassp2024.pdf

poster (335)

AEAM3D- ADVERSE ENVIRONMENT-ADAPTIVE MONOCULAR 3D OBJECT DETECTION VIA FEATURE EXTRACTION REGULARIZATION.pdf

paper (204)

Categories: Image, Video, and Multidimensional Signal Processing

81 Views

M3SUM: A Novel Unsupervised Language-guided Video Summarization

Language-guided video summarization lets users issue natural language queries to condense lengthy videos into concise, relevant summaries tailored to their information needs, which are easier to access and digest. However, most previous works rely on large amounts of expensive annotated video and on complex designs to align different modalities at the feature level.

icassp2024_m3sum.pdf (172)

Categories: Image, Video, and Multidimensional Signal Processing

469 Views

Style-Driven Multi-Resolution Human Motion Synthesis from Limited Data

We present a generative model that learns to synthesize human motion from limited training sequences. In contrast to existing methods, our framework provides stylistic control across multiple temporal resolutions. The model captures human motion patterns by integrating skeletal convolution layers and a multi-scale architecture. Our framework contains a set of generative and adversarial networks, along with style embedding modules, each tailored to generating motions at a specific frame rate while exerting control over their style.

styles.zip

Example videos of style control. (176)

Categories: Image, Video, and Multidimensional Signal Processing

25 Views

BMT-BENCH: A Benchmark Sports Dataset for Video Generation

This is the supplementary material for the BMT-BENCH dataset for video generation. The submission includes links to the dataset and the baseline system.

Supplementary Material BMT-BENCH .pdf (207)

Categories: Image, Video, and Multidimensional Signal Processing

32 Views

Supplementary Material for A REAL-WORLD SATELLITE VIDEO SUBJECTIVE QOE DATABASE

The LIVE-Viasat Real-World Satellite QoE Database is an innovative and comprehensive resource designed to address the critical challenges faced by Internet Service Providers (ISPs), particularly in the domain of satellite streaming services.

supplementary_material.pdf (294)

Categories: Image, Video, and Multidimensional Signal Processing

21 Views

Supplementary Materials

To evaluate the generalization of referring image segmentation (RIS) in the context of human-robot interaction, we generate referring expressions for a subset of images from GraspNet using Shikra.

Supplementary_Materials.pdf (267)

Categories: Image, Video, and Multidimensional Signal Processing

25 Views

Appendix

To evaluate the generalization of referring image segmentation (RIS) in the context of human-robot interaction, we generate referring expressions for a subset of images from GraspNet using Shikra.

Appendix.pdf (169)

Categories: Image, Video, and Multidimensional Signal Processing

10 Views

QVRF: A QUANTIZATION-ERROR-AWARE VARIABLE RATE FRAMEWORK FOR LEARNED IMAGE COMPRESSION

Learned image compression has shown promising compression performance, but supporting variable bitrates over a wide range remains a challenge. State-of-the-art variable rate methods sacrifice model performance and require numerous additional parameters. In this paper, we present a Quantization-error-aware Variable Rate Framework (QVRF) that uses a univariate quantization regulator, denoted a, to achieve wide-range variable rates within a single model.
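
As a rough illustration of how a single scalar can steer the rate, the sketch below scales a set of latent values by a regulator a before rounding, which is plain uniform quantization with step 1/a. This is not the QVRF implementation: the direction of the scaling, the Gaussian toy latents, and the helper names `quantize`, `empirical_entropy_bits`, and `mse` are all assumptions made for the example, and empirical entropy stands in for the coded rate.

```python
import math
import random
from collections import Counter

def quantize(latents, a):
    """Scale by the regulator a, round, and scale back (uniform step of 1/a).

    Assumption: multiplying by a before rounding is an illustrative choice,
    not necessarily the convention used in the paper.
    """
    indices = [round(y * a) for y in latents]
    recon = [k / a for k in indices]
    return indices, recon

def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol, a rough proxy for the coded rate."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mse(x, y):
    """Mean squared error between two equally long sequences."""
    return sum((u - v) ** 2 for u, v in zip(x, y)) / len(x)

random.seed(0)
# Toy latents: hypothetical stand-in for the output of a learned encoder.
latents = [random.gauss(0.0, 2.0) for _ in range(10000)]

results = {}
for a in (0.25, 1.0, 4.0):
    indices, recon = quantize(latents, a)
    results[a] = (empirical_entropy_bits(indices), mse(latents, recon))
    print(f"a={a}: rate ~ {results[a][0]:.2f} bits/symbol, MSE = {results[a][1]:.4f}")
```

A larger a gives a finer quantization step, so the rate proxy rises while the distortion falls; sweeping a traces out a rate-distortion curve from one set of latents.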

eposter_ICIP2023_tongkedeng.pptx (238)

Categories: Image, Video, and Multidimensional Signal Processing

77 Views

SEMANTIC-EMBEDDED KNOWLEDGE ACQUISITION AND REASONING FOR IMAGE SEGMENTATION

Image segmentation is challenging because of complex object appearance and diverse object categories. Traditional methods use visual features directly for segmentation but ignore the correlation between objects. We introduce a knowledge reasoning module (KRM) that aggregates external knowledge with a graph neural network; the resulting knowledge feature is concatenated with a visual feature for semantic segmentation. To this end, we use word embeddings of category names as semantic features and establish the relationships between categories.
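
The pipeline described here (category embeddings, a category graph, aggregation, concatenation with a visual feature) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' KRM: the three-dimensional toy embeddings, the cosine-similarity threshold used to build the graph, the weight-free mean aggregation, and the names `build_adjacency` and `aggregate` are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def build_adjacency(embeddings, threshold):
    """Link categories whose embeddings are similar; self-loops are always kept."""
    names = list(embeddings)
    return {
        i: [j for j in names
            if i == j or cosine(embeddings[i], embeddings[j]) >= threshold]
        for i in names
    }

def aggregate(embeddings, adjacency):
    """One mean-aggregation step over graph neighbours (no learned weights)."""
    out = {}
    for node, neighbours in adjacency.items():
        dim = len(embeddings[node])
        out[node] = [
            sum(embeddings[n][d] for n in neighbours) / len(neighbours)
            for d in range(dim)
        ]
    return out

# Toy "word embeddings" (assumed values); a real system would load pretrained vectors.
embeddings = {
    "car": [0.9, 0.1, 0.0],
    "bus": [0.8, 0.2, 0.1],
    "tree": [0.0, 0.1, 0.9],
}
adjacency = build_adjacency(embeddings, threshold=0.9)
knowledge = aggregate(embeddings, adjacency)

# Hypothetical visual feature for one image region, concatenated with the
# aggregated knowledge feature of its candidate category.
visual_feature = [0.3, 0.7]
fused = visual_feature + knowledge["car"]
print(adjacency)
print(fused)
```

In a trained model the aggregation step would carry learned weights and the fused feature would feed a segmentation head; the sketch only shows how related categories ("car" and "bus") end up sharing information while an unrelated one ("tree") does not.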

Semantic-embedded knowledge acquisition and reasoning for image segmentation.pdf

semantic segmentation (294)

ICIP_demo.pdf (235)

Categories: Image, Video, and Multidimensional Signal Processing

37 Views

IMAGE SEGMENTATION FOR IMPROVED LOSSLESS SCREEN CONTENT COMPRESSION

In recent years, it has been found that screen content images (SCI) can be compressed effectively with appropriate probability modelling and suitable entropy coding methods such as arithmetic coding. The key objective is to determine the best probability distribution for each pixel position. This strategy works particularly well for images with synthetic (textual) content. However, screen content images usually contain not only synthetic but also pictorial (natural) regions. Such images require different probability models for different regions to be compressed optimally.
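
To make the argument for region-specific models concrete, the sketch below compares the ideal code length of a mixed pixel stream under one shared adaptive model with the total under two separate models, one per region. It is an illustration under assumptions, not the method of the paper: the frequency-count model with Laplace smoothing, the toy pixel sources, and the function name `adaptive_code_length` are all hypothetical, and the segmentation is known by construction rather than estimated.

```python
import math
import random

def adaptive_code_length(symbols, alphabet_size=256):
    """Ideal code length in bits under an adaptive frequency-count model.

    Each symbol costs -log2(p), which an arithmetic coder can approach closely.
    Counts start at 1 (Laplace smoothing) and are updated after every symbol.
    """
    counts = [1] * alphabet_size
    total = alphabet_size
    bits = 0.0
    for s in symbols:
        bits += -math.log2(counts[s] / total)
        counts[s] += 1
        total += 1
    return bits

random.seed(0)
# Synthetic-like region: two grey levels, as in text on a flat background.
synthetic = [random.choice((0, 255)) for _ in range(4000)]
# Natural-like region: noisy mid-grey values clipped to the 8-bit range.
natural = [min(255, max(0, int(random.gauss(128, 20)))) for _ in range(4000)]

# Interleave the regions to mimic coding a mixed image with a single model.
mixed = [p for pair in zip(synthetic, natural) for p in pair]

shared_bits = adaptive_code_length(mixed)
separate_bits = adaptive_code_length(synthetic) + adaptive_code_length(natural)
print(f"one shared model:    {shared_bits:.0f} bits")
print(f"one model per region: {separate_bits:.0f} bits")
```

The shared model has to spread probability mass over both kinds of content, so it pays roughly one extra bit per pixel in this toy setting; separate models avoid that cost, which is the motivation for segmenting the image before coding.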

presentationICASSP_pdf.pdf (287)

Categories: Image, Video, and Multidimensional Signal Processing

25 Views
