Image/Video Processing

Cross-modal Multiscale Difference-aware Network for Joint Moment Retrieval and Highlight Detection

Since the goals of both Moment Retrieval (MR) and Highlight Detection (HD) are to quickly obtain the required content from the video according to user needs, several works have attempted to take advantage of the commonality between both tasks to design transformer-based networks for joint MR and HD. Although these methods achieve impressive performance, they still face some problems: \textbf{a)} Semantic gaps across different modalities. \textbf{b)} Various durations of different query-relevant moments and highlights. \textbf{c)} Smooth transitions among diverse events.

CMDNet_04_13.pdf

Poster (250)

Categories:: Image/Video Processing
Image/Video Storage, Retrieval

45 Views

PARAMETER-EFFICIENT ADAPTATION FOR COMPUTATIONAL IMAGING

Read more about PARAMETER-EFFICIENT ADAPTATION FOR COMPUTATIONAL IMAGING
Log in to post comments

Deep learning-based methods provide remarkable performance in a number of computational imaging problems. Examples include end-to-end trained networks that map measurements to unknown signals, plug-and-play (PnP) methods that use pretrained denoisers as image prior, and model-based unrolled networks that train artifact removal blocks. Many of these methods lack robustness and fail to generalize with distribution shifts in data, measurements, and noise. In this paper, we present a simple framework to perform domain adaptation as data and measurement distribution shifts.

Parameter-Efficient Adaptation for Computational Imaging-2.pptx

Parameter-Efficient Adaptation for Computational Imaging-2.pptx (148)

Categories:: Image/Video Processing

37 Views

Beyond the Limit of Weight-Sharing: Pioneering Space-Evolving NAS with Large Language Models

Large language models (LLMs) offer impressive performance across diverse fields, but their increasing complexity raises both design costs and the need for specialized expertise. These challenges are intensified for Neural Architecture Search (NAS) methods reliant on weight-sharing techniques. This paper introduces GNAS, a new NAS method that boosts the search process with the aid of LLMs for efficient model discovery. With insights from existing architectures, GNAS swiftly identifies superior models that can adapt to changing resource constraints.

GNAS_20mins.pptx

Presentation Slides (130)

Categories:: Image/Video Processing

24 Views

3D POSE ESTIMATION FROM MONOCULAR VIDEO WITH CAMERA-BONE ANGLE REGULARIZATION ON THE IMAGE FEATURE

In this paper, we propose a monocular 3D pose estimation method which explicitly takes into account the angles between the camera optical axis and bones (camera-bone angles) as well as temporal information. The proposed method combines a 2D-to-3D-based method, which predicts a 3D pose from a sequence of 2D poses, and convolutional neural network (CNN) and includes novel regularization loss to enable the CNN to extract camera-bone-angle information.

20240402_poster_v2.pdf

20240402_poster_v2.pdf (161)

Categories:: Image/Video Processing

36 Views

DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

In this paper, we present the decomposed triplane-hash neural radiance fields (DT-NeRF), a framework that significantly improves the photorealistic rendering of talking faces and achieves state-of-the-art results on key evaluation datasets. Our architecture decomposes the facial region into two specialized triplanes: one specialized for representing the mouth, and the other for the broader facial features. We introduce audio features as residual terms and integrate them as query vectors into our model through an audio-mouthface transformer.

icassp_suyaoyu_poster_1.pdf

icassp_suyaoyu_poster_1.pdf (187)

Categories:: Image/Video Processing

35 Views

DYNAMIC VIDEO FRAME INTERPOLATION WITH INTEGRATED DIFFICULTY PRE-ASSESSMENT

Read more about DYNAMIC VIDEO FRAME INTERPOLATION WITH INTEGRATED DIFFICULTY PRE-ASSESSMENT
Log in to post comments

Video frame interpolation (VFI) has witnessed great progress in recent years. However, existing VFI models still struggle to achieve a good trade-off between accuracy and efficiency. Accurate VFI models typically rely on heavy compute to process all samples, ignoring the fact that easy samples with small motion or clear texture can be well addressed by a fast VFI model and do not require such heavy compute. In this paper, we present a dynamic VFI pipeline with integrated pre-assessment of interpolation difficulty.

lecture.pptx

presentation slide for "DYNAMIC VIDEO FRAME INTERPOLATION WITH INTEGRATED DIFFICULTY PRE-ASSESSMENT" (432)

Categories:: Image/Video Processing

8 Views

Supplementing Missing Visions via Dialog for Scene Graph Generations

Read more about Supplementing Missing Visions via Dialog for Scene Graph Generations
Log in to post comments

Most AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various tasks. However, the classic task setup rarely considers the challenging, yet common practical situations where the complete visual data may be inaccessible due to various reasons (e.g., restricted view range and occlusions). To this end, we investigate a task setting with incomplete visual input data. Specifically, we exploit the Scene Graph Generation (SGG) task with various levels of visual data missingness as input.

ICASSP_2024_Poster.pdf

ICASSP_2024_Poster.pdf (228)

Categories:: Image/Video Processing

7 Views

Photovoltaic power forecasting using sky images and sun motion

Read more about Photovoltaic power forecasting using sky images and sun motion
Log in to post comments

Solar energy adoption is moving at a rapid pace. The variability in solar energy production causes grid stability issues and hinders mass adoption. To solve these issues, more accurate photovoltaic power forecasting systems are needed. In intra-hour forecasting, the most challenging issue is high output fluctuations due to cloud motion, which can occlude the sun.

poster_icassp24.pdf

poster_icassp24.pdf (168)

Categories:: Image/Video Processing

18 Views

PROBMCL: SIMPLE PROBABILISTIC CONTRASTIVE LEARNING FOR MULTI-LABEL VISUAL CLASSIFICATION

Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture label dependencies. However, these methods often include complex modules that entail heavy computation and lack interpretability. In this paper, we propose Probabilistic Multi-label Contrastive Learning (ProbMCL), a novel framework to address these challenges in multi-label image classification tasks.

ProbMCL_ICASSP24_Poster.pdf

ProbMCL_ICASSP24_Poster.pdf (187)

Categories:: Image/Video Processing
Pattern recognition and classification (MLR-PATT)

24 Views

Self-Supervised Face Image Restoration with a One-Shot Reference

Read more about Self-Supervised Face Image Restoration with a One-Shot Reference
Log in to post comments

For image restoration, methods leveraging priors from generative models have been proposed and demonstrated a promising capacity to robustly restore photorealistic and high-quality results. However, these methods are susceptible to semantic ambiguity, particularly with images that have obviously correct semantics, such as facial images. In this paper, we propose a semantic-aware latent space exploration method for image restoration (SAIR).

ICASSP_SAIR_Slides.pptx

ICASSP_SAIR_Slides.pptx (152)

Categories:: Image/Video Processing

17 Views

Image/Video Processing

Pages