- 3D POSE ESTIMATION FROM MONOCULAR VIDEO WITH CAMERA-BONE ANGLE REGULARIZATION ON THE IMAGE FEATURE
In this paper, we propose a monocular 3D pose estimation method that explicitly takes into account the angles between the camera optical axis and the bones (camera-bone angles), as well as temporal information. The proposed method combines a 2D-to-3D method, which predicts a 3D pose from a sequence of 2D poses, with a convolutional neural network (CNN), and includes a novel regularization loss that enables the CNN to extract camera-bone-angle information.
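To make the camera-bone angle concrete, the following is a minimal sketch of how such an angle could be computed for a single bone. It assumes joint positions are given in camera coordinates and that the optical axis is the +z direction; the function name `camera_bone_angle` and these conventions are illustrative assumptions, not the paper's implementation, and the regularization loss itself is not shown.

```python
import math


def camera_bone_angle(parent, child, optical_axis=(0.0, 0.0, 1.0)):
    """Angle in radians between a bone (parent -> child) and the optical axis.

    Assumption: joints are 3D points in camera coordinates and the optical
    axis defaults to +z. These conventions are illustrative only.
    """
    bone = [c - p for p, c in zip(parent, child)]
    dot = sum(b * a for b, a in zip(bone, optical_axis))
    bone_norm = math.sqrt(sum(b * b for b in bone))
    axis_norm = math.sqrt(sum(a * a for a in optical_axis))
    if bone_norm == 0.0 or axis_norm == 0.0:
        raise ValueError("bone and optical axis must have non-zero length")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (bone_norm * axis_norm)))
    return math.acos(cosine)


# A bone pointing straight away from the camera is parallel to the axis.
parallel = camera_bone_angle((0.0, 0.0, 2.0), (0.0, 0.0, 3.0))
# A bone lying in the image plane is perpendicular to the axis.
perpendicular = camera_bone_angle((0.0, 0.0, 2.0), (1.0, 0.0, 2.0))
```

A bone that is nearly parallel to the optical axis appears strongly foreshortened in the image, which is one plausible reason the angle is informative for lifting 2D poses to 3D.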
- DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
In this paper, we present the decomposed triplane-hash neural radiance fields (DT-NeRF), a framework that significantly improves the photorealistic rendering of talking faces and achieves state-of-the-art results on key evaluation datasets. Our architecture decomposes the facial region into two specialized triplanes: one for representing the mouth, and the other for the broader facial features. We introduce audio features as residual terms and integrate them as query vectors into our model through an audio-mouth-face transformer.
- DYNAMIC VIDEO FRAME INTERPOLATION WITH INTEGRATED DIFFICULTY PRE-ASSESSMENT
Video frame interpolation (VFI) has witnessed great progress in recent years. However, existing VFI models still struggle to achieve a good trade-off between accuracy and efficiency. Accurate VFI models typically rely on heavy compute to process all samples, ignoring the fact that easy samples with small motion or clear texture can be well addressed by a fast VFI model and do not require such heavy compute. In this paper, we present a dynamic VFI pipeline with integrated pre-assessment of interpolation difficulty.
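The routing idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: the mean absolute frame difference stands in for the difficulty pre-assessment, and `motion_score`, `average_frames`, `route_interpolation`, and the `threshold` value are all hypothetical names and choices.

```python
def motion_score(frame_a, frame_b):
    """Mean absolute pixel difference, used here as a crude difficulty proxy.

    Assumption: frames are equally sized 2D lists of floats in [0, 1].
    """
    total, count = 0.0, 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


def average_frames(frame_a, frame_b):
    """Stand-in for a fast VFI model: a plain per-pixel average."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]


def route_interpolation(frame_a, frame_b, fast_model, accurate_model,
                        threshold=0.1):
    """Send easy frame pairs to the fast model and hard ones to the accurate model."""
    if motion_score(frame_a, frame_b) <= threshold:
        return "fast", fast_model(frame_a, frame_b)
    return "accurate", accurate_model(frame_a, frame_b)


still = [[0.2, 0.2], [0.2, 0.2]]
moved = [[0.9, 0.9], [0.9, 0.9]]
# Both slots use the same placeholder model here; only the routing differs.
easy_route, easy_frame = route_interpolation(still, still, average_frames, average_frames)
hard_route, hard_frame = route_interpolation(still, moved, average_frames, average_frames)
```

In practice the two callables would be different networks, and the threshold would trade accuracy against compute.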
- Supplementing Missing Visions via Dialog for Scene Graph Generations
Most AI systems rely on the premise that the input visual data are sufficient to achieve competitive performance in various tasks. However, the classic task setup rarely considers the challenging, yet common practical situations where the complete visual data may be inaccessible due to various reasons (e.g., restricted view range and occlusions). To this end, we investigate a task setting with incomplete visual input data. Specifically, we exploit the Scene Graph Generation (SGG) task with various levels of visual data missingness as input.
- Photovoltaic power forecasting using sky images and sun motion
Solar energy adoption is moving at a rapid pace. The variability in solar energy production causes grid stability issues and hinders mass adoption. To solve these issues, more accurate photovoltaic power forecasting systems are needed. In intra-hour forecasting, the most challenging issue is high output fluctuations due to cloud motion, which can occlude the sun.
- PROBMCL: SIMPLE PROBABILISTIC CONTRASTIVE LEARNING FOR MULTI-LABEL VISUAL CLASSIFICATION
Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture label dependencies. However, these methods often include complex modules that entail heavy computation and lack interpretability. In this paper, we propose Probabilistic Multi-label Contrastive Learning (ProbMCL), a novel framework to address these challenges in multi-label image classification tasks.
- Self-Supervised Face Image Restoration with a One-Shot Reference
For image restoration, methods that leverage priors from generative models have been proposed and have demonstrated a promising capacity to robustly restore photorealistic, high-quality results. However, these methods are susceptible to semantic ambiguity, particularly with images that have obviously correct semantics, such as facial images. In this paper, we propose a semantic-aware latent space exploration method for image restoration (SAIR).
- Spectro-spatial hyperspectral image reconstruction from interferometric acquisitions
In the last decade, novel hyperspectral cameras have been developed with the particularly desirable characteristics of compactness and short acquisition time, while retaining the potential to achieve spectral and spatial resolution competitive with traditional cameras. However, computational effort is required to recover an interpretable data cube.
- RESIDUAL DENSE SWIN TRANSFORMER FOR CONTINUOUS DEPTH-INDEPENDENT ULTRASOUND IMAGING
Ultrasound imaging is crucial for evaluating organ morphology and function, yet depth adjustment can degrade image quality and field-of-view, presenting a depth-dependent dilemma. Traditional interpolation-based zoom-in techniques often sacrifice detail and introduce artifacts. Motivated by the potential of arbitrary-scale super-resolution to naturally address these inherent challenges, we present the Residual Dense Swin Transformer Network (RDSTN), designed to capture the non-local characteristics and long-range dependencies intrinsic to ultrasound images.
- CROSS-LINGUAL LEARNING IN MULTILINGUAL SCENE TEXT RECOGNITION
In this paper, we investigate cross-lingual learning (CLL) for multilingual scene text recognition (STR). CLL transfers knowledge from one language to another. We aim to find the conditions under which knowledge from high-resource languages improves performance in low-resource languages. To do so, we first examine whether two general insights about CLL discussed in previous works apply to multilingual STR: (1) joint learning with high- and low-resource languages may reduce performance on low-resource languages, and (2) CLL works best between typologically similar languages.