- Adaptive Adversarial Cross-Entropy Loss for Sharpness-Aware Minimization
Recent advances in learning algorithms have shown that the sharpness of the loss surface is an effective measure for reducing the generalization gap. Building on this idea, Sharpness-Aware Minimization (SAM) was proposed to enhance model generalization and achieved state-of-the-art performance. SAM consists of two main steps: a weight perturbation step and a weight updating step. However, the perturbation in SAM is determined only by the gradient of the training loss, i.e., the cross-entropy loss.
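The two SAM steps described above can be sketched on a toy problem. This is a minimal illustration of standard SAM, not the adaptive adversarial variant the paper proposes; the names `sam_step`, `toy_grad`, and `toy_loss`, the quadratic loss, and the values of `lr` and `rho` are all assumptions made for the example.

```python
import math


def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update on a list of weights (illustrative sketch).

    grad_fn(w) returns the gradient of the training loss at w.
    lr and rho are hypothetical values, not taken from the paper.
    """
    # Step 1: weight perturbation along the normalized loss gradient.
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]

    # Step 2: weight update using the gradient taken at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_loss(w):
    # Hypothetical stand-in for the training loss.
    return 0.5 * (w[0] ** 2 + 10.0 * w[1] ** 2)


def toy_grad(w):
    # Analytic gradient of toy_loss.
    return [w[0], 10.0 * w[1]]


w = [1.0, 1.0]
for _ in range(20):
    w = sam_step(w, toy_grad)
```

In this sketch the perturbation direction comes from the loss gradient alone, which is the limitation the abstract points out.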
- Rethinking temporal self-similarity for repetitive action counting
Counting repetitive actions in long untrimmed videos is a challenging task that has many applications such as rehabilitation. State-of-the-art methods predict action counts by first generating a temporal self-similarity matrix (TSM) from the sampled frames and then feeding the matrix to a predictor network. The self-similarity matrix, however, is not an optimal input to a network since it discards too much information from the frame-wise embeddings.
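To make the temporal self-similarity matrix concrete, here is a minimal sketch that builds one from frame-wise embeddings. Cosine similarity is an assumption (the abstract does not say which similarity is used), and the embeddings and function names are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def self_similarity_matrix(embeddings):
    # Entry (i, j) compares frame i with frame j; the result is T x T.
    return [[cosine(u, v) for v in embeddings] for u in embeddings]


# Hypothetical embeddings for 4 sampled frames of a motion with period 2.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
tsm = self_similarity_matrix(frames)
```

Each pair of D-dimensional embeddings is reduced to a single scalar, which illustrates why the matrix carries less information than the embeddings themselves.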
- Fast Unsupervised Tensor Restoration via Low-rank Deconvolution
Low-rank Deconvolution (LRD) has emerged as a new multi-dimensional representation model with important efficiency and flexibility properties. In this work we ask whether this analytical model can compete with Deep Learning (DL) frameworks such as Deep Image Prior (DIP) or Blind-Spot Networks (BSN), and with other classical methods, in the task of signal restoration. More specifically, we propose to extend LRD with differential regularization.
- Lightweight Underwater Image Enhancement via Impulse Response of Low-Pass Filter Based Attention Network
In this paper, we propose an improved version of Shallow-UWnet for underwater image enhancement. The proposed method improves the learning process and addresses the vanishing gradient problem with a skip connection that concatenates the raw underwater image and the impulse response of a low-pass filter (LPF) into Shallow-UWnet. Additionally, we integrate the simple, parameter-free attention module (SimAM) into each convolution block to enhance the visual quality of the images.
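As a rough illustration of what "parameter-free" means for SimAM, the sketch below applies the commonly cited SimAM weighting to a single feature map. It is an assumption that the proposed network uses this formulation unchanged; the function name `simam`, the regularizer `lam`, and the example feature map are hypothetical.

```python
import math


def simam(channel, lam=1e-4):
    """SimAM-style attention on one H x W feature map (list of lists).

    No learnable weights are involved: each activation is scaled by a
    sigmoid of how far it deviates from the channel mean.
    """
    values = [v for row in channel for v in row]
    n = len(values) - 1  # number of "other" positions in the channel
    mean = sum(values) / len(values)
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    var = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            e_inv = d / (4.0 * (var + lam)) + 0.5
            out_row.append(v / (1.0 + math.exp(-e_inv)))  # v * sigmoid(e_inv)
        out.append(out_row)
    return out


# Hypothetical 3 x 3 feature map with one distinctive activation.
feature = [[1.0, 1.0, 1.0], [1.0, 5.0, 1.0], [1.0, 1.0, 1.0]]
attended = simam(feature)
```

The distinctive center activation receives a larger weight than the uniform background, which is the behavior the attention module is meant to provide.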
- ET: Explain to Train: Leveraging Explanations to Enhance the Training of a Multimodal Transformer
Explainable Artificial Intelligence (XAI) has become increasingly vital for improving the transparency and reliability of neural network decisions. Transformer architectures have emerged as the state-of-the-art for various tasks across single modalities such as video, language, or signals, as well as for multimodal approaches. Although XAI methods for transformers are available, their potential impact during model training remains underexplored.
- Driving through Graphs: A Bipartite Graph for Traffic Scene Analysis
We introduce a novel approach for traffic scene analysis in driving videos by exploring spatio-temporal relationships captured by a temporal frame-to-frame (f2f) bipartite graph, eliminating the need for complex image-level high-dimensional feature extraction. Instead, we rely on object detectors that provide bounding box information. The proposed graph approach efficiently connects objects across frames where nodes represent essential object attributes, and edges signify interactions based on simple spatial metrics such as distance and angles between objects.
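A frame-to-frame bipartite graph of this kind can be sketched from bounding boxes alone. The example below is only an illustration: connecting every object in one frame to every object in the next, using box centers, and the names `center` and `f2f_bipartite_edges` are assumptions, not details from the paper.

```python
import math


def center(box):
    # box is (x1, y1, x2, y2) as produced by a typical object detector.
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


def f2f_bipartite_edges(boxes_t, boxes_t1):
    """Connect each object in frame t to each object in frame t+1.

    Edge attributes are the distance and angle between box centers.
    """
    edges = []
    for i, a in enumerate(boxes_t):
        ax, ay = center(a)
        for j, b in enumerate(boxes_t1):
            bx, by = center(b)
            dx, dy = bx - ax, by - ay
            edges.append({
                "src": i,
                "dst": j,
                "distance": math.hypot(dx, dy),
                "angle": math.atan2(dy, dx),
            })
    return edges


# Hypothetical detections in two consecutive frames.
frame_t = [(0, 0, 10, 10), (50, 50, 70, 70)]
frame_t1 = [(3, 4, 13, 14), (50, 50, 70, 70)]
edges = f2f_bipartite_edges(frame_t, frame_t1)
```

Because edges only link objects across the two frames, never within one frame, the graph is bipartite by construction.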
- A Hue-Preserving Contrast Enhancement Method Using Histogram Specification for Each RGB Component
Image enhancement is one of the most important image processing techniques and is used to improve the quality of images captured in various situations. In this study, we propose a novel hue-preserving contrast enhancement method that adjusts chroma while suppressing over-enhancement. In the proposed method, the histogram of each RGB component of the original image is first smoothed by a Gaussian filter. Then, histogram specification is performed using the smoothed histogram of each RGB component to spread the pixel distribution in RGB color space.
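The two steps described above (Gaussian smoothing of a channel histogram, then histogram specification toward the smoothed histogram) can be sketched for a single channel. This is one reading of the abstract, not the authors' implementation: the hue-preserving and chroma-adjustment parts are not reproduced, and `sigma`, the border handling, and all function names are assumptions.

```python
import math


def histogram(pixels, levels=256):
    h = [0] * levels
    for p in pixels:
        h[p] += 1
    return h


def gaussian_smooth(hist, sigma=2.0):
    # Convolve the histogram with a normalized, truncated Gaussian kernel.
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    total = sum(kernel)
    kernel = [k / total for k in kernel]
    n = len(hist)
    out = []
    for i in range(n):
        acc = 0.0
        for k, wk in zip(offsets, kernel):
            j = min(max(i + k, 0), n - 1)  # clamp at the borders
            acc += wk * hist[j]
        out.append(acc)
    return out


def cdf(hist):
    total = float(sum(hist))
    run, out = 0.0, []
    for h in hist:
        run += h
        out.append(run / total)
    return out


def specify(pixels, target_hist, levels=256):
    # Map each source level to the first target level whose CDF reaches it.
    src_cdf = cdf(histogram(pixels, levels))
    tgt_cdf = cdf(target_hist)
    mapping, j = [], 0
    for i in range(levels):
        while j < levels - 1 and tgt_cdf[j] < src_cdf[i]:
            j += 1
        mapping.append(j)
    return [mapping[p] for p in pixels]


# Hypothetical low-contrast channel: all values within three adjacent levels.
pixels = [100] * 5 + [101] * 5 + [102] * 5
smoothed = gaussian_smooth(histogram(pixels))
enhanced = specify(pixels, smoothed)
```

In a full pipeline the same procedure would be applied to the R, G, and B histograms separately, as the abstract states.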
- Fusion of Independent and Interactive Features for Human-Object Interaction Detection
Human-Object Interaction (HOI) detection, which aims to identify interacting humans and objects in images and predict the interactions between them, is of great significance for semantic understanding. Existing works primarily focus on exploring the fine-grained semantic features of humans and objects, as well as the spatial relationships between them. However, these methods do not leverage the contextual information within the interaction area, which could be valuable for predicting interaction behavior.
- City Traffic Aware Multi-Target Tracking Prediction With Multi-Camera
In recent years, Multi-Camera Multiple Object Tracking (MCMT) has gained significant attention as a crucial computer vision application. Research focuses on data association and track detection. However, accurately selecting datasets from raw vision data remains challenging due to real-world complexities such as object types, varying speeds, and unknown directions. To address these problems, this paper proposes the Object Tracking Model (OTM), which captures the features of the target area with a Camera Monitoring Network (CMN) based on a Graph Convolutional Network (GCN).