The Feature Pyramid Network (FPN) is indispensable to object detection methods. In recent years, attention mechanisms have been used to improve FPN because of their strong performance. Existing attention-based FPN methods generally rely on complex structures, which increases computational cost. In view of this, we propose a novel Channel Self-Attention Guided Feature Pyramid Network (CAG-FPN), which not only has a simple structure but also consistently improves detection accuracy.


Although current data augmentation methods successfully alleviate data insufficiency, conventional augmentation is primarily intra-domain, while the images generated by advanced generative adversarial networks (GANs) remain uncertain, particularly on small-scale datasets. In this paper, we propose a parameterized GAN (ParaGAN) that effectively controls the changes of synthetic samples among domains and highlights the attention regions for downstream classification.


Recently, vulnerable samples have been shown to be crucial for improving adversarial training performance. Our analysis of existing vulnerable-sample mining methods indicates that they have two problems: 1) valuable connections among different pairs of natural samples and their adversarial counterparts are ignored; 2) some vulnerable samples are not considered. To better leverage vulnerable samples, we propose INter PAir ConstrainT (INPACT) and Vulnerable Aware


The multi-view hash method converts heterogeneous data from multiple views into binary hash codes and is one of the critical technologies in multimedia retrieval. However, current methods mainly explore the complementarity among multiple views while lacking confidence in learning and fusion. Moreover, in practical application scenarios, single-view data contains redundant noise. To conduct confidence learning and eliminate unnecessary noise, we propose a novel Adaptive Confidence Multi-View Hashing (ACMVH) method.
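
As a rough illustration of what "converting multiple views into binary hash codes" means, the sketch below fuses two feature vectors by plain concatenation and binarizes random projections. This is a generic sign-random-projection baseline, not the ACMVH method: the fusion rule, the function names (`make_projection`, `hash_code`, `hamming`), and the feature values are all assumptions made for illustration.

```python
import random


def make_projection(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes in a dim-dimensional space (assumed setup)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(views, projection):
    """Fuse views by concatenation (a naive assumption), then binarize.

    Each bit is 1 if the fused vector lies on the non-negative side of the
    corresponding hyperplane, else 0.
    """
    fused = [x for view in views for x in view]
    return [
        1 if sum(w * x for w, x in zip(row, fused)) >= 0 else 0
        for row in projection
    ]


def hamming(a, b):
    """Number of differing bits; retrieval ranks items by this distance."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical features for one item seen from two views.
image_view = [0.2, -1.0, 0.5]
text_view = [1.5, 0.3]
projection = make_projection(dim=5, n_bits=16)
code = hash_code([image_view, text_view], projection)
```

Retrieval then compares codes with `hamming`, which is cheap because the codes are short bit vectors. A learned method would replace both the concatenation and the random hyperplanes.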


Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture label dependencies. However, these methods often include complex modules that entail heavy computation and lack interpretability. In this paper, we propose Probabilistic Multi-label Contrastive Learning (ProbMCL), a novel framework to address these challenges in multi-label image classification tasks.


Detecting small to tiny targets in infrared images is a challenging task in computer vision, especially when it comes to differentiating these targets from noisy or textured backgrounds. Traditional object detection methods such as YOLO


The confidence scores of 2D pose estimation are widely used in various fields, including multi-view 3D human pose estimation, skeleton-based human tracking, human action recognition, and human re-identification. Despite this widespread use, confidence scores from 2D pose estimation methods are unreliable indicators of the accuracy of estimation results, particularly under occlusion: keypoints with high confidence scores may have low accuracy and vice versa. To address this issue, we propose a new calibration method for 2D human pose estimation.


In recent years, the rapid proliferation of multi-modal fake news has posed potential harm across various sectors of society, making the detection of multi-modal fake news crucial. Most existing methods cannot effectively reduce redundant information while preserving both semantic and structural information. To address these problems, this paper proposes a semantic distillation and structural alignment (SDSA) network. We design a semantic distillation module for modality-specific features to preserve task-relevant semantic information and eliminate redundant information.


Most works in class incremental learning (CIL) assume disjoint sets of classes as tasks. Although a few works deal with overlapping sets of classes, they assume either a balanced data distribution or a mildly imbalanced one. Instead, in this paper, we explore one of the understudied real-world CIL settings where (1) different tasks can share some classes but with new data samples, and (2) the training data of each task follows a long-tail distribution. We call this setting CIL-LT.
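
The two conditions of this setting can be made concrete with a small sketch. It assumes an exponential decay of per-class sample counts, a profile commonly used to build long-tailed benchmarks; the abstract does not say which profile the authors use. The helper names (`long_tail_counts`, `build_task`), class IDs, and sizes are hypothetical.

```python
def long_tail_counts(n_classes, n_max, imbalance_ratio):
    """Per-class sample counts decaying exponentially from n_max.

    imbalance_ratio is the head-to-tail ratio (largest / smallest class).
    The exponential profile is an assumption, not taken from the abstract.
    """
    if n_classes == 1:
        return [n_max]
    return [
        max(1, round(n_max * imbalance_ratio ** (-i / (n_classes - 1))))
        for i in range(n_classes)
    ]


def build_task(class_ids, n_max, imbalance_ratio):
    """Map each class in a task to its sample count, head class first."""
    counts = long_tail_counts(len(class_ids), n_max, imbalance_ratio)
    return dict(zip(class_ids, counts))


# Two hypothetical tasks: classes 2 and 3 appear in both (condition 1),
# and each task is long-tailed on its own (condition 2).
task_1 = build_task([0, 1, 2, 3], n_max=500, imbalance_ratio=50)
task_2 = build_task([3, 2, 4, 5], n_max=500, imbalance_ratio=50)
shared_classes = set(task_1) & set(task_2)
```

Note that class 3 is a tail class in `task_1` but the head class in `task_2`, so a shared class can change frequency between tasks under these assumptions.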


Efficient and accurate bird sound classification is important for ecology, habitat protection, and scientific research, as it plays a central role in monitoring the distribution and abundance of species. However, prevailing methods typically require extensively labeled audio datasets and highly customized frameworks, imposing substantial computational and annotation loads. In this study, we present an efficient and general framework called SSL-Net, which combines spectral and learned features to identify different bird sounds.

