Computer Vision

PVD4RCV: A Photo-realistic Multi-Distortion Video Dataset for Benchmarking and Developing Robust Computer Vision Models

This work addresses a significant gap in existing
image and video databases commonly used in computer vision
applications by introducing a unique and comprehensive
database named Photo-realistic Multi-Distortion Video Dataset
for Benchmarking and Developing Robust Computer Vision
Models (PVD4RCV). A key innovation of PVD4RCV lies in
its incorporation of some relevant physical factors (e.g. depth
information, interaction of light with scene contents) inherent to
video signal acquisition in constrained and complex real-world

VCIP_PVD4RCV_Beghdadi_2025_compressed.pdf

Dataset for benchmarking and developing object detection and tracking models (89)

Categories: Image/Video Processing

44 Views

RGB-D tracking of complex shapes using coarse object models

This paper presents a framework for accurately tracking objects of complex shape with RGB-D cameras by jointly minimizing geometric and photometric error against a coarse 3D object model. Tracking with a coarse 3D model is particularly useful for industrial applications. The proposed technique combines point-to-plane distance minimization with photometric error minimization to track objects accurately. Keyframes are used in the tracking system to minimize drift.
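The joint objective described above can be illustrated with a minimal sketch. It assumes that point correspondences, target normals, and sampled intensities are already available, and it combines the two error terms with a hypothetical scalar `weight`; the abstract does not state how the authors balance the terms, so the function names and the weighting are illustrative assumptions only.

```python
import numpy as np


def point_to_plane_residuals(src_pts, dst_pts, dst_normals):
    """Signed distance from each source point to the tangent plane
    at its matched target point. All inputs have shape (N, 3)."""
    return np.einsum("ij,ij->i", src_pts - dst_pts, dst_normals)


def photometric_residuals(intensity_cur, intensity_ref):
    """Per-point intensity difference between current and reference frame."""
    return intensity_cur - intensity_ref


def joint_cost(src_pts, dst_pts, dst_normals,
               intensity_cur, intensity_ref, weight=0.1):
    """Sum of squared geometric residuals plus weighted squared
    photometric residuals. `weight` is an assumed balancing factor."""
    geo = point_to_plane_residuals(src_pts, dst_pts, dst_normals)
    photo = photometric_residuals(intensity_cur, intensity_ref)
    return float(np.sum(geo ** 2) + weight * np.sum(photo ** 2))
```

In a tracker, a cost of this form would be minimized over the object pose at every frame, for example with Gauss-Newton iterations; that optimization loop is omitted here.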

Poster_RGBDTracking.pdf (448)

Categories: Image/Video Processing

22 Views

A Fully Convolutional Tri-branch Network (FCTN) For Domain Adaptation

A domain adaptation method for urban scene segmentation is proposed in this work. We develop a fully convolutional tri-branch network, where two branches assign pseudo labels to images in the unlabeled target domain while the third branch is trained with supervision based on images in the pseudo-labeled target domain. The re-labeling and re-training processes alternate. With this design, the tri-branch network learns target-specific discriminative representations progressively and, as a result, the cross-domain capability of the segmenter improves.
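The pseudo-labeling step can be sketched as follows. This sketch assumes that a target pixel receives a pseudo label only when both labeling branches predict the same class with confidence above a threshold; the agreement rule, the `threshold` value, and the `IGNORE_LABEL` constant are assumptions for illustration, since the abstract does not specify the selection criterion.

```python
import numpy as np

IGNORE_LABEL = 255  # assumed marker for pixels left without a pseudo label


def assign_pseudo_labels(probs_a, probs_b, threshold=0.9):
    """Combine two branches' softmax outputs, each of shape (H, W, C),
    into one pseudo-label map of shape (H, W)."""
    labels_a = probs_a.argmax(axis=-1)
    labels_b = probs_b.argmax(axis=-1)
    # Confidence of the less certain branch at each pixel.
    confidence = np.minimum(probs_a.max(axis=-1), probs_b.max(axis=-1))
    keep = (labels_a == labels_b) & (confidence >= threshold)
    return np.where(keep, labels_a, IGNORE_LABEL)
```

The third branch would then be trained on the target images with these labels, skipping ignored pixels in the loss, before the two labeling branches are applied again in the next round.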

poster.pdf

A Fully Convolutional Tri-branch Network (FCTN) For Domain Adaptation-Poster (534)

Categories: Machine Learning for Signal Processing

19 Views

Efficient Segmentation-Aided Text Detection for Intelligent Robots_Slides

Scene text detection is a critical prerequisite for many applications of vision-based intelligent robots. Existing methods either detect text using only local information or cast detection as a semantic segmentation problem. As a result, they tend to produce a large number of false alarms or fail to separate individual words accurately. In this work, we present a segmentation-aided text detection solution that predicts word-level bounding boxes using an end-to-end trainable deep convolutional neural network.

GlobalSIP17_Oral_Efficient Segmentation-Aided Text Detection.pdf

GlobalSIP17_Oral_Segmentation-aided_Text_Detection (680)

Categories: Image/Video Processing
Pattern recognition and classification (MLR-PATT)

12 Views

Efficient Segmentation-Aided Text Detection for Intelligent Robots_Poster

Poster-Conference.pdf

GlobalSIP2017_Segmentation-aided_Text_Detection (574)

Categories: Image/Video Processing
Pattern recognition and classification (MLR-PATT)

10 Views

Discriminant Correlation Analysis for Feature Level Fusion with Application to Multimodal Biometrics

In this paper, we present Discriminant Correlation Analysis (DCA), a feature-level fusion technique that incorporates class associations into the correlation analysis of the feature sets. DCA performs effective feature fusion by maximizing the pairwise correlations across the two feature sets while eliminating the between-class correlations and restricting the correlations to be within classes.
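One way to remove between-class correlations in a single feature set is to find a projection under which the between-class scatter matrix becomes the identity. The sketch below illustrates only that idea; the function name, the column-wise data layout, and the specific normalization are assumptions, not the authors' published formulation, and the later cross-set correlation step of DCA is not shown.

```python
import numpy as np


def whiten_between_class(X, labels, r):
    """X holds one sample per column, shape (p, n); labels has shape (n,).
    Returns W of shape (p, r) and the between-class scatter S_b, chosen so
    that W.T @ S_b @ W is the r x r identity. Requires r <= n_classes - 1."""
    global_mean = X.mean(axis=1)
    cols = []
    for c in np.unique(labels):
        Xc = X[:, labels == c]
        # Class-mean deviation, scaled by the square root of the class size.
        cols.append(np.sqrt(Xc.shape[1]) * (Xc.mean(axis=1) - global_mean))
    phi = np.column_stack(cols)            # shape (p, n_classes)
    scatter = phi @ phi.T                  # between-class scatter, (p, p)
    # Eigen-decompose the small (n_classes x n_classes) matrix instead.
    eigvals, eigvecs = np.linalg.eigh(phi.T @ phi)
    top = np.argsort(eigvals)[::-1][:r]    # indices of the r largest eigenvalues
    W = (phi @ eigvecs[:, top]) / eigvals[top]
    return W, scatter
```

Projecting the samples with `W.T @ X` gives features whose class means are decorrelated. Working with the small `phi.T @ phi` matrix avoids decomposing the full p x p scatter when the feature dimension is much larger than the number of classes.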

DCA_ICASSP16_Poster.pdf (1842)

Categories: Information Forensics and Security
Biometrics
Applications
Multimedia Forensics
Machine Learning for Signal Processing
Applications in Data Fusion (MLR-FUSI)
Pattern recognition and classification (MLR-PATT)
Other applications of machine learning (MLR-APPL)
Image, Video, and Multidimensional Signal Processing
Image/Video Processing
Multimedia Signal Processing
Multimodal signal processing
Sensor Array and Multichannel Signal Processing

121 Views