ICIP 2023

IEEE ICIP 2023 - The International Conference on Image Processing (ICIP), sponsored by the IEEE Signal Processing Society, is the premier forum for the presentation of technological advances and research results in the fields of theoretical, experimental, and applied image and video processing. Held annually since 1994, ICIP brings together leading engineers and scientists in image and video processing from around the world.

MULTI-EXIT VISION TRANSFORMER WITH CUSTOM FINE-TUNING FOR FINE-GRAINED IMAGE RECOGNITION

Capturing subtle visual differences between subordinate categories is crucial for improving the performance of Fine-grained Visual Classification (FGVC). Recent works have proposed deep learning models based on the Vision Transformer (ViT), using its self-attention mechanism to locate important regions of objects and extract global information. However, their many self-attention layers incur a high computational cost, which makes them impractical to deploy on resource-restricted hardware such as Internet of Things (IoT) devices.

eposter_2852_ICIP2023.pptx

poster (196)

icip video presentation slides.pptx

presentation slides (202)

ICIP_LaTeX_Template.pdf

paper (252)

Categories: Other

57 Views

SCENE TEXT RECOGNITION MODELS EXPLAINABILITY USING LOCAL FEATURES

Explainable AI (XAI) is the study of how humans can understand the cause of a model's prediction. In this work, the problem of interest is Scene Text Recognition (STR) explainability: using XAI to understand the cause of an STR model's prediction. Recent XAI literature on STR provides only a simple analysis and does not fully explore other XAI

SceneTextRecognitionPaper.pdf (172)

Categories: Pattern recognition and classification (MLR-PATT)

23 Views

SELECTING A DIVERSE SET OF AESTHETICALLY-PLEASING AND REPRESENTATIVE VIDEO THUMBNAILS USING REINFORCEMENT LEARNING

This paper presents a new reinforcement learning-based method for video thumbnail selection (called RL-DiVTS) that relies on estimates of the aesthetic quality, representativeness and visual diversity of a small set of selected frames, made with

2023-ICIP-RL-DiVTS_poster.pdf

Poster presenting our RL-DiVTS method for video thumbnail selection (paper published in IEEE ICIP 2023) (151)

Categories: Image/Video Processing

19 Views

WHEN VISIBLE-TO-THERMAL FACIAL GAN BEATS CONDITIONAL DIFFUSION

Thermal facial imagery offers valuable insight into physiological states such as inflammation and stress by detecting radiation emitted in the infrared spectrum, which cannot be seen in the visible spectrum. Telemedicine applications could benefit from thermal imagery, but conventional computers rely on RGB cameras and lack thermal sensors. To address this, we propose the Visible-to-Thermal Facial GAN (VTF-GAN), which is specifically designed to generate high-resolution thermal faces by learning both the spatial and frequency domains of facial regions across spectra.

ICIP_2023_Presentation_Live.pptx (185)

Categories: Image/Video Processing; Bio Imaging and Signal Processing

34 Views

UTILIZING SUPER-RESOLUTION FOR ENHANCED AUTOMOTIVE RADAR OBJECT DETECTION

In recent years, automotive radar has become an integral part of the advanced safety sensor stack. Although radar offers significant advantages over a camera or lidar, it suffers from poor angular resolution, unwanted noise and significant object smearing across the angular bins, which makes radar-based object detection challenging. We propose a novel radar-based object detection method that uses a deep learning-based super-resolution (DLSR) model. Because paired low- and high-resolution radar data are unavailable, we first simulate the data to train a DLSR model.

ICIP_2023_Presentation_3444.pdf

Presentation slides (175)

Categories: Other

42 Views

A Multichannel Localization Method for Camouflaged Object Detection

This paper proposes a multichannel method for discriminative region localization in Camouflaged Object Detection (COD) tasks. In one channel, processing the phase and amplitude of the 2-D Fourier spectra generates a modified form of the original image, which is later used for a pixel-wise optimal local entropy analysis. The other channel implements a class activation map (CAM) with Global Average Pooling (GAP) for object localization. We combine the channels linearly to form the final localized version of the COD images.

A Multichannel Localization Method for Camouflaged Object_paper.pdf

Paper (260)

A Multichannel Localization Method for Camouflaged Object.pdf

Presentation Slide (193)

Categories: Other

99 Views

A Multichannel Localization Method for Camouflaged Object Detection

A Multichannel Localization Method for Camouflaged Object.pdf (191)

Categories: Other

17 Views

Optical Character Recognition for Medical Records Digitization with Deep Learning

The importance of document digitization has increased due to recent technological advancements, including in the medical field. Digitization of medical records plays a vital role in the healthcare sector as it helps expedite emergency treatment. Due to the scarcity of published studies and public German textual resources, a medical records database with German handwriting was collected and digitized.

IEEE ICIP.pdf (197)

Categories: Machine Learning for Signal Processing

52 Views

SiamCLIM: Text-Based Pedestrian Search via Multi-modal Siamese Contrastive Learning

Text-based pedestrian search (TBPS) aims to retrieve target persons from an image gallery through descriptive text queries. Despite the remarkable progress of recent state-of-the-art approaches, previous works still struggle to efficiently extract discriminative features from multi-modal data. To address the problem of cross-modal fine-grained text-to-image retrieval, we propose a novel Siamese Contrastive Language-Image Model (SiamCLIM).

SiamCLIM.pptx (186)

Categories: Image/Video Storage, Retrieval

18 Views

Early Detection of Cars Exiting Road-side Parking

Vehicles suddenly exiting road-side parking create a hazardous situation for drivers as well as for Connected and Autonomous Vehicles (CAV). To improve the awareness of road users, we propose an original cooperative information system that uses image processing to monitor vehicles parked on the road-side and communication to send early warnings to vehicles on the road about vehicles leaving their parking spaces.

pres_icip23.pdf

Oral presentation at ICIP 2023 (183)

Categories: Image/Video Processing

41 Views