
J-MAE: Jigsaw Meets Masked Autoencoders In X-Ray Security Inspection

DOI: 10.60864/rzhm-8t08
Submitted by: Weichen Xu
Last updated: 6 June 2024 - 10:28am
Document Type: Presentation Slides

X-ray security inspection aims to identify restricted items in order to protect public safety. Because unsupervised learning has received little attention in this field, models pre-trained on natural images are commonly used, which leads to suboptimal results in downstream tasks. Previous methods lose relative positional relationships during pre-training, which is detrimental for X-ray images that lack texture and rely on shape. In this paper, we propose the jigsaw-style MAE (J-MAE), which preserves relative position information by shuffling the position encoding of visible patches. This forces the network to perform semantic reasoning to understand the shape and composition of X-ray objects. In addition, we propose the Incremental Shuffling Module (ISM) and the Permute Predicting Module (PPM) to stabilize training and accelerate convergence. The proposed method consistently outperforms other methods on three downstream X-ray security inspection datasets.
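The abstract does not give implementation details, so the following is only a minimal sketch of the core idea as described: the position indices assigned to the visible patches are shuffled among themselves, so the encoder has to infer where each patch belongs. The function name `jigsaw_positions` and the parameters `mask_ratio`, `shuffle_ratio`, and `seed` are hypothetical. In particular, `shuffle_ratio` is only an assumed stand-in for the idea of controlling how much is shuffled; it is not the paper's ISM.

```python
import random


def jigsaw_positions(num_patches, mask_ratio=0.75, shuffle_ratio=1.0, seed=0):
    """Pick visible patches and shuffle the position ids assigned to them.

    Illustrative sketch only; not the authors' implementation.
    Returns (visible, assigned): visible[i] is the true position of the
    i-th visible patch, assigned[i] is the position id it is given.
    """
    rng = random.Random(seed)

    # MAE-style masking: keep only a random subset of patches visible.
    num_visible = max(1, round(num_patches * (1 - mask_ratio)))
    visible = sorted(rng.sample(range(num_patches), num_visible))

    # Choose which visible slots take part in the shuffle (assumed knob).
    num_shuffled = round(num_visible * shuffle_ratio)
    slots = sorted(rng.sample(range(num_visible), num_shuffled))

    # Permute the true position ids among the chosen slots only.
    shuffled_ids = [visible[s] for s in slots]
    rng.shuffle(shuffled_ids)

    assigned = list(visible)
    for slot, pos_id in zip(slots, shuffled_ids):
        assigned[slot] = pos_id
    return visible, assigned


if __name__ == "__main__":
    true_pos, given_pos = jigsaw_positions(num_patches=16)
    print("true positions:    ", true_pos)
    print("assigned positions:", given_pos)
```

In a real model, `assigned` would index into the positional embedding table in place of `visible`, and a prediction head could be trained to recover `visible` from the encoder output. That pairing is an assumption based on the module names in the abstract.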
