J-MAE: Jigsaw Meets Masked Autoencoders In X-Ray Security Inspection

X-ray security inspection aims to identify restricted items in order to protect public safety. Because unsupervised learning has received little attention in this field, existing approaches use models pre-trained on natural images, which leads to suboptimal results in downstream tasks. Previous works lose relative positional relationships during pre-training, which is detrimental for X-ray images, where objects lack texture and recognition relies on shape. In this paper, we propose a jigsaw-style MAE (J-MAE) that preserves relative position information by shuffling the position encodings of visible patches. This forces the network to perform semantic reasoning to understand the shape and composition of X-ray objects. We also propose the Incremental Shuffling Module (ISM) and the Permute Predicting Module (PPM) to stabilize training and accelerate convergence. Our method consistently outperforms other methods on three downstream X-ray security inspection datasets.
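
The idea of shuffling the position encodings of visible patches can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `jigsaw_mask` and the parameters `mask_ratio` and `shuffle_ratio` are hypothetical, and the abstract does not specify how ISM schedules the amount of shuffling, so treating it as a single ratio that a training loop could raise over time is an assumption. The sketch only produces indices: which patches stay visible, and which position index each visible patch would be paired with.

```python
import numpy as np


def jigsaw_mask(num_patches, mask_ratio, shuffle_ratio, rng):
    """Select visible patches and shuffle the position indices of some of them.

    Hypothetical helper for illustration only; names and parameters are
    assumptions, not taken from the J-MAE paper.
    """
    num_visible = int(round(num_patches * (1.0 - mask_ratio)))
    # Random subset of patches kept visible, as in a standard masked autoencoder.
    visible = np.sort(rng.permutation(num_patches)[:num_visible])
    # Position index paired with each visible patch (initially the true one).
    pos_ids = visible.copy()
    # Choose which visible patches take part in the shuffle.
    num_shuffled = int(round(num_visible * shuffle_ratio))
    slots = rng.permutation(num_visible)[:num_shuffled]
    # Permute the true positions among the chosen slots only.
    pos_ids[slots] = visible[rng.permutation(slots)]
    return visible, pos_ids


# Usage: a 14 x 14 patch grid, 75% masked, half of the visible patches shuffled.
rng = np.random.default_rng(0)
visible, pos_ids = jigsaw_mask(196, 0.75, 0.5, rng)
# A permutation-prediction head (in the spirit of PPM) could be trained to
# recover `visible` from patches that were embedded with `pos_ids`.
print(visible[:5], pos_ids[:5])
```

With `shuffle_ratio` set to 0 the sketch reduces to ordinary MAE masking, and with 1 every visible patch may receive another visible patch's position. Raising the ratio gradually is one plausible reading of "incremental shuffling", stated here as an assumption.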
