Towards Robust Visual Transformer Networks via K-Sparse Attention

Citation Author(s):
Sajjad Amini, Shahrokh Ghaemmaghami
Submitted by:
Sajjad Amini
Last updated:
10 May 2022 - 5:11am
Document Type:
Presentation Slides
Document Year:
2022
Event:
Presenters:
Sajjad Amini
Paper Code:
MLSP-33.5
 

Transformer networks, originally developed in the machine translation community to eliminate the sequential nature of recurrent neural networks, have shown impressive results in other natural language processing and machine vision tasks. Self-attention is the core module behind visual transformers; it mixes image information globally. This module drastically reduces the intrinsic inductive bias imposed by CNNs, such as locality, but suffers from insufficient robustness against some adversarial attacks. In this paper, we introduce K-sparse attention to preserve low inductive bias while robustifying transformers against adversarial attacks. We show that standard transformers attend to values with a dense set of weights, whereas sparse attention, automatically selected by an optimization algorithm, can preserve the generalization performance of the transformer and, at the same time, improve its robustness.
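
As a rough illustration only, the sketch below shows one common way to sparsify self-attention by keeping just the K largest scores per query before the softmax; it does not reproduce the paper's optimization-based selection of the sparsity pattern, and the function name and the fixed `k_top` parameter are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def k_sparse_attention(q, k, v, k_top=8):
    """Scaled dot-product attention keeping only the k_top largest scores
    per query and masking the rest, so each token attends to at most k_top values.

    q, k, v: tensors of shape (batch, heads, tokens, dim); assumes k_top <= tokens.
    Note: this fixed top-K masking is a simple stand-in, not the paper's
    optimization-driven sparsity selection.
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (B, H, T, T)

    # Smallest score that is still within the top-K of each query row.
    threshold = scores.topk(k_top, dim=-1).values[..., -1, None]

    # Scores below the threshold get -inf, i.e. zero weight after softmax.
    masked = scores.masked_fill(scores < threshold, float("-inf"))

    weights = F.softmax(masked, dim=-1)                 # sparse attention weights
    return weights @ v

# Minimal usage example with random tensors.
q = k = v = torch.randn(1, 4, 16, 32)                  # batch=1, heads=4, tokens=16, dim=32
out = k_sparse_attention(q, k, v, k_top=4)
print(out.shape)                                        # torch.Size([1, 4, 16, 32])
```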
