Inducing Inductive Bias in Vision Transformer for EEG Classification
- DOI: 10.60864/g48z-hz15
- Submitted by: Rabindra Khadka
- Last updated: 6 June 2024 - 10:27am
- Document Type: Presentation Slides
Human brain signals are highly complex and dynamic in nature. Electroencephalogram (EEG) devices capture some of this complexity, in both space and time, at a finite resolution. Recently, transformer-based models have been explored in a range of applications across different data modalities. In this work, we introduce a transformer-based model for EEG signal classification, inspired by the recent success of the Vision Transformer (ViT) in image classification. Motivated by the distinctive characteristics of EEG data, we design a module that enables us to (1) extract the spatio-temporal tokens inherent in EEG signals and (2) integrate additional non-linearities to capture their intricate, non-linear patterns. To that end, we introduce a new lightweight architectural component that combines our proposed attention model with convolution. This convolutional tokenization module forms the basis of our vision backbone, referred to as the Brain Signal Vision Transformer (BSVT). The architecture accounts for the spatial and temporal structure of EEG data, yielding token embeddings that effectively fuse spatio-temporal information. Moreover, while transformer-based models typically require large datasets to perform well, we show that combining the inherent inductive bias of Convolutional Neural Networks (CNNs) with the transformer enables efficient training from scratch on relatively small datasets, with as few as 0.75M parameters. On the publicly available EEG dataset from Temple University Hospital (TUH Abnormal), our model achieves results comparable or superior to its counterpart ViT model with a patchify stem.
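
To make the tokenization idea concrete, the sketch below (PyTorch, not the authors' released implementation) shows a convolutional tokenization stem that turns a raw multi-channel EEG window into a sequence of spatio-temporal tokens, followed by a standard transformer encoder and a classification head. All class names, channel counts, kernel sizes, and layer widths are illustrative assumptions, and positional embeddings are omitted for brevity; the actual BSVT design may differ.

```python
# Minimal sketch of a convolutional tokenization stem + transformer encoder
# for EEG classification. Hyperparameters below are assumed for illustration.
import torch
import torch.nn as nn


class ConvTokenizer(nn.Module):
    """Maps an EEG window (batch, eeg_channels, time) to a token sequence."""

    def __init__(self, in_channels: int = 21, embed_dim: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            # Temporal convolution: local non-linear filtering along time.
            nn.Conv1d(in_channels, embed_dim, kernel_size=25, stride=4, padding=12),
            nn.GELU(),
            # Second convolution adds non-linearity and further downsamples.
            nn.Conv1d(embed_dim, embed_dim, kernel_size=9, stride=2, padding=4),
            nn.GELU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, eeg_channels, time) -> tokens: (batch, seq_len, embed_dim)
        return self.stem(x).transpose(1, 2)


class BSVTLikeClassifier(nn.Module):
    """Convolutional tokenizer + transformer encoder + classification head."""

    def __init__(self, in_channels: int = 21, embed_dim: int = 64,
                 depth: int = 4, num_heads: int = 4, num_classes: int = 2):
        super().__init__()
        self.tokenizer = ConvTokenizer(in_channels, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, dim_feedforward=2 * embed_dim,
            batch_first=True, activation="gelu",
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = self.tokenizer(x)             # (batch, seq_len, embed_dim)
        encoded = self.encoder(tokens)         # contextualized tokens
        return self.head(encoded.mean(dim=1))  # mean-pool tokens, then classify


if __name__ == "__main__":
    # Example: 21-channel EEG, 10 s at 100 Hz, binary normal/abnormal output.
    model = BSVTLikeClassifier()
    logits = model(torch.randn(8, 21, 1000))
    print(logits.shape)  # torch.Size([8, 2])
```

The convolutional stem is where the CNN-style inductive bias enters: unlike a patchify stem, which linearly projects non-overlapping patches, the overlapping strided convolutions impose locality and translation equivariance before self-attention is applied, which is what makes training from scratch on small EEG datasets plausible.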