A Benchmark Study of Backdoor Data Poisoning Defenses for Deep Neural Network Classifiers and A Novel Defense
- Citation Author(s):
- Zhen Xiang, David Miller, George Kesidis
- Submitted by:
- Zhen Xiang
- Last updated:
- 11 October 2019 - 1:35pm
- Document Type:
- Poster
- Document Year:
- 2019
- Event:
- Presenters:
- Zhen Xiang, David Miller, George Kesidis
- Paper Code:
- 34
- Categories:
While data poisoning attacks on classifiers were originally proposed to degrade a classifier's usability, there has been strong recent interest in backdoor data poisoning attacks, where the classifier learns to classify to a target class whenever a backdoor pattern (e.g., a watermark or innocuous pattern) is added to an example from some class other than the target class. In this paper, we conduct a benchmark experimental study to assess the effectiveness of backdoor attacks against deep neural network (DNN) classifiers for images (CIFAR-10 domain), as well as the effectiveness of anomaly detection defenses against these attacks, assuming the defender has access to the (poisoned) training set. We also propose a novel defense scheme, cluster impurity (CI), based on two ideas: i) backdoor patterns may cluster in a DNN's deep-layer (e.g., penultimate-layer) latent space; ii) image filtering (or additive noise) may remove the backdoor pattern and thus alter the class decision produced by the DNN. We demonstrate that largely imperceptible single-pixel backdoor attacks are highly successful, with no effect on classifier usability. However, the CI approach is highly effective at detecting these attacks, and more successful than previous backdoor detection methods.
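The two ideas behind CI can be illustrated with a short sketch: cluster each class's training samples in a deep-layer feature space, then measure how often a cluster's predictions flip after the images are filtered. The code below is only a rough illustration of that workflow, not the defense as specified in the poster: the `model.features` hook, the 3x3 box blur, the use of KMeans with two clusters, and the 0.5 impurity threshold are all assumptions made for the example.

```python
# Illustrative sketch of the cluster-impurity (CI) idea, assuming a trained
# PyTorch classifier; the feature hook, filter, clustering, and threshold
# below are placeholder choices, not the authors' exact design.
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans


def penultimate_features(model, x):
    """Placeholder hook: return penultimate-layer embeddings for a batch x."""
    return model.features(x)  # assumed attribute; adapt to your architecture


def cluster_impurity(model, images, labels, n_clusters=2, threshold=0.5):
    """Flag per-class clusters whose decisions are unstable under filtering."""
    model.eval()
    suspicious = []
    with torch.no_grad():
        for c in labels.unique():
            x_c = images[labels == c]
            # i) cluster this class's samples in the deep-layer latent space
            feats = penultimate_features(model, x_c).flatten(1).cpu().numpy()
            assign = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(feats)
            pred_clean = model(x_c).argmax(1)
            # ii) 3x3 box blur as a stand-in for the image-filtering step
            x_blur = F.avg_pool2d(x_c, kernel_size=3, stride=1, padding=1)
            pred_blur = model(x_blur).argmax(1)
            for k in range(n_clusters):
                mask = torch.from_numpy(assign == k).to(pred_clean.device)
                if mask.sum() == 0:
                    continue
                # impurity: fraction of the cluster whose decision flips
                impurity = (pred_clean[mask] != pred_blur[mask]).float().mean().item()
                if impurity > threshold:
                    suspicious.append((int(c), k, impurity))
    return suspicious
```

In this sketch, a cluster dominated by backdoor-poisoned images should show a high impurity score, since filtering tends to remove the backdoor pattern and restore the original-class decision, whereas clusters of clean images should be largely unaffected.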