A Benchmark Study of Backdoor Data Poisoning Defenses for Deep Neural Network Classifiers and A Novel Defense

Citation Author(s):
George Kesidis
Submitted by:
Zhen Xiang
Last updated:
11 October 2019 - 1:35pm
Document Type:
Poster
Document Year:
2019
Presenters:
Zhen Xiang, David Miller, George Kesidis
Paper Code:
34
 

While data poisoning attacks on classifiers were originally proposed to degrade a classifier's usability, there has been strong recent interest in backdoor data poisoning attacks, where the classifier learns to classify to a target class whenever a backdoor pattern (e.g., a watermark or innocuous pattern) is added to an example from some class other than the target class. In this paper, we conduct a benchmark experimental study to assess the effectiveness of backdoor attacks against deep neural network (DNN) classifiers for images (CIFAR-10 domain), as well as of anomaly detection defenses against these attacks, assuming the defender has access to the (poisoned) training set. We also propose a novel defense scheme (cluster impurity (CI)) based on two ideas: i) backdoor-poisoned examples may cluster in a DNN's (e.g., penultimate) deep-layer latent space; ii) image filtering (or additive noise) may remove the backdoor pattern and thus alter the class decision produced by the DNN. We demonstrate that largely imperceptible single-pixel backdoor attacks are highly successful, with no effect on classifier usability. However, the CI approach is highly effective at detecting these attacks, and more successful than previous backdoor detection methods.
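To make the two CI ideas above concrete, the sketch below shows one possible way to combine them: cluster the penultimate-layer features of a putative target class, filter the images with a simple averaging filter, and score each cluster by how often the DNN's decision changes after filtering. This is only a minimal illustration under stated assumptions, not the paper's implementation; the helper names `get_penultimate` and `model_predict`, the Gaussian mixture clustering, and the uniform filter are illustrative choices.

```python
# Minimal sketch of the cluster-impurity (CI) idea described in the abstract.
# Assumptions (illustrative, not from the paper's code): `get_penultimate`
# extracts penultimate-layer features for a batch of images, `model_predict`
# returns class decisions, and a small averaging filter stands in for the
# backdoor-removing image filtering step.

import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.ndimage import uniform_filter

def cluster_impurity_scores(images, labels, get_penultimate, model_predict,
                            target_class, n_clusters=2, filter_size=3):
    """For one putative target class: cluster deep-layer features, then measure,
    per cluster, how often filtering the image changes the DNN's decision.
    Clusters dominated by backdoor-poisoned images are expected to show a high
    decision-change ('impurity') rate, since filtering tends to destroy small
    (e.g., single-pixel) backdoor patterns."""
    idx = np.where(labels == target_class)[0]
    feats = get_penultimate(images[idx])                       # (n, d) deep features
    clusters = GaussianMixture(n_components=n_clusters).fit_predict(feats)

    # Averaging filter applied per image (spatial axes only, not channels).
    filtered = np.stack([uniform_filter(im, size=(filter_size, filter_size, 1))
                         for im in images[idx]])
    orig_pred = model_predict(images[idx])
    filt_pred = model_predict(filtered)

    scores = {}
    for c in range(n_clusters):
        members = clusters == c
        # Fraction of cluster members whose decision flips after filtering.
        scores[c] = float(np.mean(orig_pred[members] != filt_pred[members]))
    return scores  # high score => cluster suspected to contain poisoned images
```

A defense along these lines would run such a score for each candidate target class and flag classes containing a cluster with an anomalously high decision-change rate; the choice of clustering model, number of clusters, and filter are tuning decisions not fixed by this sketch.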
