https://github.com/boyizhao/fat-reproducing

Repository

Basic Info

Host: GitHub
Owner: BoyiZhao
Language: Python
Default Branch: master
Size: 34.9 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created about 2 years ago · Last pushed about 2 years ago

Friendly Adversarial Training Code

This repository provides code for friendly adversarial training (FAT).

ICML 2020 Paper: Attacks Which Do Not Kill Training Make Adversarial Learning Stronger (https://arxiv.org/abs/2002.11242) Jingfeng Zhang*, Xilie Xu*, Bo Han, Gang Niu, Lizhen Cui, Masashi Sugiyama and Mohan Kankanhalli

What is the nature of adversarial training?

Adversarial data can easily fool a standard-trained classifier. Adversarial training incorporates adversarial data into the training process. It has two purposes: (a) to classify the data correctly, and (b) to make the decision boundary thick so that no data fall inside the decision boundary.

The purposes of the adversarial training

Conventional formulation of the adversarial training

Conventional adversarial training is based on the minimax formulation:

$\min_{f\in\mathcal{F}}\frac{1}{n}\sum_{i=1}^n\ell(f(\tilde{x}_i),y_i),$

where

$\tilde{x}_i=\mathrm{argmax}_{\tilde{x}\in\mathcal{B}_\epsilon[x_i]}\ell(f(\tilde{x}),y_i).$

The inner maximization finds the most adversarial data. The outer minimization finds a classifier that fits those generated adversarial data.
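The inner maximization is usually approximated with projected gradient descent (PGD). The following is a minimal sketch on a toy linear logistic classifier, not the code in this repository: the model, the helper names (`loss_and_grad`, `pgd_attack`) and the L-infinity ball with clipping to [0, 1] are assumptions made for illustration.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def loss_and_grad(w, b, x, y):
    """Logistic loss of a toy linear model and its gradient w.r.t. the input x.

    y is +1 or -1 and loss = log(1 + exp(-y * (w.x + b))).
    """
    z = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = math.log1p(math.exp(-z))
    coeff = -y / (1.0 + math.exp(z))  # d loss / d (w.x + b)
    return loss, [coeff * wi for wi in w]

def pgd_attack(w, b, x, y, epsilon, step_size, num_steps):
    """Approximate the inner maximization by signed gradient ascent on the loss."""
    x_adv = list(x)
    for _ in range(num_steps):
        _, grad = loss_and_grad(w, b, x_adv, y)
        # Ascent step along the sign of the input gradient.
        x_adv = [xa + step_size * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the epsilon-ball around x, then into [0, 1].
        x_adv = [min(max(xa, xi - epsilon), xi + epsilon) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv

# Hypothetical toy inputs, chosen only to exercise the functions.
w, b, x, y = [2.0, -1.0], -0.2, [0.5, 0.5], 1
x_adv = pgd_attack(w, b, x, y, epsilon=0.1, step_size=0.025, num_steps=10)
print(loss_and_grad(w, b, x, y)[0], loss_and_grad(w, b, x_adv, y)[0])
```

The outer minimization would then update the classifier on `x_adv` instead of `x`.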

The minimax formulation is pessimistic.

Minimax-based adversarial training causes severe degradation of natural generalization. Why? It has a severe cross-over mixture problem: the adversarial data of different classes overshoot into the peer areas. Learning from those adversarial data is very difficult.

Cross-over mixture problem of the minimax-based adversarial training

Our min-min formulation for the adversarial training.

The outer minimization remains the same. Instead of generating adversarial data via the inner maximization, we generate friendly adversarial data by minimizing the loss value. There are two constraints: (a) the adversarial data is misclassified, and (b) the wrong prediction of the adversarial data is better than the desired prediction by at least a margin $\rho$.

$\tilde{x}_i=\mathrm{argmin}_{\tilde{x}\in\mathcal{B}_\epsilon[x_i]}\ell(f(\tilde{x}),y_i)\quad\mathrm{s.t.}\quad\ell(f(\tilde{x}),y_i)-\min_{y\in\mathcal{Y}}\ell(f(\tilde{x}),y)\ge\rho$
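To make the constraint concrete, here is a small sketch that checks it for one candidate point, assuming the loss is the cross-entropy computed on logits. The function names are hypothetical and the logits are made-up numbers. Under this assumption the left-hand side equals the gap between the largest logit and the logit of the true class.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy loss of a single example, computed stably from logits."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[label]

def satisfies_friendly_constraint(logits, true_label, rho):
    """Check loss(true label) - min over labels of loss >= rho."""
    loss_true = cross_entropy(logits, true_label)
    loss_best = min(cross_entropy(logits, y) for y in range(len(logits)))
    return loss_true - loss_best >= rho

# Made-up logits for a 3-class problem: class 0 is predicted.
logits = [2.0, 0.5, 1.0]
print(satisfies_friendly_constraint(logits, true_label=1, rho=1.0))  # gap is 1.5
print(satisfies_friendly_constraint(logits, true_label=0, rho=0.1))  # gap is 0
```

For any $\rho>0$, a point that satisfies the check is necessarily misclassified, so constraint (b) implies constraint (a).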

Let us look at comparisons between minimax formulation and min-min formulation.

Comparisons between minimax formulation and min-min formulation

A Realization of the Min-min Formulation --- Friendly Adversarial Training (FAT)

Friendly adversarial training (FAT) uses friendly adversarial data, generated by early-stopped PGD, to update the model. Early-stopped PGD stops the PGD iterations once the adversarial data is misclassified. (This is controlled by the hyperparameter `tau` in the code. Note that when `tau` equals the maximum number of perturbation steps `num_steps`, FAT reduces to conventional adversarial training, so AT, TRADES, and MART are special cases.)

Conventional adversarial training uses PGD to search for the most adversarial data. Friendly adversarial training uses early-stopped PGD to search for friendly adversarial data.
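The difference between the two searches can be sketched on a toy linear classifier. This is an illustration, not the code in `FAT.py`. In particular, treating `tau` as the number of extra steps allowed after the point is first misclassified is an assumption, chosen because it makes `tau = num_steps` behave like plain PGD, as stated above.

```python
import math

def sign(v):
    return (v > 0) - (v < 0)

def logit(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def input_grad(w, b, x, y):
    """Gradient of the logistic loss log(1 + exp(-y * logit)) w.r.t. x, y in {+1, -1}."""
    coeff = -y / (1.0 + math.exp(y * logit(w, b, x)))
    return [coeff * wi for wi in w]

def early_stopped_pgd(w, b, x, y, epsilon, step_size, num_steps, tau):
    """PGD that stops once x_adv is misclassified and the tau budget is used up."""
    x_adv, budget, steps_used = list(x), tau, 0
    for _ in range(num_steps):
        if y * logit(w, b, x_adv) <= 0:  # currently misclassified
            if budget == 0:
                break                    # friendly enough: stop early
            budget -= 1                  # spend one extra step
        grad = input_grad(w, b, x_adv, y)
        x_adv = [xa + step_size * sign(g) for xa, g in zip(x_adv, grad)]
        x_adv = [min(max(xa, xi - epsilon), xi + epsilon) for xa, xi in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
        steps_used += 1
    return x_adv, steps_used

# Hypothetical toy inputs: tau=0 stops at the first misclassified point,
# tau=num_steps runs every PGD step.
w, b, x, y = [2.0, -1.0], -0.25, [0.5, 0.5], 1
for tau in (0, 1, 10):
    x_adv, used = early_stopped_pgd(w, b, x, y, 0.2, 0.05, 10, tau)
    print(tau, used, x_adv)
```

With a small `tau` the returned point sits just across the decision boundary, while a large `tau` pushes it to the edge of the epsilon-ball.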

Preferred Prerequisites

Python (3.6)
PyTorch (1.2.0)
CUDA
numpy

Running FAT, FAT for TRADES, FAT for MART on benchmark datasets (CIFAR-10 and SVHN)

Here are examples:

* Train WRN-32-10 model on CIFAR-10 and compare our results with AT, CAT and DAT:

```bash
CUDA_VISIBLE_DEVICES='0' python FAT.py --epsilon 0.031
CUDA_VISIBLE_DEVICES='0' python FAT.py --epsilon 0.062
```

White-box evaluations on WRN-32-10

| Defense | Natural Acc. | FGSM Acc. | PGD-20 Acc. | C&W Acc. |
|-----------------------|-----------------------|------------------|-----------------|-----------------|
| AT (Madry) | 87.30% | 56.10% | 45.80% | 46.80% |
| CAT | 77.43% | 57.17% | 46.06% | 42.28% |
| DAT | 85.03% | 63.53% | 48.70% | 47.27% |
| FAT ($\epsilon=8/255$) | 89.34 $\pm$ 0.221% | 65.52 $\pm$ 0.355% | 46.13 $\pm$ 0.049% | 46.82 $\pm$ 0.517% |
| FAT ($\epsilon=16/255$) | 87.00 $\pm$ 0.203% | 65.94 $\pm$ 0.244% | 49.86 $\pm$ 0.328% | 48.65 $\pm$ 0.176% |

Results of AT (Madry), CAT and DAT are as reported by DAT. FAT is evaluated in the same way.
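The table labels the FAT rows with $\epsilon=8/255$ and $\epsilon=16/255$, while the commands pass `--epsilon 0.031` and `--epsilon 0.062`. These appear to be the same budgets on the [0,1] image scale, truncated to three decimals. The helper names below are hypothetical and only illustrate the arithmetic.

```python
import math

def epsilon_from_pixels(pixels, max_value=255):
    """Convert an L-infinity budget in 8-bit pixel levels to the [0, 1] image scale."""
    return pixels / max_value

def truncate(value, decimals=3):
    """Drop digits beyond `decimals` places (no rounding)."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor

for pixels in (8, 16):
    eps = epsilon_from_pixels(pixels)
    print(f"{pixels}/255 = {eps:.5f} -> --epsilon {truncate(eps)}")
```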

* Train WRN-34-10 model on CIFAR-10 and compare our results with TRADES and MART:

```bash
CUDA_VISIBLE_DEVICES='0' python FAT_for_TRADES.py --epsilon 0.031
CUDA_VISIBLE_DEVICES='0' python FAT_for_TRADES.py --epsilon 0.062
CUDA_VISIBLE_DEVICES='0' python FAT_for_MART.py --epsilon 0.031
CUDA_VISIBLE_DEVICES='0' python FAT_for_MART.py --epsilon 0.062
```

White-box evaluations on WRN-34-10

| Defense | Natural Acc. | FGSM Acc. | PGD-20 Acc. | C&W Acc. |
|-----------------------|-----------------------|------------------|-----------------|-----------------|
| TRADES ($\beta=1.0$) | 88.64% | 56.38% | 49.14% | - |
| FAT for TRADES ($\beta=1.0,\epsilon=8/255$) | 89.94 $\pm$ 0.303% | 61.00 $\pm$ 0.418% | 49.70 $\pm$ 0.653% | 49.35 $\pm$ 0.363% |
| TRADES ($\beta=6.0$) | 84.92% | 61.06% | 56.61% | 54.47% |
| FAT for TRADES ($\beta=6.0,\epsilon=8/255$) | 86.60 $\pm$ 0.548% | 61.79 $\pm$ 0.570% | 55.98 $\pm$ 0.209% | 54.29 $\pm$ 0.173% |
| FAT for TRADES ($\beta=6.0,\epsilon=16/255$) | 84.39 $\pm$ 0.030% | 61.73 $\pm$ 0.131% | 57.12 $\pm$ 0.233% | 54.36 $\pm$ 0.177% |

Results of TRADES ($\beta=1.0$ and $\beta=6.0$) are as reported by TRADES. FAT for TRADES is evaluated in the same way. Note that the evaluations above follow the description in the TRADES paper, i.e., adversarial data are generated without random start (`rand_init=False`). However, the TRADES GitHub repository uses random start (`rand_init=True`) before the PGD perturbation, which deviates from the statement in their paper. For a fair evaluation of FAT with random start, please refer to Table 3 in our paper.
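The two evaluation settings differ only in where PGD starts. A minimal sketch of that choice follows. The function `initial_point` is hypothetical, and drawing uniform noise inside the epsilon-ball is an assumption made for illustration; the TRADES repository may use a different noise distribution.

```python
import random

def initial_point(x, epsilon, rand_init, rng):
    """Return the PGD starting point for an input x with values in [0, 1]."""
    if not rand_init:
        # No random start: PGD begins at the natural data itself.
        return list(x)
    # Random start (assumed uniform): perturb x inside the epsilon-ball, keep it in [0, 1].
    return [min(max(xi + rng.uniform(-epsilon, epsilon), 0.0), 1.0) for xi in x]

rng = random.Random(0)
x = [0.0, 0.5, 1.0]
print(initial_point(x, 0.031, rand_init=False, rng=rng))
print(initial_point(x, 0.031, rand_init=True, rng=rng))
```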

How to recover original AT, TRADES, or MART?

Just set `tau=10`, i.e.:

```bash
python FAT.py --epsilon 0.031 --tau 10 --dynamictau False
python FAT_for_TRADES.py --epsilon 0.031 --tau 10 --dynamictau False
python FAT_for_MART.py --epsilon 0.031 --tau 10 --dynamictau False
```

Want to attack FAT? Sure!

We welcome various attack methods to attack our defense models. For the CIFAR-10 dataset, we normalize all images to [0,1].

* Download our pretrained models into the folder FAT_models through this Google Drive link or Baidu Drive link (extraction code: ww7f).

```bash
cd Friendly-Adversarial-Training
mkdir FAT_models
```

* Run robustness evaluations.

```bash
chmod +x attack_test.sh
./attack_test.sh
```

Reference

@inproceedings{zhang2020fat,
    title={Attacks Which Do Not Kill Training Make Adversarial Learning Stronger},
    author={Zhang, Jingfeng and Xu, Xilie and Han, Bo and Niu, Gang and Cui, Lizhen and Sugiyama, Masashi and Kankanhalli, Mohan},
    booktitle={ICML},
    year={2020}
}

Contact

Please contact jingfeng.zhang@auckland.ac.nz (preferred) or jingfeng.zhang9660@gmail.com and xuxilie@comp.nus.edu.sg if you have any questions about the code.

Owner

Name: Boyi Zhao
Login: BoyiZhao
Kind: user
Location: ChongQing
Company: Southwest University
