https://github.com/alan-turing-institute/neural-watchdog
A firewall for your *neural* networks.
Neural Watchdog is an open-source tool designed to detect stealthy backdoor attacks in Deep Reinforcement Learning (DRL) policies. In this repository, we introduce an evasive backdoor technique called the "in-distribution trigger" and demonstrate how to detect it using our tool.
For a detailed understanding of the technical aspects of Neural Watchdog, please refer to our paper: https://arxiv.org/abs/2407.15168
If you would like to cite our work, please use the following reference:
@inproceedings{vyas2024mitigating,
title={Mitigating Deep Reinforcement Learning Backdoors in the Neural Activation Space},
author={Vyas, Sanyam and Hicks, Chris and Mavroudis, Vasilios},
booktitle={2024 IEEE Security and Privacy Workshops (SPW)},
pages={76--86},
year={2024},
organization={IEEE}
}
Getting Started
Before you begin, ensure that you have Conda installed on your Mac. If you do not, follow the official Conda installation guide for macOS.
Installation
Clone the Repository
git clone git@github.com:alan-turing-institute/in-distribution-backdoors.git
Create the conda environment
conda env create -f environment.yml
conda activate minigrid_farama_2
Set PYTHONPATH
export PYTHONPATH="/path/to/file/Minigrid"
echo $PYTHONPATH
Execution
Run the visualisation script after editing "visualize.py" and "crossings.py" to match the data you want to collect: non-triggered or triggered episodes, goal in field of view, trigger in field of view, the thresholding detector algorithm, and whether the trigger is on or off. The script collects the neural activations at every step and saves them to a file that depends on the type of data being collected:
python3 -m scripts.visualize --env MiniGrid-LavaCrossingS9N1-v0 --model DSLP_Crossings_Trigger_60k_256_neurons --episodes 1000
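The per-step collection described above can be pictured with a small sketch. This is not the code in visualize.py: the `ActivationLog` class, its method names, and the JSON-lines file layout are all hypothetical, chosen only to illustrate recording one activation vector per step and writing one file per data type.

```python
import json
from pathlib import Path


class ActivationLog:
    """Collects one activation vector per environment step (hypothetical helper)."""

    def __init__(self, condition):
        # `condition` labels the kind of data being collected, e.g. "triggered".
        self.condition = condition
        self.rows = []

    def record(self, episode, step, activations):
        # Store plain floats so the log can be serialised as JSON.
        self.rows.append({
            "episode": episode,
            "step": step,
            "activations": [float(a) for a in activations],
        })

    def save(self, directory):
        # One file per condition, one JSON object per line.
        path = Path(directory) / f"activations_{self.condition}.jsonl"
        with path.open("w") as handle:
            for row in self.rows:
                handle.write(json.dumps(row) + "\n")
        return path
```

In a real run, `record` would be called once per step with the hidden-layer output of the policy network, and `save` once the requested number of episodes has finished.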
To train a model from scratch, edit "visualize.py" and "crossings.py" according to the data you want to collect (non-triggered or triggered, goal in field of view, trigger in field of view, thresholding detector algorithm, trigger on or off), then run the training command below. The train.py script saves all model outputs to the Minigrid/minigrid/torch-ac/rl-starter-files/storage folder, and the resulting model can then be loaded by the visualize.py command above.
python3 -m scripts.train --algo ppo --env MiniGrid-LavaCrossingS9N1-v0 --model model_name --save-interval 10 --frames 60000000
Our Detector in Action
Our in-distribution backdoor trigger here is the convergence of two lava rivers, forming a "+" sign. This tool can be used to detect such backdoor triggers in real time and prevent the poisoned agent from taking malicious actions, which in this context means heading into the lava rivers.

Watch the video on Youtube
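To make the idea of a thresholding detector concrete, here is a minimal sketch. It assumes per-neuron upper bounds are fitted on activations from clean episodes and that a step is flagged when any neuron exceeds its bound. The function names, the mean-plus-`k`-standard-deviations rule, and the `min_violations` parameter are illustrative assumptions, not the detector implemented in this repository or described in the paper.

```python
from statistics import mean, pstdev


def fit_thresholds(clean_activations, k=3.0):
    """Per-neuron upper bounds from activations recorded in clean episodes.

    clean_activations: list of equal-length activation vectors.
    k: assumed number of standard deviations above the mean (illustrative).
    """
    per_neuron = list(zip(*clean_activations))
    return [mean(values) + k * pstdev(values) for values in per_neuron]


def is_suspicious(activations, thresholds, min_violations=1):
    """Flag a step when enough neurons exceed their clean-data threshold."""
    violations = sum(a > t for a, t in zip(activations, thresholds))
    return violations >= min_violations
```

At run time, a flagged step could be handled by overriding the agent's action. What the override should be is a design choice left open here.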
Atari Breakout Experiments
This repository contains the source code for sanitizing backdoor policies in the Atari Breakout game environment. The backdoor policy in this example has been trained using the environment poisoning framework of the TrojDRL paper.
The state space consists of concatenated image frames. The trigger is a 3x5 image inserted on the tile space of the Atari Breakout game. The backdoor policy has been trained so that, in the absence of the trigger, it consistently achieves a high score against the opponent, while in the presence of the trigger it takes the 'no move' action and eventually achieves a very low score on average.
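As an illustration of what inserting a small trigger patch into a frame can look like, the sketch below overwrites a 3x5 block of pixels in a 2D frame. The function name, the list-of-lists frame representation, and the reading of the patch as 3 rows by 5 columns are assumptions made for this example. The actual poisoning code is the TrojDRL-based code in this repository.

```python
def insert_trigger(frame, top, left, height=3, width=5, value=5):
    """Return a copy of `frame` with a height x width patch set to `value`.

    frame: 2D list of pixel values (a list of rows).
    top, left: assumed coordinates of the patch's upper-left corner.
    """
    rows, cols = len(frame), len(frame[0])
    if top < 0 or left < 0 or top + height > rows or left + width > cols:
        raise ValueError("trigger patch does not fit inside the frame")
    # Copy each row so the clean frame is left untouched.
    poisoned = [row[:] for row in frame]
    for r in range(top, top + height):
        for c in range(left, left + width):
            poisoned[r][c] = value
    return poisoned
```

Because the state is a stack of frames, the same patch would be applied to each frame in the stack. The frame size and pixel value used in any call are up to the caller.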
Set up the codebase and Python environment.
- Install Anaconda by following the instructions here.
- Create a new environment from the specification file:
  conda env create --name NEW_ENV_NAME -f environment.yml
- Activate the conda environment:
  conda activate NEW_ENV_NAME
Run the code.
1. Test the backdoor policy in the clean environment:
   python driver_parallel.py 'backdoor_in_clean' 'save_states'
   - Change the number of trials and the number of test episodes (test_count) in the trials if needed.
   - The clean states data generated here is used for sanitization in step 3.
2. Test the backdoor policy in the triggered environment:
   python driver_parallel.py 'backdoor_in_triggered'
3. Sanitize the backdoor and test the sanitized policy in the triggered environment:
   python driver_parallel.py 'sanitized_in_triggered'
   - This constructs sanitized policies for various numbers of clean sample sets and then tests them.
4. Sanitize the backdoor with a fixed $n=32768$ and different safe subspace dimensions $d$:
   python driver_parallel.py 'sanitized_with_fixed_n'
   - This part requires the bases for $n=32768$ samples obtained from step 3.
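The role of the safe subspace dimension $d$ can be illustrated with a projection step. The sketch below assumes that sanitization projects each flattened state onto the span of $d$ orthonormal basis vectors estimated from clean states. The function name and the plain-list representation are hypothetical, and the repository's own sanitization code may differ in how the bases are computed and applied.

```python
def project_onto_subspace(state, basis):
    """Project a flattened state onto the span of orthonormal `basis` vectors."""
    projected = [0.0] * len(state)
    for vector in basis:
        # Coefficient of the state along this basis direction.
        coeff = sum(s * v for s, v in zip(state, vector))
        for i, v in enumerate(vector):
            projected[i] += coeff * v
    return projected
```

Under these assumptions, varying $d$ amounts to passing only the first $d$ basis vectors, for example `project_onto_subspace(state, basis[:d])`, and feeding the projected state to the policy.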
Training the backdoor policy from scratch.
- We train a strongly targeted backdoor policy that uses a and takes the 'no move' action when the trigger is active, as specified in the TrojDRL paper. For more details, please refer to this paper and the code.
- To train this backdoor policy, run:
python3 train.py --game=breakout --debugging_folder=pretrained_backdoor/strong_targeted/breakout_target_noop/ --poison --color=5 --attack_method=strong_targeted --pixels_to_poison_h=5 --pixels_to_poison_v=3 --start_position="29,28" --when_to_poison="uniformly" --action=2 --budget=20000 --device='/cpu:0' --emulator_counts=12 --emulator_workers=4
Results
Our results show that our in-distribution trigger successfully evades the defence algorithm proposed in the NeurIPS paper by Bharti et al.
Edited Files
The evaluator.py file, together with the params_indist.yml file, contains the code that changes the size of the trigger. The params_indist.yml file sets the default size and the colour of the trigger.
The plot_graphs.py file saves the visualisation shown in Figure 2 of the paper, while the analyse_performance_for_n=32768_sanitization.py file saves the visualisation shown in Figure 3 of the paper.
Owner
- Name: The Alan Turing Institute
- Login: alan-turing-institute
- Kind: organization
- Email: info@turing.ac.uk
- Website: https://turing.ac.uk
- Repositories: 477
- Profile: https://github.com/alan-turing-institute
The UK's national institute for data science and artificial intelligence.
Dependencies
- gymnasium >=0.28.1
- imageio >=2.31.1
- matplotlib >=3.0
- networkx *
- numpy >=1.18.0
- pygame >=2.4.0
- pytest >=7.0.1
- pytest-mock >=3.10.0
- gymnasium >=0.28.1
- numpy >=1.18.0
- pygame >=2.4.0
- gymnasium >=0.26
- numpy >=1.18.0
- pygame >=2.2.0
- pytest >=7.0.1 test
- pytest-mock >=3.10.0 test
- ale-py ==0.7.4
- imageio ==2.16.0
- importlib-metadata ==4.11.1
- importlib-resources ==5.4.0
- joblib ==1.1.0
- munch ==2.5.0
- numpy *
- opencv-python ==4.5.5.62
- pandas ==1.3.5
- pillow ==9.0.1
- pip ==22.0.3
- pytz ==2021.3
- seaborn ==0.11.2
- setuptools ==60.9.3
- typing-extensions ==4.1.1
- wheel ==0.37.1
- zipp ==3.7.0