https://github.com/alan-turing-institute/neural-watchdog

A firewall for your *neural* networks.

Repository

Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 13 MB
Statistics
  • Stars: 8
  • Watchers: 5
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License

README.md

Neural Watchdog is an open-source tool designed to detect stealthy backdoor attacks in Deep Reinforcement Learning (DRL) policies. In this repository, we introduce an evasive backdoor technique called the "in-distribution trigger" and demonstrate how to detect it using our tool.

For a detailed understanding of the technical aspects of Neural Watchdog, please refer to our paper: https://arxiv.org/abs/2407.15168

If you would like to cite our work, please use the following reference:

    @inproceedings{vyas2024mitigating,
      title={Mitigating Deep Reinforcement Learning Backdoors in the Neural Activation Space},
      author={Vyas, Sanyam and Hicks, Chris and Mavroudis, Vasilios},
      booktitle={2024 IEEE Security and Privacy Workshops (SPW)},
      pages={76--86},
      year={2024},
      organization={IEEE}
    }

Getting Started

Before you begin, ensure that you have Conda installed on your Mac. If you do not have Conda installed, please follow the official Conda installation guide for a Mac.

Installation

Clone the repository:

    git clone git@github.com:alan-turing-institute/in-distribution-backdoors.git

Create and activate the conda environment:

    conda env create -f environment.yml
    conda activate minigrid_farama_2

Set PYTHONPATH (note that there must be no space after the "="):

    export PYTHONPATH="/path/to/file/Minigrid"
    echo $PYTHONPATH

Execution

Before running the visualize script, edit "visualize.py" and "crossings.py" according to the data you want to collect (non-triggered or triggered episodes, goal in field of view, trigger in field of view, thresholding detector algorithm, trigger on or off). The script records the neural activations at every step and saves them to a file that depends on the type of data being collected:

    python3 -m scripts.visualize --env MiniGrid-LavaCrossingS9N1-v0 --model DSLP_Crossings_Trigger_60k_256_neurons --episodes 1000
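As a rough illustration of per-step activation logging, the sketch below writes one CSV row per environment step. It is not the code in visualize.py: the function name `log_activations`, the column layout, and the example values are all assumptions made for illustration.

```python
import csv
import io


def log_activations(writer, episode, step, activations, triggered):
    """Write one row: episode, step, trigger flag, then one column per neuron.

    Hypothetical helper; the repository's own file layout may differ.
    """
    writer.writerow([episode, step, int(triggered), *activations])


# An in-memory buffer stands in for the output file in this sketch.
buffer = io.StringIO()
writer = csv.writer(buffer)
log_activations(writer, episode=0, step=0, activations=[0.12, -0.5, 1.3], triggered=False)
log_activations(writer, episode=0, step=1, activations=[0.4, 0.0, 2.1], triggered=True)
rows = buffer.getvalue().splitlines()
```

Keeping the trigger flag next to the activations makes it straightforward to split the log into triggered and non-triggered sets afterwards.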

To train a model from scratch, first edit "visualize.py" and "crossings.py" according to the data you want to collect (non-triggered or triggered episodes, goal in field of view, trigger in field of view, thresholding detector algorithm, trigger on or off). The train.py script saves all model outputs to the Minigrid/minigrid/torch-ac/rl-starter-files/storage folder, and the resulting model can then be loaded by visualize.py as described above:

    python3 -m scripts.train --algo ppo --env MiniGrid-LavaCrossingS9N1-v0 --model model_name --save-interval 10 --frames 60000000

Our Detector in Action

Our in-distribution backdoor trigger here is the convergence of two lava rivers, forming a "+" sign. The tool can detect such backdoor triggers in real time and prevent the poisoned agent from taking malicious actions, which in this context means heading into the lava rivers.

Watch the video on YouTube
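To make the idea of a thresholding detector concrete, here is a minimal sketch that assumes a very simple rule: each neuron gets an upper bound equal to the largest activation seen on clean episodes plus a margin, and a step is flagged when enough neurons exceed their bound. The function names, the `margin` and `min_violations` parameters, and the toy data are hypothetical; the detector described in the paper may choose neurons and thresholds differently.

```python
def fit_thresholds(clean_activations, margin=0.0):
    """Per-neuron upper bounds: maximum clean activation plus a margin."""
    n_neurons = len(clean_activations[0])
    return [
        max(step[i] for step in clean_activations) + margin
        for i in range(n_neurons)
    ]


def is_triggered(activations, thresholds, min_violations=1):
    """Flag a step when at least `min_violations` neurons exceed their bound."""
    violations = sum(a > t for a, t in zip(activations, thresholds))
    return violations >= min_violations


# Toy clean data: three steps, three neurons (illustrative values only).
clean = [[0.1, 0.5, 0.2], [0.3, 0.4, 0.1], [0.2, 0.6, 0.3]]
thresholds = fit_thresholds(clean, margin=0.1)

normal_step = is_triggered([0.2, 0.5, 0.2], thresholds)
suspicious_step = is_triggered([0.2, 0.5, 0.9], thresholds)
```

In a real-time setting, a check like this would run on the activations of every step before the chosen action is executed, so that a flagged step can be overridden.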

Atari Breakout Experiments

This repository contains the source code for sanitizing backdoor policies in the Atari Breakout game environment. The backdoor policy in this example has been trained using the environment poisoning framework of the TrojDRL paper.

The state space consists of concatenated image frames. The trigger is a 3x5 image inserted on the tile space of the Atari Breakout game. The backdoor policy has been trained so that, in the absence of the trigger, it consistently achieves a high score against the opponent, while in the presence of the trigger it takes the 'no move' action and eventually achieves a very low score on average.
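The trigger insertion can be pictured with a small sketch that stamps a 3x5 patch onto a single frame. Several details are assumptions: the 84x84 frame size, the reading of 3x5 as 3 rows by 5 columns, and the (row, column) interpretation of the start position are not confirmed by this README, and `stamp_trigger` is a hypothetical helper rather than a function from the repository.

```python
def stamp_trigger(frame, top, left, height=3, width=5, color=5):
    """Return a copy of a 2D frame with a height x width patch set to `color`."""
    poisoned = [row[:] for row in frame]  # copy so the clean frame is untouched
    for r in range(top, top + height):
        for c in range(left, left + width):
            poisoned[r][c] = color
    return poisoned


# Assumed 84x84 greyscale frame filled with zeros.
frame = [[0] * 84 for _ in range(84)]
poisoned = stamp_trigger(frame, top=29, left=28)
```

For a state built from several concatenated frames, the same patch would be applied to each frame that should carry the trigger.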

Set up the codebase and Python environment.

  1. Install Anaconda, following the official installation instructions.
  2. Create a new environment from the specification file:
    conda env create --name NEW_ENV_NAME -f environment.yml
  3. Activate the conda environment:
    conda activate NEW_ENV_NAME

Run the code.

  1. Test the backdoor policy in the clean environment:
    python driver_parallel.py 'backdoor_in_clean' 'save_states'
    • Change the number of trials and the number of test episodes (test_count) per trial if needed.
    • The clean-state data generated here is used for sanitization in step 3.
  2. Test the backdoor policy in the triggered environment:
    python driver_parallel.py 'backdoor_in_triggered'
  3. Sanitize the backdoor and test the sanitized policy in the triggered environment:
    python driver_parallel.py 'sanitized_in_triggered'
    • This constructs sanitized policies for various numbers of clean samples and then tests them.
  4. Sanitize the backdoor with a fixed $n=32768$ and different safe subspace dimensions $d$:
    python driver_parallel.py 'sanitized_with_fixed_n'
    • This step requires the bases for $n=32768$ samples obtained in step 3.
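The sanitization steps above refer to a safe subspace of dimension $d$ with bases computed from $n$ clean samples. As a hedged sketch of what projecting a state onto such a subspace looks like, the code below assumes that an orthonormal basis is already available and shows only the projection step. The function name and the toy basis are hypothetical, and how driver_parallel.py computes and applies its bases is not shown here.

```python
def project_onto_subspace(state, basis):
    """Project `state` onto the span of orthonormal `basis` vectors."""
    projected = [0.0] * len(state)
    for b in basis:
        # Coefficient of the state along this basis direction.
        coeff = sum(s * v for s, v in zip(state, b))
        for i, v in enumerate(b):
            projected[i] += coeff * v
    return projected


# Hypothetical d=2 basis in a 3-dimensional state space.
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
state = [2.0, -1.0, 7.0]
sanitized = project_onto_subspace(state, basis)
```

Here the third component lies outside the assumed subspace and is removed by the projection, which is the intuition for why a trigger that sits outside the span of the clean data would be filtered out.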

Training the backdoor policy from scratch.

  • We train a strongly targeted backdoor policy that uses a and takes the 'no move' action when the trigger is active, as specified in the TrojDRL paper. For more details, please refer to that paper and its code.
  • To train this backdoor policy, run:
    python3 train.py --game=breakout \
        --debugging_folder=pretrained_backdoor/strong_targeted/breakout_target_noop/ \
        --poison --color=5 --attack_method=strong_targeted \
        --pixels_to_poison_h=5 --pixels_to_poison_v=3 \
        --start_position="29,28" --when_to_poison="uniformly" \
        --action=2 --budget=20000 --device='/cpu:0' \
        --emulator_counts=12 --emulator_workers=4

Results

Our results show that our in-distribution trigger successfully evades the defence algorithm proposed in the NeurIPS paper by Bharti et al.

performance_breakout.pdf

spectrumsafesubspace.pdf

Edited Files

The evaluator.py file, together with params_indist.yml, contains the code that changes the size of the trigger; params_indist.yml sets the default size and colour of the trigger. The plot_graphs.py file saves the visualisation shown in Figure 2 of the paper, and analyse_performance_for_n=32768_sanitization.py saves the visualisation shown in Figure 3.

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

Dependencies

Minigrid/minigrid.egg-info/requires.txt pypi
  • gymnasium >=0.28.1
  • imageio >=2.31.1
  • matplotlib >=3.0
  • networkx *
  • numpy >=1.18.0
  • pygame >=2.4.0
  • pytest >=7.0.1
  • pytest-mock >=3.10.0
Minigrid/pyproject.toml pypi
  • gymnasium >=0.28.1
  • numpy >=1.18.0
  • pygame >=2.4.0
Minigrid/requirements.txt pypi
  • gymnasium >=0.26
  • numpy >=1.18.0
  • pygame >=2.2.0
Minigrid/setup.py pypi
Minigrid/test_requirements.txt pypi
  • pytest >=7.0.1 test
  • pytest-mock >=3.10.0 test
environment.yml pypi
  • ale-py ==0.7.4
  • imageio ==2.16.0
  • importlib-metadata ==4.11.1
  • importlib-resources ==5.4.0
  • joblib ==1.1.0
  • munch ==2.5.0
  • numpy *
  • opencv-python ==4.5.5.62
  • pandas ==1.3.5
  • pillow ==9.0.1
  • pip ==22.0.3
  • pytz ==2021.3
  • seaborn ==0.11.2
  • setuptools ==60.9.3
  • typing-extensions ==4.1.1
  • wheel ==0.37.1
  • zipp ==3.7.0