entity-based-yawning-titan

Official Repository of the Entity-based Reinforcement Learning for Autonomous Cyber Defence paper.

https://github.com/alan-turing-institute/entity-based-yawning-titan

Science Score: 62.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization alan-turing-institute has institutional domain (turing.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 2.87 MB
Statistics
  • Stars: 18
  • Watchers: 3
  • Forks: 0
  • Open Issues: 2
  • Releases: 0
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Entity-Based Yawning Titan


This repository contains an adaptation of the Yawning Titan cyber simulator, as released by Dstl, to the Entity Gym interface. This adaptation enables entity-based reinforcement learning with Yawning Titan, allowing for natural generalisability of reinforcement learning agents across varying network topologies. Developed at the AI for Cyberdefence Research Centre (AICD).

The repository provides training and evaluation scripts that can reproduce the experiments described in the workshop paper Entity-based Reinforcement Learning for Autonomous Cyber Defence. This paper introduces and motivates the use of entity-based reinforcement learning in the context of autonomous cyber defence.

The training scripts for entity-based reinforcement learning leverage the RogueNet Transformer policy implementation, which is part of the Entity-Neural-Network ecosystem alongside Entity Gym. This policy is trained using the Entity-Neural-Net Trainer package and its implementation of Proximal Policy Optimization (PPO).

For comparison, baseline training scripts for non-entity-based reinforcement learning utilise the Stable Baselines 3 PPO trainer with a multilayer perceptron policy parameterisation. These baselines are trained on an equivalent Yawning Titan environment that retains the OpenAI Gym interface.

If you use this repository in your research, please consider citing our companion paper:

    @article{SymesThompson2024EntityACD,
      title={Entity-based Reinforcement Learning for Autonomous Cyber Defence},
      author={Symes Thompson, Isaac and Caron, Alberto and Hicks, Chris and Mavroudis, Vasilios},
      journal={arXiv preprint arXiv:2410.17647v3},
      year={2024}
    }

Please also cite the original Yawning Titan work, with citation specified in the repository here. If using the Entity Gym package, consider citing their repository or the original blog post by the authors, as we do in the paper.

Setup

Docker

This repository contains a Dockerfile and a docker-compose.yaml file for use with Docker Compose. This is likely the simplest way to get the scripts working.

  1. Clone the repository and navigate to its root directory.
  2. Make sure Docker is running, then run one of the following commands, depending on your Docker / Docker Compose version:

         docker-compose run --rm entity-yt

     or:

         docker compose run --rm -it entity-yt

This should build the image and open a terminal in a Docker container that has all relevant packages installed and a volume mounted to the project directory.

NB: A dependency of the Entity-Neural-Net Trainer is the pyron library, for which wheels are provided only for x86 architectures. It is therefore necessary to use emulation when running on non-x86 hardware (e.g. Arm-based Apple Silicon). This is specified in the docker-compose file.

Manual Installation

This has been tested with Python 3.9 on Ubuntu 22.04, CPU only. In initial experiments with models of the default size, GPU acceleration was found to hamper performance.

Step 1: Install Gym 0.21.0

Installing Gym 0.21.0 using pip requires specific versions of setuptools and wheel; otherwise installation is likely to fail.

    pip install pip==24.0
    pip install setuptools==66 wheel==0.38.4
    pip install gym==0.21.0

Step 2: Install Entity Neural Network dependencies

Install typing_extensions, hyperstate, and enn_trainer with the following command:

    pip install typing_extensions hyperstate enn_trainer

Step 3: Install PyTorch and Torch scatter

For a CPU-only version of PyTorch, run:

    pip install torch==1.12.0 --extra-index-url https://download.pytorch.org/whl/cpu
    pip install --no-cache-dir --no-index torch-scatter -f https://data.pyg.org/whl/torch-1.12.0+cpu.html

Step 4: Install Yawning Titan and dependencies

    pip install stable_baselines3==1.6.2 wandb platformdirs networkx tinydb tabulate

Make sure you are in the parent directory, so that you install the Yawning Titan version present in this repository.

    pip install -e yawning_titan

Step 5: Install ebyt as a package

    pip install -e .
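After the installation steps, it can be useful to confirm that the pinned versions were actually installed. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, the pins are copied from the steps above, and it assumes that `importlib.metadata` resolves the distribution names as written.

```python
from importlib import metadata

# Pins copied from the installation steps above.
PINNED = {
    "gym": "0.21.0",
    "torch": "1.12.0",
    "stable_baselines3": "1.6.2",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not satisfied.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local version suffixes such as "+cpu" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

If the script prints nothing, all three pins match the installed versions.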

Weights & Biases setup

Logging with Weights & Biases is supported. When training an entity-based agent, this involves modifying the training config file (in src/ebyt/configs/yawning_titan/train_config/EntityYT ... .ron) to add a wandb user and project, and enabling tracking. For example:

    TrainConfig(
        ...
        track: true,
        wandb_project_name: "EntityYT",
        wandb_entity: "John_Smith",
        ...
    )

To log the training of the Stable Baselines 3 agents, modify the arguments of the wandb.init call inside the Sb3_PPO_YT_train.py script:

    wandb.init(
        ...
        project='EntityYT',
        entity='John_Smith',
        sync_tensorboard=True,
    )

To log the evaluation of either kind of agent, modify the wandb.init call in the relevant evaluation script.

    eval_config.capture_videos = False
    wandb.init(
        project='EntityYT',
        entity='John_Smith',
        ...
    )

Overview of Entity-based Yawning Titan environment

The EntityYT environment class inside src/ebyt/envs/entity_cyber/entity_yawning_titan_generic.py contains the Entity-based version of the Yawning Titan environment. It is derived from the GenericNetworkEnv environment, which can be found in yawning_titan/src/yawning_titan/envs/generic/generic_env.py for reference.

Broadly speaking, the main difference between an 'entity-based' approach and the approach used in the Yawning Titan environment is that the observation and action spaces are treated as collections of distinct objects or entities (nodes), whereas in the default Gym environment all features are concatenated into a unified observation vector.
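The contrast can be illustrated with a small sketch in plain Python. It does not use Entity Gym or Yawning Titan; the feature names and both helper functions are hypothetical. The point is only that a concatenated observation changes length with the number of nodes, whereas a per-entity representation keeps a fixed feature width and varies the number of rows.

```python
from typing import Dict, List

# Hypothetical per-node feature names, for illustration only.
NODE_FEATURES = ["compromised", "vulnerability"]


def flat_observation(nodes: List[Dict[str, float]]) -> List[float]:
    """Concatenate every node's features into one vector (Gym-style)."""
    return [node[name] for node in nodes for name in NODE_FEATURES]


def entity_observation(nodes: List[Dict[str, float]]) -> List[List[float]]:
    """Keep one fixed-width feature row per node (entity-style)."""
    return [[node[name] for name in NODE_FEATURES] for node in nodes]


small = [{"compromised": 0.0, "vulnerability": 0.3}] * 3
large = [{"compromised": 1.0, "vulnerability": 0.8}] * 5

# The flat vector length depends on the node count: 6 versus 10.
print(len(flat_observation(small)), len(flat_observation(large)))
# The per-entity feature width is 2 in both cases; only the row count differs.
print(len(entity_observation(small)[0]), len(entity_observation(large)[0]))
```

A policy with a fixed-size input layer can consume the flat vector for only one network size, which is why the entity view is better suited to varying topologies.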

For example, the observation space is defined as:

    def obs_space(self) -> ObsSpace:
        return ObsSpace(
            entities={
                'Generic_Node': Entity(features=self.feature_list),
                'Defender': Entity(features=[]),
            }
        )

This specifies the different possible 'entity types' in the environment, as well as a list of named features that entities of that type might have. In our case, we have a single generic entity type for all the nodes in the environment, with the feature list built from all the features that are enabled in the blue agent observation space in the game mode config file. We also have a dummy 'Defender' entity type with no specified features. This allows for a composite action space in which the blue agent first chooses an action type and then a node to act on. The action space is defined as:

    def action_space(self) -> Dict[ActionName, ActionSpace]:
        return {
            'High_Level': GlobalCategoricalActionSpace(self.action_list),
            'Target': SelectEntityActionSpace(),
        }

Here a 'GlobalCategoricalActionSpace' is defined, containing all of the types of action the blue agent may perform. This list is built from the game mode configuration file, and is similar to the standard discrete action space found in Gym. A 'SelectEntityActionSpace' is also specified, which is used to select the particular node on which to execute the action chosen in the 'Global' action space.

Note that both the observation and action space provide only the types of possible observations and actions, with the specific compositions potentially varying with each timestep or between episodes. This allows for the initialisation of a policy network with the environment whilst still being flexible to environment variation.

The actual observations and actions are made available to an agent in an Observation object at a particular timestep, upon calling the observe function.

    def observe(self, done=False, reward=0) -> Observation:
        return Observation(
            entities={
                'Generic_Node': (
                    self.get_node_features(),
                    [('Generic_Node', i) for i in range(len(self.nodes))],
                ),
                'Defender': (np.zeros([1, 0], dtype=np.float32), [('Defender', 0)]),
            },
            actions={
                'High_Level': GlobalCategoricalActionMask(),
                'Target': SelectEntityActionMask(
                    actor_types=['Defender'], actee_types=['Generic_Node']
                ),
            },
            done=done,
            reward=reward,
        )

Here, a potentially variable-length list of node features is provided to the agent, returned by the get_node_features function. The length of this list depends on the number of nodes in the environment. The defender entity is given a dummy feature vector.

The actions available to the agent are defined by action masks over the previously specified action spaces. There is no action mask on the high level action type space. The action mask on the SelectEntityActionSpace determines that the 'Defender' entity is to act on the 'Generic_Node' entities. This prompts the policy network to use information in the embedding of the Defender entity to decide which node to execute the defensive action on. If there were multiple Defender entities then an action would be chosen for each Defender, since the mask is based on entity types rather than specific entities.
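The effect of a type-based mask can be sketched without Entity Gym. The example below is a simplification under stated assumptions: the function `select_target` and the scores are hypothetical, and it picks the highest-scoring candidate greedily, whereas a trained policy would normally sample from a distribution over candidates.

```python
from typing import List, Sequence


def select_target(
    scores: Sequence[float],
    entity_types: Sequence[str],
    actee_types: Sequence[str],
) -> int:
    """Return the index of the highest-scoring entity whose type is an actee type."""
    candidates: List[int] = [
        i for i, entity_type in enumerate(entity_types) if entity_type in actee_types
    ]
    if not candidates:
        raise ValueError("no entity matches the actee types")
    return max(candidates, key=lambda i: scores[i])


entity_types = ["Generic_Node", "Generic_Node", "Generic_Node", "Defender"]
scores = [0.1, 0.9, 0.4, 2.0]  # hypothetical per-entity scores

# The Defender has the highest score but is excluded by the type mask,
# so the second node (index 1) is selected.
target = select_target(scores, entity_types, actee_types=["Generic_Node"])
print(target)
```

Because the filter works on entity types rather than on specific indices, the same function applies unchanged when the number of nodes differs between episodes.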

Overview of Scripts

All scripts are present in the src/ebyt/train_eval/ directory.

1. EntityYT_train.py

This script trains an agent in the EntityYT entity-based Yawning Titan environment. It trains a RogueNet Transformer policy using the PPO implementation from the Entity-Neural-Net Trainer package.

Arguments:

  • "--n-nodes" Specifies the number of nodes to be used in the Yawning Titan network environment. Default 10
  • "--edge-prob" Specifies the edge probability parameter when generating the network (using the Erdős-Rényi model). Default 0.1
  • "--episode-length" Specifies the length of each episode. Default 100
  • "--red-skill" Specifies the 'skill' of the red agent in Yawning Titan. Between 0.0 and 1.0. Default 0.7
  • "--random" Specifies whether the network environment is re-generated between episode resets. If absent then the same network is used throughout the training run.
  • "--render" Specifies whether the environment is rendered on logging and evaluation.
  • "--seed" Random seed, default 0
  • "--total-timesteps" Total number of environment timesteps to run the training for.
  • "--eval-freq" Interval (in environment timesteps) between evaluations and logging of the policy.
  • "--game-mode" Specifies which Yawning Titan game mode config file to use in the game_modes config directory. Default "fixed_episode_base.json"
  • "--config" Specifies the training configuration file, in the train_config directory. "EntityYT_base.ron" by default.
  • "--checkpoint-dir" Specifies a directory to save checkpoints. By default this is generated automatically from other arguments and placed in a "checkpoints" parent directory.

Example Usage:

    python EntityYT_train.py --n-nodes 10 --edge-prob 0.1 --episode-length 100 \
        --total-timesteps 1000000 --eval-freq 10000 --random --config EntityYT_base.ron
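For readers unfamiliar with the Erdős-Rényi model behind the --n-nodes and --edge-prob arguments, the following is a minimal standard-library sketch. It is not the generator used by Yawning Titan, which may post-process the graph (for example to handle disconnected nodes); the function name `erdos_renyi_edges` is hypothetical.

```python
import random
from itertools import combinations
from typing import List, Tuple


def erdos_renyi_edges(n_nodes: int, edge_prob: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Sample an undirected G(n, p) graph as a list of edges.

    Each of the n * (n - 1) / 2 possible node pairs is included
    independently with probability edge_prob.
    """
    rng = random.Random(seed)
    return [
        (u, v)
        for u, v in combinations(range(n_nodes), 2)
        if rng.random() < edge_prob
    ]


edges = erdos_renyi_edges(n_nodes=10, edge_prob=0.1, seed=0)
print(f"{len(edges)} edges out of 45 possible")
```

With 10 nodes and an edge probability of 0.1, the expected number of edges is 4.5, so sampled networks are sparse and may be disconnected. Re-sampling with a different seed on each reset corresponds loosely to what the --random flag describes.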

2. EntityYT_evaluate.py

Evaluates a trained agent in the EntityYT environment, given a checkpoint.

Arguments:

  • "--n-nodes" Specifies the number of nodes to be used in the evaluation environment. Default 10
  • "--edge-prob" Specifies the edge probability parameter when generating networks (using the Erdős-Rényi model). Default 0.1
  • "--episode-length" Specifies the length of each episode. Default 100
  • "--red-skill" Specifies the 'skill' of the red agent in Yawning Titan. Between 0.0 and 1.0. Default 0.7
  • "--random" Specifies whether the network environment is re-generated between episode resets. If absent then the same network is used throughout evaluation.
  • "--render" Specifies whether the environment is rendered during evaluation.
  • "--seed" Random seed, default 0
  • "--game-mode" Specifies which Yawning Titan game mode config file to use in the game_modes config directory. Default "fixed_episode_base.json"
  • "--num-evals" Specifies the number of episodes to evaluate the agent over, default 30. In the paper, evaluations over 1000 episodes are used.
  • "--checkpoint-dir" Specifies a directory to load checkpoints from. By default, the most recent checkpoint beginning with "EntityYT" in the checkpoints directory will be loaded.

Example Usage:

    python EntityYT_evaluate.py --n-nodes 10 --edge-prob 0.1 --episode-length 100 --num-evals 1000
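The choice of --num-evals affects how precisely the mean episode return is estimated. The sketch below is not taken from the evaluation scripts; `summarise_returns` is a hypothetical helper that computes the mean and its standard error from a list of per-episode returns.

```python
import math
import statistics
from typing import Sequence, Tuple


def summarise_returns(returns: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for per-episode returns."""
    mean = statistics.mean(returns)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    stderr = statistics.stdev(returns) / math.sqrt(len(returns))
    return mean, stderr


# Hypothetical returns from four evaluation episodes.
mean, stderr = summarise_returns([1.0, 2.0, 3.0, 4.0])
print(f"mean={mean:.2f} stderr={stderr:.3f}")
```

Since the standard error scales with 1/sqrt(n), moving from 30 to 1000 episodes narrows it by a factor of roughly 5.8 for the same per-episode spread.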

3. Sb3_PPO_YT_train.py

This script trains an agent using Stable Baselines 3 PPO in an OpenAI Gym version of the Yawning Titan environment. This version uses fully 'concatenated' observation and action spaces, and the policy is parameterised using a multilayer perceptron. This script has the same arguments as the Entity-based Yawning Titan training script, apart from the rendering and logging functionality, which proved harder to implement correctly.

Arguments:

  • "--n-nodes" Specifies the number of nodes to be used in the Yawning Titan network environment. Default 10
  • "--edge-prob" Specifies the edge probability parameter when generating the network (using the Erdős-Rényi model). Default 0.1
  • "--episode-length" Specifies the length of each episode. Default 100
  • "--red-skill" Specifies the 'skill' of the red agent in Yawning Titan. Between 0.0 and 1.0. Default 0.7
  • "--random" Specifies whether the network environment is re-generated between episode resets. If absent then the same network is used throughout the training run.
  • "--seed" Random seed, default 0
  • "--total-timesteps" Total number of environment timesteps to run the training for.
  • "--eval-freq" Interval (in environment timesteps) between evaluations and logging of the policy.
  • "--game-mode" Specifies which Yawning Titan game mode config file to use in the game_modes config directory. Default "fixed_episode_base.json"
  • "--config" Specifies the training configuration file, in the train_config directory. "Sb3_PPO_base.ron" by default.
  • "--checkpoint-dir" Specifies a directory to save checkpoints. By default this is generated automatically from other arguments and placed in a "checkpoints" parent directory.

Example Usage:

    python Sb3_PPO_YT_train.py --n-nodes 10 --edge-prob 0.1 --episode-length 100 \
        --total-timesteps 1000000 --eval-freq 10000 --random --config Sb3_PPO_base.ron

4. Sb3_PPO_YT_evaluate.py

Evaluates a checkpointed Stable Baselines 3 PPO agent on a Yawning Titan environment. Uses the same arguments as the Entity-based Yawning Titan evaluation script.

Arguments:

  • "--n-nodes" Specifies the number of nodes to be used in the evaluation environment. Default 10
  • "--edge-prob" Specifies the edge probability parameter when generating networks (using the Erdős-Rényi model). Default 0.1
  • "--episode-length" Specifies the length of each episode. Default 100
  • "--red-skill" Specifies the 'skill' of the red agent in Yawning Titan. Between 0.0 and 1.0. Default 0.7
  • "--random" Specifies whether the network environment is re-generated between episode resets. If absent then the same network is used throughout evaluation.
  • "--render" Specifies whether the environment is rendered during evaluation.
  • "--seed" Random seed, default 0
  • "--game-mode" Specifies which Yawning Titan game mode config file to use in the game_modes config directory. Default "fixed_episode_base.json"
  • "--num-evals" Specifies the number of episodes to evaluate the agent over, default 30. In the paper, evaluations over 1000 episodes are used.
  • "--checkpoint-dir" Specifies a directory to load checkpoints from. By default, the most recent checkpoint beginning with "EntityYT" in the checkpoints directory will be loaded.

Example Usage:

    python Sb3_PPO_YT_evaluate.py --n-nodes 10 --edge-prob 0.1 --episode-length 100 --num-evals 30

Overview of Configuration files

Game Mode Configuration

Game modes are defined in JSON files located in configs/yawning_titan/game_modes/. Modify these files to adjust environment parameters, blue and red capabilities, and the reward function used by the environment.

Important: The Entity-based Yawning Titan environment does not currently support all of the game mode options provided in Yawning Titan, so experimentation outside of the provided configurations may be unreliable. In particular, episodes with early termination conditions and deceptive node functionality are not supported.

The training script arguments 'episode-length' and 'red-skill' override the corresponding values in the game mode config, to allow for quicker experimentation.
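The override behaviour can be pictured as a dictionary merge in which command-line values take precedence over the loaded config. This is only a sketch: the flat keys "episode_length" and "red_skill" and the function `apply_overrides` are hypothetical, and the real game mode JSON files are likely to be nested and to use different key names.

```python
from typing import Any, Dict, Optional


def apply_overrides(
    game_mode: Dict[str, Any], overrides: Dict[str, Optional[Any]]
) -> Dict[str, Any]:
    """Return a copy of game_mode with every non-None override applied."""
    merged = dict(game_mode)
    merged.update({key: value for key, value in overrides.items() if value is not None})
    return merged


# Hypothetical values as they might be loaded from a game mode file.
loaded = {"episode_length": 500, "red_skill": 0.5}

# Only --episode-length was passed on the command line in this example.
effective = apply_overrides(loaded, {"episode_length": 100, "red_skill": None})
print(effective)
```

Treating None as "not provided" leaves the file's value in place for any argument the user did not set, and the loaded dictionary itself is left unmodified.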

Training Configuration

Training settings are specified in .ron files within configs/yawning_titan/train_config/.

  • The EntityYT ... .ron configs follow the structure provided in the Entity Neural Network project. They specify the configuration of the RogueNet Transformer policy, the optimiser and PPO hyperparameters, and the evaluation configuration, as well as higher-level options such as wandb settings and overall training length. EntityYT_base.ron contains the parameters used for the experiments in the paper, which are largely the defaults for the Entity-Neural-Network Trainer package. EntityYT_alt.ron contains PPO and optimiser parameters closer to the Stable Baselines 3 defaults. The training script arguments total-timesteps and eval-freq override the 'total_timesteps' and 'eval.interval' values present in these files.
  • The Sb3_PPO ... .ron configs specify the hyperparameters for Stable Baselines 3 PPO. Sb3_PPO_default.ron contains the default hyperparameters as specified in the Stable Baselines 3 documentation, and as used for the experiments in the paper. Sb3_PPO_alt.ron contains hyperparameters closer to the defaults used for the PPO optimiser in the Entity-Neural-Network Trainer.


Supported by the National Cyber Security Centre (NCSC)

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this repository, please cite the following work."
authors:
  - family-names: "Symes Thompson"
    given-names: "Isaac"
  - family-names: "Caron"
    given-names: "Alberto"
  - family-names: "Hicks"
    given-names: "Chris"
  - family-names: "Mavroudis"
    given-names: "Vasilios"
title: "Entity-based Reinforcement Learning for Autonomous Cyber Defence"
year: 2024
version: "v2"
doi: "10.48550/arXiv.2410.17647"
url: "https://arxiv.org/abs/2410.17647v2"

GitHub Events

Total
  • Watch event: 17
  • Public event: 1
  • Push event: 6
Last Year
  • Watch event: 17
  • Public event: 1
  • Push event: 6



Dependencies

Dockerfile docker
  • python 3.9.20 build
pyproject.toml pypi
  • enn_trainer *
  • gym ==0.21.0
  • hyperstate *
  • networkx *
  • platformdirs *
  • stable_baselines3 ==1.6.2
  • tabulate *
  • tinydb *
  • torch ==1.12.0+cpu
  • torch-scatter *
  • typing_extensions *
  • wandb *
yawning_titan/pyproject.toml pypi
yawning_titan/setup.py pypi