This repository implements a Graph Attention Network (GAT), the architecture used in TacticAI, as a network-aware reinforcement learning policy for cyber defence. Our work extends the Cyber Operations Research Gym (CybORG) to represent network states as directed graphs with low-level features, in order to explore more realistic autonomous defence strategies.
- Topology-Aware Defence: Processes the complete network graph structure instead of simplified flat state observations
- Runtime Adaptability: Handles dynamic changes in network topology as new connections appear
- Cross-Network Generalisation: Trained policies can be deployed to networks of different sizes
- Enhanced Interpretability: Defence actions can be explained through tangible network properties
- Custom CybORG environment with graph-based network state representation
- GAT architecture modified for compatibility with policy gradient methods (see the sketch after this list)
- Empirical evaluation for assessing policy generalisation vs. specialised training across varying network sizes
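As a rough illustration of the idea, the sketch below passes a toy network-state graph through a two-layer graph-attention policy using PyTorch Geometric. The module names, feature dimension, and per-host action count are illustrative assumptions, not the repository's actual interfaces.

```python
# Minimal sketch of a graph-attention policy over a network-state graph.
# Assumes PyTorch Geometric; names and dimensions are illustrative only.
import torch
from torch import nn
from torch_geometric.nn import GATConv
from torch_geometric.data import Data


class GraphPolicy(nn.Module):
    def __init__(self, node_features: int, hidden: int = 32, actions_per_host: int = 5):
        super().__init__()
        self.gat1 = GATConv(node_features, hidden, heads=4, concat=True)
        self.gat2 = GATConv(hidden * 4, hidden, heads=1, concat=False)
        self.head = nn.Linear(hidden, actions_per_host)  # per-node action logits

    def forward(self, obs: Data) -> torch.Tensor:
        h = torch.relu(self.gat1(obs.x, obs.edge_index))
        h = torch.relu(self.gat2(h, obs.edge_index))
        logits = self.head(h)       # shape: [num_hosts, actions_per_host]
        return logits.flatten()     # one categorical over (host, action) pairs


# Toy network: 3 hosts, directed connections, 8 low-level features per host.
obs = Data(
    x=torch.randn(3, 8),
    edge_index=torch.tensor([[0, 1, 1], [1, 0, 2]]),
)
policy = GraphPolicy(node_features=8)
dist = torch.distributions.Categorical(logits=policy(obs))
action = dist.sample()  # index of a (host, action) pair
```

Because the attention layers and the per-node action head share weights across nodes, the same parameters can be applied to graphs with a different number of hosts, which is what enables cross-network generalisation.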
> [!NOTE]
> This is a research project that serves as a proof-of-concept towards more realistic network environments in cyber defence. Our implementation uses the low-level structure of the CybORG v2.1 simulator as a practical context, but the technique itself can be adapted to other simulators with comparable complexity.
We used, and recommend, pixi to set up a reproducible project with predefined tasks.
Clone this repository recursively to pull in the CybORG v2.1 simulator and the CAGE 2 reference submissions as submodules.
```bash
git clone https://github.com/IlyaOrson/CyberDreamcatcher.git --recurse-submodules -j4
```
Install the dependencies of the project in a local environment.
```bash
cd CyberDreamcatcher
pixi install  # setup from pixi.toml file
```
Then install the submodules as local packages, avoiding pip's dependency resolution.
```bash
# install environments from git submodules as local packages
pixi run install-cyborg  # CybORG 2.1 + update to gymnasium API
# OR a debugged version from The Alan Turing Institute (https://github.com/alan-turing-institute/CybORG_plus_plus)
pixi run install-cyborg-debugged

# install troublesome dependencies without using pip to track their requirements
pixi run install-sb3  # stable baselines 3
```
Voila! An activated shell within this environment will have all dependencies working together.
```bash
pixi shell  # activate shell
python -m cyberdreamcatcher  # try out a single environment simulation
```
> [!TIP]
> If you would like to use another project management tool, the list of dependencies and installation tasks is available in pixi.toml. Untested environment files are provided for uv/pip (pyproject.toml) and for conda/mamba (conda_env.yml). Make sure to manually ignore the dependencies pinned by CybORG/SB3 when installing them locally.
We include predefined tasks that can be run to make sure everything is working:
```bash
pixi task list  # displays available tasks
pixi run test-cyborg  # run gymnasium-based cyborg tests
pixi run eval-cardiff  # CAGE 2 winner policy inference (simplified and flattened observation space)
```
> [!TIP]
> Hydra is used to handle the inputs and outputs of every script. The available parameters for each task are accessible with the `--help` flag. The content generated per execution is stored in the `outputs/` directory, with a subdirectory per execution timestamp. The hyperparameters used in each run are registered in a hidden subfolder `.hydra/` within the generated output folder.
TensorBoard is used to track interesting metrics; just point it at the relevant Hydra output folder as the logdir: `tensorboard --logdir=outputs/...`
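For instance, the settings of a past run can be reloaded from that folder. The snippet below assumes the standard Hydra output layout; the path is a placeholder for one of the timestamped output folders.

```python
# Reload the hyperparameters Hydra logged for a finished run.
# The path is a placeholder for a timestamped folder under outputs/.
from omegaconf import OmegaConf

cfg = OmegaConf.load("outputs/<date>/<time>/.hydra/config.yaml")
print(OmegaConf.to_yaml(cfg))  # the exact settings used for that run
```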
Quickly visualise the graph layout defined in the CAGE 2 challenge scenario file, as well as the graph observations received by a random GAT policy.
```bash
pixi run plot-network scenario=Scenario2  # see --help for hyperparameters
```
> [!WARNING]
> This is the layout we expect from the simulator configuration and the actions available to the meander agent, but CybORG does not enforce this connection layout at runtime. Connections from other subnets to User0 appear sporadically (unexpectedly), possibly as a hackish way of flagging the interaction of the meander agent with deployed decoys.
We include an implementation of the REINFORCE algorithm with a normalised rewards-to-go baseline. This is a bit slow since it samples a lot of episodes with a fixed policy to estimate the gradient before taking an optimisation step.
```bash
pixi run train-gnn-reinforce  # see --help for hyperparameters
```
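As a rough sketch of what this update looks like (the `policy`/`env` interfaces and the simplified `step` return are assumptions for illustration, not the repository's actual API):

```python
# Sketch of one REINFORCE update with a normalised rewards-to-go baseline.
# The policy/env interfaces are placeholders, not the repository's API.
import torch


def rewards_to_go(rewards, gamma=0.99):
    """Discounted cumulative reward from each timestep to the end of the episode."""
    out, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        out.append(running)
    return list(reversed(out))


def reinforce_step(policy, env, optimizer, num_episodes=32, gamma=0.99):
    """Sample many episodes with the fixed policy, then take one gradient step."""
    log_probs, returns = [], []
    for _ in range(num_episodes):
        obs, done = env.reset(), False
        ep_rewards = []
        while not done:
            dist = policy.distribution(obs)   # e.g. Categorical over (host, action)
            action = dist.sample()
            log_probs.append(dist.log_prob(action))
            obs, reward, done = env.step(action)  # simplified step interface
            ep_rewards.append(reward)
        returns.extend(rewards_to_go(ep_rewards, gamma))

    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalised baseline
    loss = -(torch.stack(log_probs) * returns).mean()              # policy-gradient surrogate

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```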
This script trains an MLP policy with PPO using Stable Baselines 3. The observation space used is the original CAGE 2 observation space, a flattened high-level representation of the network.
```bash
pixi run train-flat-sb3-ppo  # see --help for hyperparameters
```
> [!IMPORTANT]
> This SB3 MLP serves as a performance reference, but it cannot extrapolate to different network dimensions. A major caveat for any performance comparison with this or the CAGE 2 submissions is that the observation spaces are fundamentally different: the flattened version is a higher-level representation designed for the CAGE 2 Challenge, whereas our custom graph observation uses low-level information from the CybORG simulator. See below for a performance comparison with CAGE 2 Challenge submissions.
It is possible (❗) to evaluate how a trained GAT policy extrapolates to network layouts different from the one it was trained on.
Specify a scenario to sample episodes from and optionally the weights of a pretrained policy (potentially trained on a different scenario).
```bash
# The default behaviour is to use a random policy on "Scenario2".
pixi run plot-performance

# This will compare the performance of a trained policy
# with a random policy on the scenario used for training
pixi run plot-performance policy_weights="path/to/trained_params.pt"
```
The objective is to quantify the optimality gap between an extrapolated policy and policies trained from scratch in each scenario. Specify the path to the trained policy to be tested and an array of paths to the specialised policies to compare against; the corresponding scenarios are loaded from the logged configurations.
```bash
# add --help to see the available options
pixi run plot-generalisation policy_weights=path/to/trained_params.pt local_policies=[path/to/0/trained_params.pt,path/to/1/trained_params.pt,path/to/3/trained_params.pt, ...]
```
For a detailed description of the CAGE 2 Challenge, see this preprint.
For a complete list of CAGE 2 submission standings, see here.
| Scenario 2 (Red Agent: Meander, Steps: 30) | Penalty mean | Observation Space | Structural Generalization |
|---|---|---|---|
| CAGE2 Winner | ~ 6 | High-level Flat | No |
| Stable Baselines 3 MLP + PPO | ~ 12 | High-level Flat | No |
| CyberDreamcatcher REINFORCE | ~ 18 | Low-level Graph | Reasonable |
| CAGE2 CSS Random | ~ 33 | High-level Flat | N/A |
| CAGE2 CSS Sleeper | ~ 39 | High-level Flat | N/A |