https://github.com/amazon-science/cyber-zero
Cyber-Zero: Training Cybersecurity Agents Without Runtime
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 10.7%, to scientific vocabulary)
Repository
Cyber-Zero: Training Cybersecurity Agents Without Runtime
Basic Info
Statistics
- Stars: 9
- Watchers: 2
- Forks: 2
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
Cyber-Zero: Training Cybersecurity Agents without Runtime
Overview | Benchmark Suite | Quick Start | Architecture | Configuration | Generation | Validation | CLI Interface | Citation
Cyber-Zero is a comprehensive framework for training cybersecurity agents without requiring runtime execution environments.
Overview
Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments; however, such environments are often unavailable in cybersecurity domains, where challenge configurations and execution contexts are ephemeral or restricted. Cyber-Zero addresses this fundamental limitation by leveraging publicly available CTF writeups and employing persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual execution environments.
The key innovation is generating high-quality training trajectories through LLM simulation rather than requiring actual execution environments, making it scalable and practical for training cybersecurity agents. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench.
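The persona-driven simulation described above can be sketched as a simple alternating loop. This is an illustrative sketch only, not the framework's actual implementation: the `assistant_llm` and `user_llm` callbacks stand in for chat-completion calls driven by the persona prompts in `cyber_zero/prompts/`, and the function and field names are assumptions for illustration.

```python
def simulate_trajectory(task_meta, assistant_llm, user_llm, max_turns=30):
    """Alternate between a CTF-player persona (assistant) and an
    environment persona (user), stopping once the assistant produces
    the known flag or the turn budget runs out.

    `assistant_llm` / `user_llm` are placeholders for LLM calls made
    with the assistant-turn and user-turn persona prompts.
    """
    history = [{"role": "user", "content": task_meta["task_description"]}]
    for _ in range(max_turns):
        action = assistant_llm(history)           # player proposes a command
        history.append({"role": "assistant", "content": action})
        if task_meta["solution"] in action:       # flag recovered: stop
            break
        observation = user_llm(history)           # simulated terminal output
        history.append({"role": "user", "content": observation})
    return history
```

Because the environment turn is itself simulated by an LLM, no container or challenge server ever has to run; the flag from the writeup acts as the termination signal.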
Benchmark Suite
To democratize the evaluation of cybersecurity agents, we provide three repaired benchmark suites adapted for EnIGMA+ in Cyber-Zero:
- InterCode-CTF - A comprehensive collection of CTF challenges covering various cybersecurity domains
- NYU CTF Bench - NYU's curated benchmark suite for evaluating LLM-based CTF solving capabilities
- Cybench - A diverse benchmark covering multiple CTF categories and difficulty levels
All benchmarks have been reformatted to follow the EnIGMA and EnIGMA+ specification, with each challenge including a challenge.json file and docker-compose.yml file (when required).
Our benchmark suite addresses several issues identified in the original benchmarks, providing repaired versions for reliable evaluation. For detailed information about specific repairs and improvements, see the benchmarks/README.md.
EnIGMA+
To facilitate the development of cybersecurity agents, we present EnIGMA+, an enhanced agent scaffold based on EnIGMA that runs hundreds of CTF challenges in hours instead of days. EnIGMA+ is built on top of SWE-agent.
Using EnIGMA+, our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness. This demonstrates that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.
For detailed information about EnIGMA+, including installation, configuration, and usage instructions, please check the README in the enigma-plus folder.
Installation
From Source
```bash
# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .
```
Quick Start
Generate Trajectories
```bash
# Using the CLI
cyber-zero generate \
    --sampled_flags_path task_meta.jsonl \
    --output_path trajectories.jsonl \
    --trajectories_per_task 3 \
    --workers 16

# Using the direct script interface
python generate_trajectory.py \
    --sampled_flags_path task_meta.jsonl \
    --output_path trajectories.jsonl \
    --trajectories_per_task 3 \
    --workers 16
```
Evaluate Quality
```bash
# Using the CLI
cyber-zero evaluate \
    --input_path trajectories.jsonl \
    --output_path quality_results.jsonl \
    --model_id deepseek-v3-0324

# Using the direct script interface
python evaluate_quality.py \
    --input_path trajectories.jsonl \
    --output_path quality_results.jsonl \
    --model_id deepseek-v3-0324
```
Reformat Trajectories
```bash
# Using the CLI
cyber-zero reformat \
    --input_path quality_results.jsonl \
    --output_path formatted_trajectories.jsonl \
    --split_output

# Using the direct script interface
python reformat_trajectories.py \
    --input_path quality_results.jsonl \
    --output_path formatted_trajectories.jsonl \
    --split_output
```
Architecture
The framework follows a modular architecture of Cyber-Zero:
```
cyber_zero/
├── __init__.py                  # Package initialization
├── config.py                    # Configuration management
├── models.py                    # Data models (TaskMeta, TrajectoryData, etc.)
├── utils.py                     # Common utilities
├── validation.py                # Response and command validation
├── llm_client.py                # LLM interaction and quality evaluation
├── trajectory_generator.py      # Main trajectory generation logic
├── quality_evaluator.py         # Quality evaluation for trajectories
├── trajectory_reformatter.py    # Trajectory reformatting for training
├── cli.py                       # Command-line interface
├── prompts/                     # System prompts
│   ├── __init__.py
│   ├── assistant_turn_prompt.txt   # Assistant (CTF player) prompt
│   └── user_turn_prompt.txt        # User (system/environment) prompt
└── data_collection/             # Data collection utilities
    ├── __init__.py              # Package initialization
    ├── config.py                # Data collection configuration
    ├── scraper.py               # Shared web scraping utilities
    └── README.md                # Data collection documentation
```
Key Components
- Config: Centralized configuration management with model mappings and validation rules
- Models: Type-safe data structures for tasks, trajectories, and evaluation results
- Validation: Comprehensive validation of responses, commands, and action formats
- LLMClient: Abstracted interface for different language models with retry logic
- TrajectoryGenerator: Main orchestrator for conversation generation
- CLI: User-friendly command-line interface
Configuration
The framework uses a hierarchical configuration system with centralized model management:
Basic Configuration
```python
from cyber_zero import Config

config = Config()
config.MAX_TURNS = 60        # Maximum conversation turns (30 paired turns)
config.DEFAULT_WORKERS = 16  # Number of parallel workers
```
Model Configuration
The framework supports multiple LLM providers through litellm. The default model is DeepSeek-V3-0324 as specified in the research paper:
```python
# Get available models
from cyber_zero.config import Config

config = Config()
print(config.models.MODEL_MAPPINGS)
# {'deepseek-v3-0324': 'deepseek-ai/DeepSeek-V3-0324', ...}

# Use a specific model
model_id = config.models.get_model_id("deepseek-v3-0324")
```
Adding Custom Models
To add custom models for trajectory generation, update the model mappings in cyber_zero/config.py:
```python
# In cyber_zero/config.py
@dataclass
class ModelConfig:
    def __post_init__(self):
        if self.MODEL_MAPPINGS is None:
            self.MODEL_MAPPINGS = {
                # Default research model
                "deepseek-v3-0324": "deepseek-ai/DeepSeek-V3-0324",
                # Add your custom models here
                "custom-model": "provider/your-custom-model-id",
                "local-model": "http://localhost:8000/v1/completions",
            }
```
Model Parameters
The framework uses research-validated parameters for optimal trajectory generation:
- Temperature: 0.6 (balanced creativity and consistency)
- Top-p: 0.95 (diverse but focused sampling)
- Max Turns: 30 paired turns (60 total turns)
- Trajectories per Task: 3 (for diversity)
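The parameters listed above could be collected into a generation config roughly as follows. This is an illustrative sketch, not the framework's actual API; the constant names and the commented litellm call shape are assumptions.

```python
# Research-validated sampling parameters (see list above).
GENERATION_PARAMS = {
    "temperature": 0.6,  # balanced creativity and consistency
    "top_p": 0.95,       # diverse but focused sampling
}

MAX_TURNS = 60             # 30 paired (assistant + user) turns
TRAJECTORIES_PER_TASK = 3  # multiple samples per task for diversity

# A completion call through litellm might then look like:
# response = litellm.completion(
#     model="deepseek-ai/DeepSeek-V3-0324",
#     messages=history,
#     **GENERATION_PARAMS,
# )
```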
Generation
TaskMeta
Represents CTF task metadata:
```python
task = TaskMeta(
    task_name="Buffer Overflow Basic",
    task_tag="pwn",
    task_points="100",
    task_description="Find the vulnerability...",
    solution="flag{example}",
    # ... other fields
)
```
TrajectoryData
Complete trajectory with conversation history:
```python
trajectory = TrajectoryData(
    task_name="Example Task",
    trajectory=[
        ConversationTurn(role="user", content="ls -la"),
        ConversationTurn(role="assistant", content="bash\nls -la\n"),
        # ... more turns
    ],
    # ... other fields
)
```
Validation
The framework includes comprehensive validation mechanisms to ensure trajectory quality:
Response Validation
- Command Format: Validates proper command syntax and structure
- Action Consistency: Ensures actions are appropriate for the current context
- Output Parsing: Validates command outputs and responses
Quality Assessment
- Trajectory Completeness: Checks for proper conversation flow
- Solution Accuracy: Validates that trajectories lead to correct solutions
- Realism: Ensures generated trajectories mimic real CTF solving patterns
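A minimal sketch of the command-format check described above might look like the following. The accepted shape (a shell tag line followed by at least one command line, mirroring the `ConversationTurn` content shown earlier) is an assumption for illustration; see `cyber_zero/validation.py` for the framework's actual rules.

```python
ALLOWED_SHELLS = {"bash"}  # assumed set of accepted action tags

def has_valid_command(turn_content: str) -> bool:
    """Check that an assistant turn carries a well-formed action:
    a recognized shell tag on the first line, then at least one
    non-empty command line."""
    lines = turn_content.strip().splitlines()
    return (
        len(lines) >= 2
        and lines[0] in ALLOWED_SHELLS
        and any(line.strip() for line in lines[1:])
    )
```

Checks like this run per turn during generation, so malformed actions can be rejected and resampled before they pollute a trajectory.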
Validation Rules
```python
from cyber_zero.validation import validate_trajectory

# Validate a complete trajectory
is_valid = validate_trajectory(trajectory_data)
if is_valid:
    print("Trajectory passes validation")
else:
    print("Trajectory requires review")
```
CLI Interface
Unified CLI Commands
```bash
# Generate trajectories
cyber-zero generate --sampled_flags_path data.jsonl --output_path trajectories.jsonl

# Evaluate quality
cyber-zero evaluate --input_path trajectories.jsonl --output_path quality_results.jsonl

# Reformat for training
cyber-zero reformat --input_path quality_results.jsonl --output_path formatted.jsonl
```
Direct Script Interface
All original CLI interfaces are maintained for backward compatibility:
- python sync_task_metadata.py - Sync task metadata from the web
- python generate_trajectory.py - Main trajectory generation
- python evaluate_quality.py - Quality evaluation
- python reformat_trajectories.py - Trajectory reformatting
CLI Options
```bash
# Generate with custom parameters
cyber-zero generate \
    --sampled_flags_path task_meta.jsonl \
    --output_path trajectories.jsonl \
    --trajectories_per_task 5 \
    --workers 32 \
    --model_id deepseek-v3-0324

# Evaluate with a specific model
cyber-zero evaluate \
    --input_path trajectories.jsonl \
    --output_path quality_results.jsonl \
    --model_id claude-3-sonnet-20240229

# Reformat with split output
cyber-zero reformat \
    --input_path quality_results.jsonl \
    --output_path formatted_trajectories.jsonl \
    --split_output \
    --train_ratio 0.9
```
Citation
If you use this benchmark suite in your research, please cite:
```bibtex
@article{zhuo2025cyber,
  title={Cyber-Zero: Training Cybersecurity Agents without Runtime},
  author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
  journal={arXiv preprint arXiv:2508.00910},
  year={2025}
}

@article{zhuo2025training,
  title={Training Language Model Agents to Find Vulnerabilities with CTF-Dojo},
  author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
  journal={arXiv preprint arXiv:2508.18370},
  year={2025}
}
```
License
This project is licensed under CC-BY-NC-4.0; see the LICENSE file for details.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to this project.
Support
If you need help or have questions, please check our SUPPORT.md guide or open an issue on GitHub.
Code of Conduct
This project adheres to the Contributor Covenant Code of Conduct. Please read CODE_OF_CONDUCT.md for details.
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Issues event: 4
- Watch event: 14
- Issue comment event: 2
- Push event: 2
- Public event: 1
- Pull request event: 2
- Fork event: 2
- Create event: 2
Last Year
- Issues event: 4
- Watch event: 14
- Issue comment event: 2
- Push event: 2
- Public event: 1
- Pull request event: 2
- Fork event: 2
- Create event: 2
Dependencies
- cfg-if 1.0.0
- dns-lookup 2.0.2
- libc 0.2.146
- socket2 0.5.3
- windows-sys 0.48.0
- windows-targets 0.48.0
- windows_aarch64_gnullvm 0.48.0
- windows_aarch64_msvc 0.48.0
- windows_i686_gnu 0.48.0
- windows_i686_msvc 0.48.0
- windows_x86_64_gnu 0.48.0
- windows_x86_64_gnullvm 0.48.0
- windows_x86_64_msvc 0.48.0
- c2-chacha 0.2.3
- cfg-if 0.1.10
- getrandom 0.1.13
- hex 0.4.0
- libc 0.2.65
- ppv-lite86 0.2.6
- rand 0.7.2
- rand_chacha 0.2.1
- rand_core 0.5.1
- rand_hc 0.2.0
- wasi 0.7.0
- autocfg 0.1.7
- bitflags 1.2.1
- cloudabi 0.0.3
- fuchsia-cprng 0.1.1
- libc 0.2.80
- obfstr 0.1.1
- obfstr-impl 0.1.1
- rand 0.6.5
- rand_chacha 0.1.1
- rand_core 0.3.1
- rand_core 0.4.2
- rand_hc 0.1.0
- rand_isaac 0.1.1
- rand_jitter 0.1.4
- rand_os 0.1.3
- rand_pcg 0.1.2
- rand_xorshift 0.1.1
- rdrand 0.4.0
- winapi 0.3.9
- winapi-i686-pc-windows-gnu 0.4.0
- winapi-x86_64-pc-windows-gnu 0.4.0
- 231 dependencies
- aes 0.7.4
- az 1.1.1
- block-buffer 0.9.0
- cfg-if 1.0.0
- cipher 0.3.0
- cpufeatures 0.1.5
- ctr 0.7.0
- digest 0.9.0
- generic-array 0.14.4
- getrandom 0.2.3
- gmp-mpfr-sys 1.4.6
- lazy_static 1.4.0
- libc 0.2.98
- opaque-debug 0.3.0
- rug 1.13.0
- sha2 0.9.5
- typenum 1.13.0
- version_check 0.9.3
- wasi 0.10.2+wasi-snapshot-preview1
- winapi 0.3.9
- winapi-i686-pc-windows-gnu 0.4.0
- winapi-x86_64-pc-windows-gnu 0.4.0
- debian bookworm-slim build
- alpine latest build
- rust alpine build
- debian bookworm-slim build
- python 3.10-alpine build
- python 3.9-alpine build
- php 8.1.12-apache-bullseye build
- php 8.1.12-apache-bullseye build
- nginx 1.23.2 build
- python 3.11-slim-buster build
- python 3.8-slim-buster build
- alpine latest build
- maven 3.8.5-openjdk-11-slim build
- maven 3.8.5-openjdk-11-slim build
- alpine edge build
- python 3.11-alpine build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- ubuntu focal build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- python 3.9 build
- python 3.11 build
- ubuntu 20.04 build
- ubuntu 20.04 build
- nginx 1.18.0 build
- golang 1.20 build
- ubuntu 20.04 build
- nginx 1.18.0 build
- golang 1.20 build
- gradle 7.5-jdk11-alpine build
- openjdk 11-slim build
- gradle 7.5-jdk11 build
- openjdk 11-slim build
- ubuntu 14.04 build
- llmctf/2017f-cry-lupin latest
- ubuntu artful build
- llmctf/2017f-pwn-humm_sch-t latest
- ubuntu 16.04 build
- llmctf/2017f-rev-48-bit_yeet_lab latest
- ubuntu 16.04 build
- llmctf/2017f-rev-rusty_road latest
- ubuntu 14.04 build
- llmctf/2017q-rev-baby_crypt latest
- ubuntu 16.04 build
- llmctf/2017q-for-best_router latest
- ubuntu 14.04 build
- llmctf/2017q-msc-cvv latest
- ubuntu 14.04 build
- llmctf/2017q-msc-serial latest
- ubuntu 16.04 build
- llmctf/2017q-pwn-pilot latest
- ubuntu 16.04 build
- llmctf/2017q-pwn-zone latest
- ubuntu 16.04 build
- llmctf/2017q-rev-gopherz latest
- ubuntu 14.04 build
- llmctf/2017q-rev-grumpcheck latest
- ubuntu 16.04 build
- llmctf/2017q-rev-prophecy latest
- llmctf/2017q-web-littlequery latest