https://github.com/amazon-science/cyber-zero
Cyber-Zero: Training Cybersecurity Agents Without Runtime
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ✓ Academic publication links (links to arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 10.7%, to scientific vocabulary)
Repository
Cyber-Zero: Training Cybersecurity Agents Without Runtime
Basic Info
Statistics
- Stars: 9
- Watchers: 2
- Forks: 2
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
Cyber-Zero: Training Cybersecurity Agents without Runtime
Overview | Benchmark Suite | Quick Start | Architecture | Configuration | Generation | Validation | CLI Interface | Citation
Cyber-Zero is a comprehensive framework for training cybersecurity agents without requiring runtime execution environments.
Overview
Large Language Models (LLMs) have achieved remarkable success in software engineering tasks when trained with executable runtime environments; however, such environments are often unavailable in cybersecurity domains, where challenge configurations and execution contexts are ephemeral or restricted. Cyber-Zero addresses this fundamental limitation by leveraging publicly available CTF writeups and employing persona-driven LLM simulation to reverse-engineer runtime behaviors and generate realistic, long-horizon interaction sequences without actual execution environments.
The key innovation is generating high-quality training trajectories through LLM simulation rather than requiring actual execution environments, making it scalable and practical for training cybersecurity agents. Using trajectories synthesized by Cyber-Zero, we train LLM-based agents that achieve up to 13.1% absolute performance gains over baseline models on three prominent CTF benchmarks: InterCode-CTF, NYU CTF Bench, and Cybench.
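The persona-driven simulation described above can be sketched as a simple alternating loop. This is an illustrative sketch only, not the framework's actual implementation: the `assistant_llm` and `user_llm` callbacks stand in for chat-completion calls driven by the persona prompts in `cyber_zero/prompts/`, and the function and field names are assumptions for illustration.

```python
def simulate_trajectory(task_meta, assistant_llm, user_llm, max_turns=30):
    """Alternate between a CTF-player persona (assistant) and an
    environment persona (user), stopping once the assistant produces
    the known flag or the turn budget runs out.

    `assistant_llm` / `user_llm` are placeholders for LLM calls made
    with the assistant-turn and user-turn persona prompts.
    """
    history = [{"role": "user", "content": task_meta["task_description"]}]
    for _ in range(max_turns):
        action = assistant_llm(history)           # player proposes a command
        history.append({"role": "assistant", "content": action})
        if task_meta["solution"] in action:       # flag recovered: stop
            break
        observation = user_llm(history)           # simulated terminal output
        history.append({"role": "user", "content": observation})
    return history
```

Because the environment turn is itself simulated by an LLM, no container or challenge server ever has to run; the flag from the writeup acts as the termination signal.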
Benchmark Suite
To democratize the evaluation of cybersecurity agents, we provide three repaired benchmark suites adapted for EnIGMA+ in Cyber-Zero:
- InterCode-CTF - A comprehensive collection of CTF challenges covering various cybersecurity domains
- NYU CTF Bench - NYU's curated benchmark suite for evaluating LLM-based CTF solving capabilities
- Cybench - A diverse benchmark covering multiple CTF categories and difficulty levels
All benchmarks have been reformatted to follow the EnIGMA and EnIGMA+ specification, with each challenge including a challenge.json file and docker-compose.yml file (when required).
Our benchmark suite addresses several issues identified in the original benchmarks, providing repaired versions for reliable evaluation. For detailed information about specific repairs and improvements, see the benchmarks/README.md.
EnIGMA+
To facilitate the development of cybersecurity agents, we present EnIGMA+, an enhanced agent scaffold based on EnIGMA that runs hundreds of CTF challenges in hours instead of days. EnIGMA+ is built on top of SWE-agent.
Using EnIGMA+, our best model, Cyber-Zero-32B, establishes new state-of-the-art performance among open-weight models, matching the capabilities of proprietary systems like DeepSeek-V3-0324 and Claude-3.5-Sonnet while offering superior cost-effectiveness. This demonstrates that runtime-free trajectory synthesis can effectively democratize the development of state-of-the-art cybersecurity agents.
For detailed information about EnIGMA+, including installation, configuration, and usage instructions, please check the README in the enigma-plus folder.
Installation
From Source
```bash
# Install dependencies
pip install -r requirements.txt

# Install the package
pip install -e .
```
Quick Start
Generate Trajectories
```bash
# Using the CLI
cyber-zero generate \
    --sampled_flags_path task_meta.jsonl \
    --output_path trajectories.jsonl \
    --trajectories_per_task 3 \
    --workers 16

# Using the direct script interface
python generate_trajectory.py \
    --sampled_flags_path task_meta.jsonl \
    --output_path trajectories.jsonl \
    --trajectories_per_task 3 \
    --workers 16
```
Evaluate Quality
```bash
# Using the CLI
cyber-zero evaluate \
    --input_path trajectories.jsonl \
    --output_path quality_results.jsonl \
    --model_id deepseek-v3-0324

# Using the direct script interface
python evaluate_quality.py \
    --input_path trajectories.jsonl \
    --output_path quality_results.jsonl \
    --model_id deepseek-v3-0324
```
Reformat Trajectories
```bash
# Using the CLI
cyber-zero reformat \
    --input_path quality_results.jsonl \
    --output_path formatted_trajectories.jsonl \
    --split_output

# Using the direct script interface
python reformat_trajectories.py \
    --input_path quality_results.jsonl \
    --output_path formatted_trajectories.jsonl \
    --split_output
```
Architecture
The framework follows a modular architecture of Cyber-Zero:
```
cyber_zero/
├── __init__.py                  # Package initialization
├── config.py                    # Configuration management
├── models.py                    # Data models (TaskMeta, TrajectoryData, etc.)
├── utils.py                     # Common utilities
├── validation.py                # Response and command validation
├── llm_client.py                # LLM interaction and quality evaluation
├── trajectory_generator.py      # Main trajectory generation logic
├── quality_evaluator.py         # Quality evaluation for trajectories
├── trajectory_reformatter.py    # Trajectory reformatting for training
├── cli.py                       # Command-line interface
├── prompts/                     # System prompts
│   ├── __init__.py
│   ├── assistant_turn_prompt.txt   # Assistant (CTF player) prompt
│   └── user_turn_prompt.txt        # User (system/environment) prompt
└── data_collection/             # Data collection utilities
    ├── __init__.py              # Package initialization
    ├── config.py                # Data collection configuration
    ├── scraper.py               # Shared web scraping utilities
    └── README.md                # Data collection documentation
```
Key Components
- Config: Centralized configuration management with model mappings and validation rules
- Models: Type-safe data structures for tasks, trajectories, and evaluation results
- Validation: Comprehensive validation of responses, commands, and action formats
- LLMClient: Abstracted interface for different language models with retry logic
- TrajectoryGenerator: Main orchestrator for conversation generation
- CLI: User-friendly command-line interface
Configuration
The framework uses a hierarchical configuration system with centralized model management:
Basic Configuration
```python
from cyber_zero import Config

config = Config()
config.MAX_TURNS = 60        # Maximum conversation turns (30 paired turns)
config.DEFAULT_WORKERS = 16  # Number of parallel workers
```
Model Configuration
The framework supports multiple LLM providers through litellm. The default model is DeepSeek-V3-0324 as specified in the research paper:
```python
# Get available models
from cyber_zero.config import Config

config = Config()
print(config.models.MODEL_MAPPINGS)
# {'deepseek-v3-0324': 'deepseek-ai/DeepSeek-V3-0324', ...}

# Use a specific model
model_id = config.models.get_model_id("deepseek-v3-0324")
```
Adding Custom Models
To add custom models for trajectory generation, update the model mappings in cyber_zero/config.py:
```python
# In cyber_zero/config.py
@dataclass
class ModelConfig:
    def __post_init__(self):
        if self.MODEL_MAPPINGS is None:
            self.MODEL_MAPPINGS = {
                # Default research model
                "deepseek-v3-0324": "deepseek-ai/DeepSeek-V3-0324",
                # Add your custom models here
                "custom-model": "provider/your-custom-model-id",
                "local-model": "http://localhost:8000/v1/completions",
            }
```
Model Parameters
The framework uses research-validated parameters for optimal trajectory generation:
- Temperature: 0.6 (balanced creativity and consistency)
- Top-p: 0.95 (diverse but focused sampling)
- Max Turns: 30 paired turns (60 total turns)
- Trajectories per Task: 3 (for diversity)
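The parameters listed above could be collected into a generation config roughly as follows. This is an illustrative sketch, not the framework's actual API; the constant names and the commented litellm call shape are assumptions.

```python
# Research-validated sampling parameters (see list above).
GENERATION_PARAMS = {
    "temperature": 0.6,  # balanced creativity and consistency
    "top_p": 0.95,       # diverse but focused sampling
}

MAX_TURNS = 60             # 30 paired (assistant + user) turns
TRAJECTORIES_PER_TASK = 3  # multiple samples per task for diversity

# A completion call through litellm might then look like:
# response = litellm.completion(
#     model="deepseek-ai/DeepSeek-V3-0324",
#     messages=history,
#     **GENERATION_PARAMS,
# )
```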
Generation
TaskMeta
Represents CTF task metadata:
```python
task = TaskMeta(
    task_name="Buffer Overflow Basic",
    task_tag="pwn",
    task_points="100",
    task_description="Find the vulnerability...",
    solution="flag{example}",
    # ... other fields
)
```
TrajectoryData
Complete trajectory with conversation history:
```python
trajectory = TrajectoryData(
    task_name="Example Task",
    trajectory=[
        ConversationTurn(role="user", content="ls -la"),
        ConversationTurn(role="assistant", content="bash\nls -la\n"),
        # ... more turns
    ],
    # ... other fields
)
```
Validation
The framework includes comprehensive validation mechanisms to ensure trajectory quality:
Response Validation
- Command Format: Validates proper command syntax and structure
- Action Consistency: Ensures actions are appropriate for the current context
- Output Parsing: Validates command outputs and responses
Quality Assessment
- Trajectory Completeness: Checks for proper conversation flow
- Solution Accuracy: Validates that trajectories lead to correct solutions
- Realism: Ensures generated trajectories mimic real CTF solving patterns
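A minimal sketch of the command-format check described above might look like the following. The accepted shape (a shell tag line followed by at least one command line, mirroring the `ConversationTurn` content shown earlier) is an assumption for illustration; see `cyber_zero/validation.py` for the framework's actual rules.

```python
ALLOWED_SHELLS = {"bash"}  # assumed set of accepted action tags

def has_valid_command(turn_content: str) -> bool:
    """Check that an assistant turn carries a well-formed action:
    a recognized shell tag on the first line, then at least one
    non-empty command line."""
    lines = turn_content.strip().splitlines()
    return (
        len(lines) >= 2
        and lines[0] in ALLOWED_SHELLS
        and any(line.strip() for line in lines[1:])
    )
```

Checks like this run per turn during generation, so malformed actions can be rejected and resampled before they pollute a trajectory.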
Validation Rules
```python
from cyber_zero.validation import validate_trajectory

# Validate a complete trajectory
is_valid = validate_trajectory(trajectory_data)
if is_valid:
    print("Trajectory passes validation")
else:
    print("Trajectory requires review")
```
CLI Interface
Unified CLI Commands
```bash
# Generate trajectories
cyber-zero generate --sampled_flags_path data.jsonl --output_path trajectories.jsonl

# Evaluate quality
cyber-zero evaluate --input_path trajectories.jsonl --output_path quality_results.jsonl

# Reformat for training
cyber-zero reformat --input_path quality_results.jsonl --output_path formatted.jsonl
```
Direct Script Interface
All original CLI interfaces are maintained for backward compatibility:
- python sync_task_metadata.py - Sync task metadata from the web
- python generate_trajectory.py - Main trajectory generation
- python evaluate_quality.py - Quality evaluation
- python reformat_trajectories.py - Trajectory reformatting
CLI Options
```bash
# Generate with custom parameters
cyber-zero generate \
    --sampled_flags_path task_meta.jsonl \
    --output_path trajectories.jsonl \
    --trajectories_per_task 5 \
    --workers 32 \
    --model_id deepseek-v3-0324

# Evaluate with a specific model
cyber-zero evaluate \
    --input_path trajectories.jsonl \
    --output_path quality_results.jsonl \
    --model_id claude-3-sonnet-20240229

# Reformat with split output
cyber-zero reformat \
    --input_path quality_results.jsonl \
    --output_path formatted_trajectories.jsonl \
    --split_output \
    --train_ratio 0.9
```
Citation
If you use this benchmark suite in your research, please cite:
```bibtex
@article{zhuo2025cyber,
  title={Cyber-Zero: Training Cybersecurity Agents without Runtime},
  author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
  journal={arXiv preprint arXiv:2508.00910},
  year={2025}
}

@article{zhuo2025training,
  title={Training Language Model Agents to Find Vulnerabilities with CTF-Dojo},
  author={Zhuo, Terry Yue and Wang, Dingmin and Ding, Hantian and Kumar, Varun and Wang, Zijian},
  journal={arXiv preprint arXiv:2508.18370},
  year={2025}
}
```
License
This project is licensed under CC-BY-NC-4.0; see the LICENSE file for details.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for guidelines on how to contribute to this project.
Support
If you need help or have questions, please check our SUPPORT.md guide or open an issue on GitHub.
Code of Conduct
This project adheres to the Contributor Covenant Code of Conduct. Please read CODE_OF_CONDUCT.md for details.
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Issues event: 4
- Watch event: 14
- Issue comment event: 2
- Push event: 2
- Public event: 1
- Pull request event: 2
- Fork event: 2
- Create event: 2
Last Year
- Issues event: 4
- Watch event: 14
- Issue comment event: 2
- Push event: 2
- Public event: 1
- Pull request event: 2
- Fork event: 2
- Create event: 2
Dependencies
- cfg-if 1.0.0
- dns-lookup 2.0.2
- libc 0.2.146
- socket2 0.5.3
- windows-sys 0.48.0
- windows-targets 0.48.0
- windows_aarch64_gnullvm 0.48.0
- windows_aarch64_msvc 0.48.0
- windows_i686_gnu 0.48.0
- windows_i686_msvc 0.48.0
- windows_x86_64_gnu 0.48.0
- windows_x86_64_gnullvm 0.48.0
- windows_x86_64_msvc 0.48.0
- c2-chacha 0.2.3
- cfg-if 0.1.10
- getrandom 0.1.13
- hex 0.4.0
- libc 0.2.65
- ppv-lite86 0.2.6
- rand 0.7.2
- rand_chacha 0.2.1
- rand_core 0.5.1
- rand_hc 0.2.0
- wasi 0.7.0
- autocfg 0.1.7
- bitflags 1.2.1
- cloudabi 0.0.3
- fuchsia-cprng 0.1.1
- libc 0.2.80
- obfstr 0.1.1
- obfstr-impl 0.1.1
- rand 0.6.5
- rand_chacha 0.1.1
- rand_core 0.3.1
- rand_core 0.4.2
- rand_hc 0.1.0
- rand_isaac 0.1.1
- rand_jitter 0.1.4
- rand_os 0.1.3
- rand_pcg 0.1.2
- rand_xorshift 0.1.1
- rdrand 0.4.0
- winapi 0.3.9
- winapi-i686-pc-windows-gnu 0.4.0
- winapi-x86_64-pc-windows-gnu 0.4.0
- 231 dependencies
- aes 0.7.4
- az 1.1.1
- block-buffer 0.9.0
- cfg-if 1.0.0
- cipher 0.3.0
- cpufeatures 0.1.5
- ctr 0.7.0
- digest 0.9.0
- generic-array 0.14.4
- getrandom 0.2.3
- gmp-mpfr-sys 1.4.6
- lazy_static 1.4.0
- libc 0.2.98
- opaque-debug 0.3.0
- rug 1.13.0
- sha2 0.9.5
- typenum 1.13.0
- version_check 0.9.3
- wasi 0.10.2+wasi-snapshot-preview1
- winapi 0.3.9
- winapi-i686-pc-windows-gnu 0.4.0
- winapi-x86_64-pc-windows-gnu 0.4.0
- debian bookworm-slim build
- alpine latest build
- rust alpine build
- debian bookworm-slim build
- python 3.10-alpine build
- python 3.9-alpine build
- php 8.1.12-apache-bullseye build
- php 8.1.12-apache-bullseye build
- nginx 1.23.2 build
- python 3.11-slim-buster build
- python 3.8-slim-buster build
- alpine latest build
- maven 3.8.5-openjdk-11-slim build
- maven 3.8.5-openjdk-11-slim build
- alpine edge build
- python 3.11-alpine build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- ubuntu focal build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- python 3.9-slim-buster build
- python 3.9 build
- python 3.11 build
- ubuntu 20.04 build
- ubuntu 20.04 build
- nginx 1.18.0 build
- golang 1.20 build
- ubuntu 20.04 build
- nginx 1.18.0 build
- golang 1.20 build
- gradle 7.5-jdk11-alpine build
- openjdk 11-slim build
- gradle 7.5-jdk11 build
- openjdk 11-slim build
- ubuntu 14.04 build
- llmctf/2017f-cry-lupin latest
- ubuntu artful build
- llmctf/2017f-pwn-humm_sch-t latest
- ubuntu 16.04 build
- llmctf/2017f-rev-48-bit_yeet_lab latest
- ubuntu 16.04 build
- llmctf/2017f-rev-rusty_road latest
- ubuntu 14.04 build
- llmctf/2017q-rev-baby_crypt latest
- ubuntu 16.04 build
- llmctf/2017q-for-best_router latest
- ubuntu 14.04 build
- llmctf/2017q-msc-cvv latest
- ubuntu 14.04 build
- llmctf/2017q-msc-serial latest
- ubuntu 16.04 build
- llmctf/2017q-pwn-pilot latest
- ubuntu 16.04 build
- llmctf/2017q-pwn-zone latest
- ubuntu 16.04 build
- llmctf/2017q-rev-gopherz latest
- ubuntu 14.04 build
- llmctf/2017q-rev-grumpcheck latest
- ubuntu 16.04 build
- llmctf/2017q-rev-prophecy latest
- llmctf/2017q-web-littlequery latest