https://github.com/aielte-research/hacksynth

LLM Agent and Evaluation Framework for Autonomous Penetration Testing

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 14.5%, to scientific vocabulary)

Keywords

ai autonomous-pentesting ctf ctf-tools cybersecurity llms penetration-testing

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: aielte-research
License: agpl-3.0
Language: Python
Default Branch: main
Homepage:
Size: 1.91 MB

Statistics

Stars: 31
Watchers: 1
Forks: 7
Open Issues: 1
Releases: 0

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License

HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing

The paper can be found on arXiv: https://arxiv.org/abs/2412.01778

Introduction

We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire. These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents.
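The iterative Planner and Summarizer cycle described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `AgentState`, `run_agent`, `plan`, `execute`, and `summarize` are hypothetical, the stopping rule (looking for a `picoCTF{` prefix in the output) is an assumption, and the demo callables are stubs standing in for LLM calls and a sandboxed shell.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """Hypothetical container for what the agent has learned so far."""
    summary: str = ""
    history: list = field(default_factory=list)  # (command, output) pairs


def run_agent(plan, execute, summarize, max_steps=5, flag_prefix="picoCTF{"):
    """Alternate between planning a command and summarizing its output.

    plan(summary) -> command, execute(command) -> output, and
    summarize(summary, command, output) -> new summary are supplied by the
    caller; in a real agent the first and last would be LLM calls.
    """
    state = AgentState()
    for _ in range(max_steps):
        command = plan(state.summary)
        output = execute(command)
        state.history.append((command, output))
        state.summary = summarize(state.summary, command, output)
        if flag_prefix in output:  # assumed stopping rule: a flag was printed
            break
    return state


# Stub callables so the loop can run without a model or a target system.
def demo_plan(summary):
    return "cat flag.txt" if "flag.txt" in summary else "ls"


def demo_execute(command):
    fake_outputs = {"ls": "flag.txt notes.md", "cat flag.txt": "picoCTF{example}"}
    return fake_outputs.get(command, "")


def demo_summarize(summary, command, output):
    return f"{summary}\n$ {command} -> {output}".strip()


final_state = run_agent(demo_plan, demo_execute, demo_summarize)
```

Passing the three steps in as callables keeps the loop itself independent of any particular model or execution environment, which is one way to make such a loop testable with stubs.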

Using the repository

- Create a Hugging Face account and a Neptune.ai account.
- Copy your API keys into the .env file and set the desired CUDA devices, following .env_example.
- Set up the PicoCTF benchmark.
- Set up the OverTheWire benchmark.
- Start the HackSynth agent:
  - Install the environment:
        python -m venv cyber_venv
        source cyber_venv/bin/activate
        pip install -r requirements.txt
  - Start the benchmark:
        python run_bench.py -b benchmark.json -c config.json
    The benchmark.json file should be one of the generated benchmark_solved.json files, or an equivalently structured file. The configuration files we used for the measurements in the paper are available in the configs folder.
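Before launching a long run, it can help to confirm that the two files passed to run_bench.py exist and parse as JSON. The sketch below is an optional pre-flight check, not part of the repository: `load_json_file` and `preflight` are hypothetical helper names, and the code deliberately makes no assumption about the keys inside either file.

```python
import json
from pathlib import Path


def load_json_file(path):
    """Return the parsed JSON content of `path`, or raise a descriptive error."""
    file_path = Path(path)
    if not file_path.is_file():
        raise FileNotFoundError(f"{path} does not exist")
    try:
        with file_path.open(encoding="utf-8") as handle:
            return json.load(handle)
    except json.JSONDecodeError as error:
        raise ValueError(f"{path} is not valid JSON: {error}") from error


def preflight(benchmark_path, config_path):
    """Load both files and return (benchmark, config) as parsed objects."""
    return load_json_file(benchmark_path), load_json_file(config_path)
```

Calling `preflight("benchmark.json", "config.json")` from the repository root would then fail early with a readable message if either file is missing or malformed.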

How to Cite

If you use this code in your work or research, please cite the corresponding paper:

    @misc{muzsai2024hacksynthllmagentevaluation,
      title={HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing},
      author={Lajos Muzsai and David Imolai and András Lukács},
      year={2024},
      eprint={2412.01778},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2412.01778},
    }

Contributors

Lajos Muzsai (muzsailajos@protonmail.com)
David Imolai (david@imol.ai)
András Lukács (andras.lukacs@ttk.elte.hu)

🔍 Also see our related project on reinforcement learning for cryptographic CTFs: HackSynth-GRPO

License

The project uses the GNU AGPLv3 license.

Owner

Name: aielte-research
Login: aielte-research
Kind: organization

Repositories: 1
Profile: https://github.com/aielte-research

GitHub Events

Total

Watch event: 167
Push event: 3
Public event: 1
Fork event: 28

Last Year

Watch event: 167
Push event: 3
Public event: 1
Fork event: 28

Dependencies

picoctf_bench/Dockerfile docker

ubuntu latest build

requirements.txt pypi

GitPython ==3.1.43
Jinja2 ==3.1.3
Mako ==1.3.5
MarkupSafe ==2.1.5
PyExifTool ==0.5.6
PyJWT ==2.8.0
PyMuPDF ==1.24.7
PyMuPDFb ==1.24.6
PyNaCl ==1.5.0
PySocks ==1.7.1
Pygments ==2.17.2
ROPGadget ==7.4
accelerate ==0.29.3
arrow ==1.3.0
asttokens ==2.4.1
attrs ==23.2.0
bcrypt ==4.2.0
boto3 ==1.34.94
botocore ==1.34.94
bravado ==11.0.3
bravado-core ==6.1.1
capstone ==5.0.1
certifi ==2024.2.2
cffi ==1.16.0
charset-normalizer ==3.3.2
click ==8.1.7
colored-traceback ==0.4.2
comm ==0.2.2
contourpy ==1.2.1
cryptography ==43.0.0
cycler ==0.12.1
debugpy ==1.8.1
decorator ==5.1.1
docker ==7.0.0
exceptiongroup ==1.2.1
executing ==2.0.1
filelock ==3.14.0
fonttools ==4.53.0
fqdn ==1.5.1
fsspec ==2024.3.1
future ==1.0.0
gitdb ==4.0.11
huggingface-hub ==0.22.2
idna ==3.7
importlib_metadata ==7.1.0
importlib_resources ==6.4.0
intervaltree ==3.1.0
ipykernel ==6.29.4
ipython ==8.18.1
isoduration ==20.11.0
jedi ==0.19.1
jmespath ==1.0.1
jsonpointer ==2.4
jsonref ==1.1.0
jsonschema ==4.21.1
jsonschema-specifications ==2023.12.1
jupyter_client ==8.6.1
jupyter_core ==5.7.2
kiwisolver ==1.4.5
matplotlib ==3.9.0
matplotlib-inline ==0.1.7
monotonic ==1.6
mpmath ==1.3.0
msgpack ==1.0.8
neptune ==1.10.2
nest-asyncio ==1.6.0
networkx ==3.2.1
numpy ==1.26.4
nvidia-cublas-cu12 ==12.1.3.1
nvidia-cuda-cupti-cu12 ==12.1.105
nvidia-cuda-nvrtc-cu12 ==12.1.105
nvidia-cuda-runtime-cu12 ==12.1.105
nvidia-cudnn-cu12 ==8.9.2.26
nvidia-cufft-cu12 ==11.0.2.54
nvidia-curand-cu12 ==10.3.2.106
nvidia-cusolver-cu12 ==11.4.5.107
nvidia-cusparse-cu12 ==12.1.0.106
nvidia-nccl-cu12 ==2.20.5
nvidia-nvjitlink-cu12 ==12.4.127
nvidia-nvtx-cu12 ==12.1.105
oauthlib ==3.2.2
packaging ==24.0
pandas ==2.2.2
paramiko ==3.4.0
parso ==0.8.4
pexpect ==4.9.0
piexif ==1.1.3
pillow ==10.3.0
platformdirs ==4.2.1
plumbum ==1.8.3
prompt-toolkit ==3.0.43
psutil ==5.9.8
ptyprocess ==0.7.0
pure-eval ==0.2.2
pwn ==1.0
pwntools ==4.12.0
pycparser ==2.22
pyelftools ==0.31
pyparsing ==3.1.2
pyserial ==3.5
python-dateutil ==2.9.0.post0
python-dotenv ==1.0.1
pytz ==2024.1
pyzmq ==26.0.2
referencing ==0.35.0
regex ==2024.4.28
requests ==2.31.0
requests-oauthlib ==2.0.0
rfc3339-validator ==0.1.4
rfc3986-validator ==0.1.1
rpds-py ==0.18.0
rpyc ==6.0.0
s3transfer ==0.10.1
safetensors ==0.4.3
simplejson ==3.19.2
six ==1.16.0
smmap ==5.0.1
sortedcontainers ==2.4.0
stack-data ==0.6.3
swagger-spec-validator ==3.0.3
sympy ==1.12
tokenizers ==0.19.1
torch ==2.3.0
torchaudio ==2.3.0
torchvision ==0.18.0
tornado ==6.4
tqdm ==4.66.2
traitlets ==5.14.3
transformers ==4.40.0
triton ==2.3.0
types-python-dateutil ==2.9.0.20240316
typing_extensions ==4.11.0
tzdata ==2024.1
unicorn ==2.0.1.post1
unix-ar ==0.2.1
uri-template ==1.3.0
urllib3 ==1.26.18
wcwidth ==0.2.13
webcolors ==1.13
websocket-client ==1.8.0
zipp ==3.18.1
zstandard ==0.23.0
