https://github.com/aielte-research/hacksynth
LLM Agent and Evaluation Framework for Autonomous Penetration Testing
Statistics
- Stars: 31
- Watchers: 1
- Forks: 7
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
The paper can be found on arXiv: https://arxiv.org/abs/2412.01778
Introduction

We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire. These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents.
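The iterative Planner and Summarizer cycle described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the names `run_agent`, `AgentState`, `toy_planner`, `toy_executor`, and `toy_summarizer` are hypothetical, the two modules are plain Python functions standing in for LLM calls, and stopping on a flag prefix is an assumed criterion.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    """Running summary of past feedback plus the commands issued so far."""
    summary: str = ""
    history: List[str] = field(default_factory=list)


def run_agent(
    planner: Callable[[str], str],
    executor: Callable[[str], str],
    summarizer: Callable[[str, str, str], str],
    max_steps: int = 5,
    flag_prefix: str = "picoCTF{",  # assumed stop condition, for illustration only
) -> AgentState:
    """Loop: plan a command, execute it, fold the output into the summary."""
    state = AgentState()
    for _ in range(max_steps):
        command = planner(state.summary)      # Planner proposes the next command
        output = executor(command)            # command runs in some environment
        state.history.append(command)
        state.summary = summarizer(state.summary, command, output)
        if flag_prefix in output:             # stop once a flag-like string appears
            break
    return state


# Toy stand-ins so the sketch runs without an LLM or a sandbox.
def toy_planner(summary: str) -> str:
    return "cat flag.txt" if "flag.txt" in summary else "ls"


def toy_executor(command: str) -> str:
    fake_outputs = {"ls": "flag.txt notes.md", "cat flag.txt": "picoCTF{example}"}
    return fake_outputs.get(command, "")


def toy_summarizer(summary: str, command: str, output: str) -> str:
    return f"{summary}\n$ {command}\n{output}".strip()


final_state = run_agent(toy_planner, toy_executor, toy_summarizer)
print(final_state.history)
```

With the toy functions, the loop first lists the directory, then reads the file it discovered, and stops when the output contains the assumed flag prefix. In a real setup the planner and summarizer would each wrap a model call, and the executor would run commands inside an isolated environment.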
Using the repository
- Create a Hugging Face account and a Neptune.ai account
- Copy your API keys to the .env file, and set the desired CUDA devices, based on the .env_example
- Set up the PicoCTF benchmark
- Set up the OverTheWire benchmark
- Start the HackSynth Agent
  - Install the environment:
        python -m venv cyber_venv
        source cyber_venv/bin/activate
        pip install -r requirements.txt
  - Start the benchmark with the following:
        python run_bench.py -b benchmark.json -c config.json
    The benchmark.json should be one of the generated benchmark_solved.json files, or an equivalently structured file. The configuration files we used for the measurements in the paper are also available in the configs folder.
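To run the same benchmark file against several configurations, the command above can be generated once per config file. The sketch below is only a convenience wrapper under stated assumptions: `build_commands` is a hypothetical helper, and it assumes the configs folder contains `*.json` files that `run_bench.py` accepts through `-c`. It prints the commands as a dry run instead of executing them.

```python
import sys
from pathlib import Path
from typing import List


def build_commands(benchmark: Path, config_dir: Path) -> List[List[str]]:
    """Return one run_bench.py invocation per JSON config found in config_dir."""
    configs = sorted(config_dir.glob("*.json"))  # assumption: configs are *.json files
    return [
        [sys.executable, "run_bench.py", "-b", str(benchmark), "-c", str(cfg)]
        for cfg in configs
    ]


# Dry run: show what would be executed. To actually launch a run, pass each
# command to subprocess.run(cmd, check=True) from the repository root.
for cmd in build_commands(Path("benchmark.json"), Path("configs")):
    print(" ".join(cmd))
```

Using `sys.executable` keeps each run inside the currently activated virtual environment (for example `cyber_venv`) rather than whichever `python` happens to be first on the PATH.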
How to Cite
If you use this code in your work or research, please cite the corresponding paper:
@misc{muzsai2024hacksynthllmagentevaluation,
title={HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing},
author={Lajos Muzsai and David Imolai and András Lukács},
year={2024},
eprint={2412.01778},
archivePrefix={arXiv},
primaryClass={cs.CR},
url={https://arxiv.org/abs/2412.01778},
}
Contributors
- Lajos Muzsai (muzsailajos@protonmail.com)
- David Imolai (david@imol.ai)
- András Lukács (andras.lukacs@ttk.elte.hu)
🔍 Also see our related project on reinforcement learning for cryptographic CTFs: HackSynth-GRPO
License
The project uses the GNU AGPLv3 license.
Owner
- Name: aielte-research
- Login: aielte-research
- Kind: organization
- Repositories: 1
- Profile: https://github.com/aielte-research
GitHub Events
Total
- Watch event: 167
- Push event: 3
- Public event: 1
- Fork event: 28
Last Year
- Watch event: 167
- Push event: 3
- Public event: 1
- Fork event: 28
Dependencies
- ubuntu latest build
- GitPython ==3.1.43
- Jinja2 ==3.1.3
- Mako ==1.3.5
- MarkupSafe ==2.1.5
- PyExifTool ==0.5.6
- PyJWT ==2.8.0
- PyMuPDF ==1.24.7
- PyMuPDFb ==1.24.6
- PyNaCl ==1.5.0
- PySocks ==1.7.1
- Pygments ==2.17.2
- ROPGadget ==7.4
- accelerate ==0.29.3
- arrow ==1.3.0
- asttokens ==2.4.1
- attrs ==23.2.0
- bcrypt ==4.2.0
- boto3 ==1.34.94
- botocore ==1.34.94
- bravado ==11.0.3
- bravado-core ==6.1.1
- capstone ==5.0.1
- certifi ==2024.2.2
- cffi ==1.16.0
- charset-normalizer ==3.3.2
- click ==8.1.7
- colored-traceback ==0.4.2
- comm ==0.2.2
- contourpy ==1.2.1
- cryptography ==43.0.0
- cycler ==0.12.1
- debugpy ==1.8.1
- decorator ==5.1.1
- docker ==7.0.0
- exceptiongroup ==1.2.1
- executing ==2.0.1
- filelock ==3.14.0
- fonttools ==4.53.0
- fqdn ==1.5.1
- fsspec ==2024.3.1
- future ==1.0.0
- gitdb ==4.0.11
- huggingface-hub ==0.22.2
- idna ==3.7
- importlib_metadata ==7.1.0
- importlib_resources ==6.4.0
- intervaltree ==3.1.0
- ipykernel ==6.29.4
- ipython ==8.18.1
- isoduration ==20.11.0
- jedi ==0.19.1
- jmespath ==1.0.1
- jsonpointer ==2.4
- jsonref ==1.1.0
- jsonschema ==4.21.1
- jsonschema-specifications ==2023.12.1
- jupyter_client ==8.6.1
- jupyter_core ==5.7.2
- kiwisolver ==1.4.5
- matplotlib ==3.9.0
- matplotlib-inline ==0.1.7
- monotonic ==1.6
- mpmath ==1.3.0
- msgpack ==1.0.8
- neptune ==1.10.2
- nest-asyncio ==1.6.0
- networkx ==3.2.1
- numpy ==1.26.4
- nvidia-cublas-cu12 ==12.1.3.1
- nvidia-cuda-cupti-cu12 ==12.1.105
- nvidia-cuda-nvrtc-cu12 ==12.1.105
- nvidia-cuda-runtime-cu12 ==12.1.105
- nvidia-cudnn-cu12 ==8.9.2.26
- nvidia-cufft-cu12 ==11.0.2.54
- nvidia-curand-cu12 ==10.3.2.106
- nvidia-cusolver-cu12 ==11.4.5.107
- nvidia-cusparse-cu12 ==12.1.0.106
- nvidia-nccl-cu12 ==2.20.5
- nvidia-nvjitlink-cu12 ==12.4.127
- nvidia-nvtx-cu12 ==12.1.105
- oauthlib ==3.2.2
- packaging ==24.0
- pandas ==2.2.2
- paramiko ==3.4.0
- parso ==0.8.4
- pexpect ==4.9.0
- piexif ==1.1.3
- pillow ==10.3.0
- platformdirs ==4.2.1
- plumbum ==1.8.3
- prompt-toolkit ==3.0.43
- psutil ==5.9.8
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- pwn ==1.0
- pwntools ==4.12.0
- pycparser ==2.22
- pyelftools ==0.31
- pyparsing ==3.1.2
- pyserial ==3.5
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- pytz ==2024.1
- pyzmq ==26.0.2
- referencing ==0.35.0
- regex ==2024.4.28
- requests ==2.31.0
- requests-oauthlib ==2.0.0
- rfc3339-validator ==0.1.4
- rfc3986-validator ==0.1.1
- rpds-py ==0.18.0
- rpyc ==6.0.0
- s3transfer ==0.10.1
- safetensors ==0.4.3
- simplejson ==3.19.2
- six ==1.16.0
- smmap ==5.0.1
- sortedcontainers ==2.4.0
- stack-data ==0.6.3
- swagger-spec-validator ==3.0.3
- sympy ==1.12
- tokenizers ==0.19.1
- torch ==2.3.0
- torchaudio ==2.3.0
- torchvision ==0.18.0
- tornado ==6.4
- tqdm ==4.66.2
- traitlets ==5.14.3
- transformers ==4.40.0
- triton ==2.3.0
- types-python-dateutil ==2.9.0.20240316
- typing_extensions ==4.11.0
- tzdata ==2024.1
- unicorn ==2.0.1.post1
- unix-ar ==0.2.1
- uri-template ==1.3.0
- urllib3 ==1.26.18
- wcwidth ==0.2.13
- webcolors ==1.13
- websocket-client ==1.8.0
- zipp ==3.18.1
- zstandard ==0.23.0