https://github.com/aielte-research/hacksynth

LLM Agent and Evaluation Framework for Autonomous Penetration Testing


Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

ai autonomous-pentesting ctf ctf-tools cybersecurity llms penetration-testing
Last synced: 5 months ago

Repository

LLM Agent and Evaluation Framework for Autonomous Penetration Testing

Basic Info
  • Host: GitHub
  • Owner: aielte-research
  • License: agpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.91 MB
Statistics
  • Stars: 31
  • Watchers: 1
  • Forks: 7
  • Open Issues: 1
  • Releases: 0
Topics
ai autonomous-pentesting ctf ctf-tools cybersecurity llms penetration-testing
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing

The paper is available on arXiv: https://arxiv.org/abs/2412.01778

Introduction


We introduce HackSynth, a novel Large Language Model (LLM)-based agent capable of autonomous penetration testing. HackSynth's dual-module architecture includes a Planner and a Summarizer, which enable it to generate commands and process feedback iteratively. To benchmark HackSynth, we propose two new Capture The Flag (CTF)-based benchmark sets utilizing the popular platforms PicoCTF and OverTheWire. These benchmarks include two hundred challenges across diverse domains and difficulties, providing a standardized framework for evaluating LLM-based penetration testing agents.
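The iterative Planner/Summarizer cycle described above can be illustrated with a minimal sketch. It assumes the Planner maps the current summary to the next shell command, an executor runs that command, and the Summarizer folds the command and its output into an updated summary. The names `run_agent`, `flag_found`, `max_steps` and the toy stubs are hypothetical; in HackSynth both modules are LLM-backed, and the real prompts, execution environment and stopping rules are not shown here.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: in HackSynth these would be LLM-backed modules.
Planner = Callable[[str], str]                 # summary -> next command
Summarizer = Callable[[str, str, str], str]    # (summary, command, output) -> new summary
Executor = Callable[[str], str]                # command -> raw output


def run_agent(
    planner: Planner,
    summarizer: Summarizer,
    executor: Executor,
    flag_found: Callable[[str], bool],
    max_steps: int = 10,
) -> Tuple[bool, List[str]]:
    """Alternate planning and summarizing until a flag appears or the step budget runs out."""
    summary = ""
    history: List[str] = []
    for _ in range(max_steps):
        command = planner(summary)      # Planner proposes the next command
        output = executor(command)      # command is executed (sandboxed in practice)
        history.append(command)
        if flag_found(output):          # hypothetical success check on raw output
            return True, history
        summary = summarizer(summary, command, output)  # Summarizer condenses feedback
    return False, history


# Toy stand-ins for the LLM modules and the target environment (illustrative only).
def toy_planner(summary: str) -> str:
    return "cat flag.txt" if "flag.txt" in summary else "ls"


def toy_executor(command: str) -> str:
    return {"ls": "flag.txt", "cat flag.txt": "picoCTF{example}"}.get(command, "")


def toy_summarizer(summary: str, command: str, output: str) -> str:
    return f"{summary}\n$ {command}\n{output}".strip()


if __name__ == "__main__":
    solved, commands = run_agent(
        toy_planner, toy_summarizer, toy_executor, lambda out: "picoCTF{" in out
    )
    print(solved, commands)  # True ['ls', 'cat flag.txt']
```

Keeping the summary as the only state passed back to the Planner mirrors the idea of a Summarizer that condenses feedback; how much raw history the real agent keeps is a detail of its configuration, not of this sketch.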


Using the repository

  • Create a Hugging Face account and a Neptune.ai account.
  • Following .env_example, create a .env file containing your API keys and the CUDA devices you want to use.
  • Set up the PicoCTF benchmark.
  • Set up the OverTheWire benchmark.
  • Start the HackSynth agent:
    • Install the environment:
        python -m venv cyber_venv
        source cyber_venv/bin/activate
        pip install -r requirements.txt
    • Start the benchmark:
        python run_bench.py -b benchmark.json -c config.json
      Here benchmark.json should be one of the generated benchmark_solved.json files, or a file with the same structure. The configuration files we used for the measurements in the paper are available in the configs folder.
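To repeat the benchmark once per configuration in the configs folder, a small driver around the documented command line could look like the sketch below. It uses only the `-b` and `-c` flags shown above; the helper names `build_commands` and `run_all`, the assumption that every configuration is a `.json` file somewhere under `configs/`, and the placeholder benchmark path are hypothetical. It is meant to be run from the repository root inside the activated virtual environment.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_commands(benchmark: str, config_dir: str = "configs") -> List[List[str]]:
    """Return one `run_bench.py` invocation per JSON config found under config_dir.

    Assumption: configurations are stored as *.json files (possibly in subfolders).
    """
    configs = sorted(Path(config_dir).rglob("*.json"))
    return [
        # sys.executable is the interpreter of the active venv, if one is activated
        [sys.executable, "run_bench.py", "-b", benchmark, "-c", str(cfg)]
        for cfg in configs
    ]


def run_all(benchmark: str, config_dir: str = "configs") -> None:
    """Run the benchmark sequentially for each config; stop on the first failure."""
    for cmd in build_commands(benchmark, config_dir):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Placeholder path: point this at one of your generated benchmark_solved.json files.
    run_all("path/to/benchmark_solved.json")
```

Running the configurations sequentially keeps GPU usage predictable when the CUDA devices are fixed in .env; parallel runs would need separate device assignments.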

How to Cite

If you use this code in your work or research, please cite the corresponding paper:

    @misc{muzsai2024hacksynthllmagentevaluation,
      title={HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing},
      author={Lajos Muzsai and David Imolai and András Lukács},
      year={2024},
      eprint={2412.01778},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2412.01778},
    }

Contributors

  • Lajos Muzsai (muzsailajos@protonmail.com)
  • David Imolai (david@imol.ai)
  • András Lukács (andras.lukacs@ttk.elte.hu)

🔍 Also see our related project on reinforcement learning for cryptographic CTFs: HackSynth-GRPO

License

The project uses the GNU AGPLv3 license.

Owner

  • Name: aielte-research
  • Login: aielte-research
  • Kind: organization

GitHub Events

Total
  • Watch event: 167
  • Push event: 3
  • Public event: 1
  • Fork event: 28
Last Year
  • Watch event: 167
  • Push event: 3
  • Public event: 1
  • Fork event: 28

Dependencies

picoctf_bench/Dockerfile docker
  • ubuntu latest build
requirements.txt pypi
  • GitPython ==3.1.43
  • Jinja2 ==3.1.3
  • Mako ==1.3.5
  • MarkupSafe ==2.1.5
  • PyExifTool ==0.5.6
  • PyJWT ==2.8.0
  • PyMuPDF ==1.24.7
  • PyMuPDFb ==1.24.6
  • PyNaCl ==1.5.0
  • PySocks ==1.7.1
  • Pygments ==2.17.2
  • ROPGadget ==7.4
  • accelerate ==0.29.3
  • arrow ==1.3.0
  • asttokens ==2.4.1
  • attrs ==23.2.0
  • bcrypt ==4.2.0
  • boto3 ==1.34.94
  • botocore ==1.34.94
  • bravado ==11.0.3
  • bravado-core ==6.1.1
  • capstone ==5.0.1
  • certifi ==2024.2.2
  • cffi ==1.16.0
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • colored-traceback ==0.4.2
  • comm ==0.2.2
  • contourpy ==1.2.1
  • cryptography ==43.0.0
  • cycler ==0.12.1
  • debugpy ==1.8.1
  • decorator ==5.1.1
  • docker ==7.0.0
  • exceptiongroup ==1.2.1
  • executing ==2.0.1
  • filelock ==3.14.0
  • fonttools ==4.53.0
  • fqdn ==1.5.1
  • fsspec ==2024.3.1
  • future ==1.0.0
  • gitdb ==4.0.11
  • huggingface-hub ==0.22.2
  • idna ==3.7
  • importlib_metadata ==7.1.0
  • importlib_resources ==6.4.0
  • intervaltree ==3.1.0
  • ipykernel ==6.29.4
  • ipython ==8.18.1
  • isoduration ==20.11.0
  • jedi ==0.19.1
  • jmespath ==1.0.1
  • jsonpointer ==2.4
  • jsonref ==1.1.0
  • jsonschema ==4.21.1
  • jsonschema-specifications ==2023.12.1
  • jupyter_client ==8.6.1
  • jupyter_core ==5.7.2
  • kiwisolver ==1.4.5
  • matplotlib ==3.9.0
  • matplotlib-inline ==0.1.7
  • monotonic ==1.6
  • mpmath ==1.3.0
  • msgpack ==1.0.8
  • neptune ==1.10.2
  • nest-asyncio ==1.6.0
  • networkx ==3.2.1
  • numpy ==1.26.4
  • nvidia-cublas-cu12 ==12.1.3.1
  • nvidia-cuda-cupti-cu12 ==12.1.105
  • nvidia-cuda-nvrtc-cu12 ==12.1.105
  • nvidia-cuda-runtime-cu12 ==12.1.105
  • nvidia-cudnn-cu12 ==8.9.2.26
  • nvidia-cufft-cu12 ==11.0.2.54
  • nvidia-curand-cu12 ==10.3.2.106
  • nvidia-cusolver-cu12 ==11.4.5.107
  • nvidia-cusparse-cu12 ==12.1.0.106
  • nvidia-nccl-cu12 ==2.20.5
  • nvidia-nvjitlink-cu12 ==12.4.127
  • nvidia-nvtx-cu12 ==12.1.105
  • oauthlib ==3.2.2
  • packaging ==24.0
  • pandas ==2.2.2
  • paramiko ==3.4.0
  • parso ==0.8.4
  • pexpect ==4.9.0
  • piexif ==1.1.3
  • pillow ==10.3.0
  • platformdirs ==4.2.1
  • plumbum ==1.8.3
  • prompt-toolkit ==3.0.43
  • psutil ==5.9.8
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • pwn ==1.0
  • pwntools ==4.12.0
  • pycparser ==2.22
  • pyelftools ==0.31
  • pyparsing ==3.1.2
  • pyserial ==3.5
  • python-dateutil ==2.9.0.post0
  • python-dotenv ==1.0.1
  • pytz ==2024.1
  • pyzmq ==26.0.2
  • referencing ==0.35.0
  • regex ==2024.4.28
  • requests ==2.31.0
  • requests-oauthlib ==2.0.0
  • rfc3339-validator ==0.1.4
  • rfc3986-validator ==0.1.1
  • rpds-py ==0.18.0
  • rpyc ==6.0.0
  • s3transfer ==0.10.1
  • safetensors ==0.4.3
  • simplejson ==3.19.2
  • six ==1.16.0
  • smmap ==5.0.1
  • sortedcontainers ==2.4.0
  • stack-data ==0.6.3
  • swagger-spec-validator ==3.0.3
  • sympy ==1.12
  • tokenizers ==0.19.1
  • torch ==2.3.0
  • torchaudio ==2.3.0
  • torchvision ==0.18.0
  • tornado ==6.4
  • tqdm ==4.66.2
  • traitlets ==5.14.3
  • transformers ==4.40.0
  • triton ==2.3.0
  • types-python-dateutil ==2.9.0.20240316
  • typing_extensions ==4.11.0
  • tzdata ==2024.1
  • unicorn ==2.0.1.post1
  • unix-ar ==0.2.1
  • uri-template ==1.3.0
  • urllib3 ==1.26.18
  • wcwidth ==0.2.13
  • webcolors ==1.13
  • websocket-client ==1.8.0
  • zipp ==3.18.1
  • zstandard ==0.23.0