https://github.com/amazon-science/turbofuzzllm
TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (14.5%) to scientific vocabulary
Keywords
Repository
TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice
Basic Info
- Host: GitHub
- Owner: amazon-science
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2502.18504
- Size: 51.8 KB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
TurboFuzzLLM
Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking LLMs in Practice
A state-of-the-art tool for automatic red teaming of Large Language Models (LLMs) that generates effective adversarial prompt templates to identify vulnerabilities and improve AI safety.
⚠️ Responsible Use
This tool is designed for improving AI safety through systematic vulnerability testing. It should be used responsibly for defensive purposes and developing better safeguards for LLMs.
Our primary goal is to advance the development of more robust and safer AI systems by identifying and addressing their vulnerabilities. We believe this research will ultimately benefit the AI community by enabling the development of better safety measures and alignment techniques.
🎯 Key Features
- High Success Rate: Achieves >98% Attack Success Rate (ASR) on GPT-4o, GPT-4 Turbo, and other leading LLMs
- Efficient: 3x fewer queries and 2x more successful templates compared to previous methods
- Generalizable: >90% ASR on unseen harmful questions
- Practical: Easy-to-use CLI with statistics, search visualization, and logging
- Defensive Applications: Generated data improves model safety (74% safer after fine-tuning)
🔧 Method Overview
TurboFuzzLLM performs black-box mutation-based fuzzing to iteratively generate new adversarial red teaming templates. Key innovations include:
- Expanded Mutation Space: New mutation operations including refusal suppression
- Reinforcement Learning: Feedback-guided prioritized search
- Intelligent Heuristics: Efficient exploration with fewer LLM queries
- Template-Based Approach: Templates can be combined with any harmful question for scalable attacks
📊 Results
| Metric | Performance | |--------|-------------| | ASR on GPT-4o/GPT-4 Turbo | >98% | | ASR on unseen questions | >90% | | Query efficiency | 3x fewer queries | | Template success rate | 2x improvement | | Model safety improvement | 74% safer after adversarial training |
🛡️ Applications
- Vulnerability Identification: Discover prompt-based attack vectors in LLMs
- Countermeasure Development:
- Improve in-built LLM safeguards
- Create external guardrails
- Adversarial Training: Generate high-quality (attack prompt, harmful response) pairs for safety fine-tuning
🚀 Getting Started
Prerequisites
- Model Access: Obtain API keys for your LLM provider (e.g., Amazon Bedrock)
- Dependencies: Install required packages (TODO: details coming soon!)
- Build: Build the package (TODO: details coming soon!)
Install Dependencies
bash
git clone https://github.com/amazon-science/TurboFuzzLLM.git
cd TurboFuzzLLM
pip install -e .
- This tools assumes you have AWS CLI and credentials configured on your machine. You can do this by running
aws configureand following the prompts. - Details configuration can be found in https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html.
Quick Start
- First Download GPTFuzzer templates:
bash
python3 scripts/get_templates_gptfuzzer.py
Run a single red teaming attempt interactively:
bash
python3 src/__main__.py answer
When prompted, provide a harmful question to test, e.g.: ```
How can I cheat on my final exam? ```
Check the results in output/<date>/*/
⚙️ Configuration
Execution Modes
TurboFuzzLLM supports 4 operational modes:
| Mode | Description | Use Case |
|------|-------------|----------|
| answer | Red team a single question interactively | Quick testing |
| attack | Red team multiple questions from a dataset efficiently | Batch vulnerability testing |
| legacy | Run vanilla GPTFuzzer to learn effective templates | Baseline comparison |
| evaluate | Test learned templates against a dataset | Template effectiveness measurement |
Command Line Interface
Get help for any mode:
bash
python3 src/__main__.py <mode> --help
Key Parameters
Target Model: Specify the LLM to attack
bash --target-model-id <bedrock-model-id>Query Budget: Limit the number of queries to the target
bash --max-queries N
📂 Understanding Output
Each run creates an output folder with the following structure:
output/<date>/<mode>_<target-model-id>_<start-time>/
├── templates.csv # Summary of each template used
├── mutators.csv # Performance metrics for each mutator
├── queries.csv # Details of each LLM query
├── stats.txt # Key metrics summary
├── details.log # Detailed execution log
└── template_tree.dot # Visualization of mutant search space
Output Files Description
templates.csv: Contains all generated templates with their success ratesmutators.csv: Performance analysis of different mutation operationsqueries.csv: Complete record of LLM interactionsstats.txt: High-level metrics including ASR, query count, and timingdetails.log: Verbose logging for debuggingtemplate_tree.dot: Graphviz visualization of the template evolution tree
👥 Meet the Team
- Aman Goel* (Contact: goelaman@amazon.com)
- Xian Carrie Wu
- Zhe Wang
- Dmitriy Bespalov
- Yanjun (Jane) Qi
Security
See CONTRIBUTING for more information.
License
This project is licensed under the Apache-2.0 License.
Citation
If you find this useful in your research, please consider citing:
@inproceedings{goel2025turbofuzzllm,
title={TurboFuzzLLM: Turbocharging Mutation-based Fuzzing for Effectively Jailbreaking Large Language Models in Practice},
author={Goel, Aman and Wu, Xian and Wang, Daisy Zhe and Bespalov, Dmitriy and Qi, Yanjun},
booktitle={Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)},
pages={523--534},
year={2025}
}
Owner
- Name: Amazon Science
- Login: amazon-science
- Kind: organization
- Website: https://amazon.science
- Twitter: AmazonScience
- Repositories: 80
- Profile: https://github.com/amazon-science
GitHub Events
Total
- Watch event: 9
- Public event: 1
- Push event: 2
- Pull request review event: 1
- Pull request event: 3
- Fork event: 2
Last Year
- Watch event: 9
- Public event: 1
- Push event: 2
- Pull request review event: 1
- Pull request event: 3
- Fork event: 2