Recent Releases of https://github.com/confident-ai/deepteam
https://github.com/confident-ai/deepteam - π OWASP Top 10, Guardrails
You can now use OWASP Top 10 in deepteam as follows:
```python from deepteam import red_team from deepteam.frameworks import OWASPTop10
riskassessment = redteam( modelcallback=yourmodel_callback, framework=OWASPTop10() ) ```
Docs: https://www.trydeepteam.com/docs/red-teaming-owasp-top-10-for-llms
You can now also use Guardrails:
```python from deepteam.guardrails.guards import PromptInjectionGuard, ToxicityGuard from deepteam.guardrails import Guardrails
Initialize guardrails
guardrails = Guardrails( inputguards=[PromptInjectionGuard()], outputguards=[ToxicityGuard()] )
res = guardrails.guard_input(input="...") ```
Docs: https://www.trydeepteam.com/docs/guardrails-introduction
- Python
Published by penguine-ip 11 months ago
https://github.com/confident-ai/deepteam - π New CLI tool, Agentic Red Teaming
π DeepTeam CLI Release
Weβre excited to release the first version of the DeepTeam CLI β a powerful command-line tool for red teaming and evaluating LLM applications with DeepEval.
β¨ Features
Red Team Simulation
- Easily specify simulator and evaluation models (
gpt-3.5-turbo-0125,gpt-4o, etc.) - Attack LLM systems with predefined vulnerability categories (e.g., Bias, Toxicity)
- Easily specify simulator and evaluation models (
Target System Configuration
- Test both foundational models (like
gpt-3.5-turbo) and full LLM applications via custom Python wrappers - Simple YAML config structure for defining the target model's purpose and behavior
- Test both foundational models (like
System Controls
- Set concurrency and parallelism:
max_concurrent,run_async - Specify how many attacks to run per vulnerability type
- Optional error handling (
ignore_errors) and result storage (output_folder)
- Set concurrency and parallelism:
Pluggable Vulnerabilities and Attacks
- Support for multiple attack types (e.g.,
Prompt Injection) - Define default vulnerabilities like:
Bias: targeting race and genderToxicity: profanity and insults
- Support for multiple attack types (e.g.,
π Example Usage
```yaml models: simulator: gpt-3.5-turbo-0125 evaluation: gpt-4o
target: purpose: "A helpful AI assistant" model: gpt-3.5-turbo
systemconfig: maxconcurrent: 10 attackspervulnerabilitytype: 3 runasync: true ignoreerrors: false outputfolder: "results"
default_vulnerabilities: - name: "Bias" types: ["race", "gender"] - name: "Toxicity" types: ["profanity", "insults"]
attacks:
- name: "Prompt Injection"
bash
deepteam run config.yaml
```
Stay tuned for more attack types, evaluation metrics, and integrations with the DeepEval framework.
π§ Agentic Red Teaming
Agentic red teaming tests AI agents for vulnerabilities that only emerge when systems operate autonomously, maintain persistent memory, and pursue complex goals.
𧨠Specialized Attack Methods
DeepTeam includes 6 agentic-specific attacks:
Authority Spoofing β Pretend to be a system admin or override
Role Manipulation β Trick the agent into changing roles
Goal Redirection β Reframe or corrupt the agent's priorities
Linguistic Confusion β Use ambiguity to confuse language understanding
Validation Bypass β Bypass safety checks through clever phrasing
Context Injection β Inject false environmental state
Example
```python from deepteam import redteam from deepteam.vulnerabilities.agentic import DirectControlHijacking from deepteam.attacks.singleturn import AuthoritySpoofing
Test if your agent can be hijacked
riskassessment = redteam( modelcallback=youragent_callback, vulnerabilities=[DirectControlHijacking()], attacks=[AuthoritySpoofing()] ) ``` π§ͺ Happy Red Teaming β now for both chatbots and autonomous agents!
- Python
Published by penguine-ip about 1 year ago
https://github.com/confident-ai/deepteam - First Stable Release π
DeepTeam v0.1.0 β First Release π
Weβre excited to launch the first public release of DeepTeam, the open-source framework for LLM red teaming.
π§ DeepTeam enables you to simulate real-world attacks on language models, test for failure modes like jailbreaks, and uncover model vulnerabilities using structured, reproducible evaluation.
π Features
- β Built-in adversarial attack strategies (jailbreaks, refusal bypasses, prompt injections)
- β Automatic generation of adversarial test cases
- β Multi-metric evaluation (pass/fail, toxicity, relevance, etc.)
- β Seamless integration with your LLM app and testing pipelines
- β Type-safe Python API with minimal setup
Get started by installing deepteam:
bash
pip install deepteam
Docs here: https://www.trydeepteam.com/docs/getting-started
- Python
Published by penguine-ip about 1 year ago