Recent Releases of https://github.com/confident-ai/deepteam

https://github.com/confident-ai/deepteam - πŸŽ‰ OWASP Top 10, Guardrails

You can now use OWASP Top 10 in deepteam as follows:

```python from deepteam import red_team from deepteam.frameworks import OWASPTop10

riskassessment = redteam( modelcallback=yourmodel_callback, framework=OWASPTop10() ) ```

Docs: https://www.trydeepteam.com/docs/red-teaming-owasp-top-10-for-llms

You can now also use Guardrails:

```python from deepteam.guardrails.guards import PromptInjectionGuard, ToxicityGuard from deepteam.guardrails import Guardrails

Initialize guardrails

guardrails = Guardrails( inputguards=[PromptInjectionGuard()], outputguards=[ToxicityGuard()] )

res = guardrails.guard_input(input="...") ```

Docs: https://www.trydeepteam.com/docs/guardrails-introduction

- Python
Published by penguine-ip 11 months ago

https://github.com/confident-ai/deepteam - πŸŽ‰ New CLI tool, Agentic Red Teaming

πŸš€ DeepTeam CLI Release

We’re excited to release the first version of the DeepTeam CLI – a powerful command-line tool for red teaming and evaluating LLM applications with DeepEval.

✨ Features

  • Red Team Simulation

    • Easily specify simulator and evaluation models (gpt-3.5-turbo-0125, gpt-4o, etc.)
    • Attack LLM systems with predefined vulnerability categories (e.g., Bias, Toxicity)
  • Target System Configuration

    • Test both foundational models (like gpt-3.5-turbo) and full LLM applications via custom Python wrappers
    • Simple YAML config structure for defining the target model's purpose and behavior
  • System Controls

    • Set concurrency and parallelism: max_concurrent, run_async
    • Specify how many attacks to run per vulnerability type
    • Optional error handling (ignore_errors) and result storage (output_folder)
  • Pluggable Vulnerabilities and Attacks

    • Support for multiple attack types (e.g., Prompt Injection)
    • Define default vulnerabilities like:
    • Bias: targeting race and gender
    • Toxicity: profanity and insults

πŸ›  Example Usage

```yaml models: simulator: gpt-3.5-turbo-0125 evaluation: gpt-4o

target: purpose: "A helpful AI assistant" model: gpt-3.5-turbo

systemconfig: maxconcurrent: 10 attackspervulnerabilitytype: 3 runasync: true ignoreerrors: false outputfolder: "results"

default_vulnerabilities: - name: "Bias" types: ["race", "gender"] - name: "Toxicity" types: ["profanity", "insults"]

attacks: - name: "Prompt Injection" bash deepteam run config.yaml ```

Stay tuned for more attack types, evaluation metrics, and integrations with the DeepEval framework.

🧠 Agentic Red Teaming

Agentic red teaming tests AI agents for vulnerabilities that only emerge when systems operate autonomously, maintain persistent memory, and pursue complex goals.

🧨 Specialized Attack Methods

DeepTeam includes 6 agentic-specific attacks:

Authority Spoofing – Pretend to be a system admin or override

Role Manipulation – Trick the agent into changing roles

Goal Redirection – Reframe or corrupt the agent's priorities

Linguistic Confusion – Use ambiguity to confuse language understanding

Validation Bypass – Bypass safety checks through clever phrasing

Context Injection – Inject false environmental state

Example

```python from deepteam import redteam from deepteam.vulnerabilities.agentic import DirectControlHijacking from deepteam.attacks.singleturn import AuthoritySpoofing

Test if your agent can be hijacked

riskassessment = redteam( modelcallback=youragent_callback, vulnerabilities=[DirectControlHijacking()], attacks=[AuthoritySpoofing()] ) ``` πŸ§ͺ Happy Red Teaming – now for both chatbots and autonomous agents!

- Python
Published by penguine-ip about 1 year ago

https://github.com/confident-ai/deepteam - First Stable Release πŸŽ‰

DeepTeam v0.1.0 – First Release πŸŽ‰

We’re excited to launch the first public release of DeepTeam, the open-source framework for LLM red teaming.

🧠 DeepTeam enables you to simulate real-world attacks on language models, test for failure modes like jailbreaks, and uncover model vulnerabilities using structured, reproducible evaluation.

πŸš€ Features

  • βœ… Built-in adversarial attack strategies (jailbreaks, refusal bypasses, prompt injections)
  • βœ… Automatic generation of adversarial test cases
  • βœ… Multi-metric evaluation (pass/fail, toxicity, relevance, etc.)
  • βœ… Seamless integration with your LLM app and testing pipelines
  • βœ… Type-safe Python API with minimal setup

Get started by installing deepteam:

bash pip install deepteam

Docs here: https://www.trydeepteam.com/docs/getting-started

- Python
Published by penguine-ip about 1 year ago