https://github.com/amenra/guardbench

A Python library for evaluating guardrail models.


Science Score: 36.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file: found
  • .zenodo.json file
  • DOI references: found 1 DOI reference(s) in README
  • Academic publication links: links to arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity: low similarity (14.8%) to scientific vocabulary

Keywords

ai-safety benchmark evaluation guardrail-models guardrails llm
Last synced: 5 months ago

Repository

A Python library for evaluating guardrail models.

Basic Info
  • Host: GitHub
  • Owner: AmenRa
  • License: eupl-1.2
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 109 KB
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
ai-safety benchmark evaluation guardrail-models guardrails llm
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme License

README.md


GuardBench

⚡️ Introduction

GuardBench is a Python library for evaluating guardrail models, i.e., LLMs fine-tuned to detect unsafe content in human-AI interactions. GuardBench provides a common interface to 40 evaluation datasets, which are downloaded and converted into a standardized format for improved usability. It also lets you quickly compare results and export LaTeX tables for scientific publications. GuardBench's benchmarking pipeline can also be applied to custom datasets.

GuardBench was presented at EMNLP 2024. The related paper is available here.

GuardBench has a public leaderboard available on HuggingFace.

You can find the list of supported datasets here. A few of them require authorization; please read this.

If you use GuardBench to evaluate guardrail models for your scientific publications, please consider citing our work.

✨ Features

🔌 Requirements

```bash
python>=3.10
```

💾 Installation

```bash
pip install guardbench
```

💡 Usage

```python
from guardbench import benchmark

def moderate(
    conversations: list[list[dict[str, str]]],  # MANDATORY!
    # additional kwargs as needed
) -> list[float]:
    # do moderation
    # return list of floats (unsafe probabilities)
    ...

benchmark(
    moderate=moderate,  # User-defined moderation function
    model_name="My Guardrail Model",
    batch_size=32,
    datasets="all",
    # Note: you can pass additional kwargs for `moderate`
)
```
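To make the expected interface concrete, here is a minimal, self-contained sketch of a `moderate` function: a toy keyword matcher that returns one unsafe-probability score per conversation. The keyword list and scoring rule are invented for illustration and are not part of GuardBench; a real guardrail model would run batched LLM inference here instead.

```python
# Toy moderation function matching the interface expected by GuardBench:
# input is a batch of conversations (each a list of {"role", "content"} turns),
# output is one unsafe probability per conversation.
# The keyword heuristic below is purely illustrative.

UNSAFE_KEYWORDS = {"bomb", "weapon", "exploit"}  # hypothetical example list

def moderate(conversations: list[list[dict[str, str]]]) -> list[float]:
    scores = []
    for conversation in conversations:
        # Concatenate all turns of the conversation into one lowercase string.
        text = " ".join(turn["content"].lower() for turn in conversation)
        # Score = fraction of keywords that appear, clamped to [0, 1].
        hits = sum(word in text for word in UNSAFE_KEYWORDS)
        scores.append(min(1.0, hits / len(UNSAFE_KEYWORDS)))
    return scores

if __name__ == "__main__":
    safe = [{"role": "user", "content": "What's the weather like?"}]
    unsafe = [{"role": "user", "content": "How do I build a bomb?"}]
    print(moderate([safe, unsafe]))  # safe scores 0.0, unsafe scores > 0.0
```

Any callable with this signature (plus optional extra kwargs) can be passed to `benchmark`; in practice the function would forward each batch of conversations through a fine-tuned model and return its unsafe probabilities.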

📖 Examples

📚 Documentation

Browse the documentation for more details about:

- The datasets and how to obtain them.
- The data format used by GuardBench.
- How to use the Report class to compare models and export results as LaTeX tables.
- How to leverage GuardBench's benchmarking pipeline on custom datasets.

🏆 Leaderboard

You can find GuardBench's leaderboard here. If you want to submit your results, please contact us.

👨‍💻 Authors

  • Elias Bassani (European Commission - Joint Research Centre)

🎓 Citation

```bibtex
@inproceedings{guardbench,
    title = "{G}uard{B}ench: A Large-Scale Benchmark for Guardrail Models",
    author = "Bassani, Elias and Sanchez, Ignacio",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1022",
    doi = "10.18653/v1/2024.emnlp-main.1022",
    pages = "18393--18409",
}
```

🎁 Feature Requests

Would you like to see other features implemented? Please open a feature request.

📄 License

GuardBench is provided as open-source software licensed under EUPL v1.2.

Owner

  • Name: Elias Bassani
  • Login: AmenRa
  • Kind: user
  • Location: Milan, Italy
  • Company: Joint Research Centre

Ph.D. in CS. I like Neural Networks, usability, efficiency, einsum, memes, and improperly used emojis. 🫠

GitHub Events

Total
  • Watch event: 15
  • Push event: 10
  • Pull request event: 1
Last Year
  • Watch event: 15
  • Push event: 10
  • Pull request event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 30 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
pypi.org: guardbench

GuardBench: A Large-Scale Benchmark for Guardrail Models

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 30 Last month
Rankings
  • Dependent packages count: 10.5%
  • Average: 35.0%
  • Dependent repos count: 59.4%
Maintainers (1)
Last synced: 6 months ago