https://github.com/amenra/guardbench

A Python library for evaluating guardrail models.


Science Score: 36.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file: found
  • .zenodo.json file
  • DOI references: found 1 DOI reference(s) in README
  • Academic publication links: links to arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity: low similarity (14.8%) to scientific vocabulary

Keywords

ai-safety benchmark evaluation guardrail-models guardrails llm
Last synced: 5 months ago

Repository

A Python library for evaluating guardrail models.

Basic Info
  • Host: GitHub
  • Owner: AmenRa
  • License: eupl-1.2
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 109 KB
Statistics
  • Stars: 9
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
ai-safety benchmark evaluation guardrail-models guardrails llm
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme License

README.md


GuardBench

⚡️ Introduction

GuardBench is a Python library for evaluating guardrail models, i.e., LLMs fine-tuned to detect unsafe content in human-AI interactions. GuardBench provides a common interface to 40 evaluation datasets, which are downloaded and converted into a standardized format for improved usability. It also lets you quickly compare results and export LaTeX tables for scientific publications. GuardBench's benchmarking pipeline can also be applied to custom datasets.

GuardBench was presented at EMNLP 2024. The related paper is available here.

GuardBench has a public leaderboard available on HuggingFace.

You can find the list of supported datasets here. A few of them require authorization; please read this.

If you use GuardBench to evaluate guardrail models for your scientific publications, please consider citing our work.

✨ Features

🔌 Requirements

```bash
python>=3.10
```

💾 Installation

```bash
pip install guardbench
```

💡 Usage

```python
from guardbench import benchmark

def moderate(
    conversations: list[list[dict[str, str]]],  # MANDATORY!
    # additional kwargs as needed
) -> list[float]:
    # do moderation
    # return list of floats (unsafe probabilities)
    ...

benchmark(
    moderate=moderate,  # User-defined moderation function
    model_name="My Guardrail Model",
    batch_size=32,
    datasets="all",
    # Note: you can pass additional kwargs for `moderate`
)
```
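To make the expected interface concrete, here is a minimal, self-contained sketch of a `moderate` function: a toy keyword matcher that returns one unsafe-probability score per conversation. The keyword list and scoring rule are invented for illustration and are not part of GuardBench; a real guardrail model would run batched LLM inference here instead.

```python
# Toy moderation function matching the interface expected by GuardBench:
# input is a batch of conversations (each a list of {"role", "content"} turns),
# output is one unsafe probability per conversation.
# The keyword heuristic below is purely illustrative.

UNSAFE_KEYWORDS = {"bomb", "weapon", "exploit"}  # hypothetical example list

def moderate(conversations: list[list[dict[str, str]]]) -> list[float]:
    scores = []
    for conversation in conversations:
        # Concatenate all turns of the conversation into one lowercase string.
        text = " ".join(turn["content"].lower() for turn in conversation)
        # Score = fraction of keywords that appear, clamped to [0, 1].
        hits = sum(word in text for word in UNSAFE_KEYWORDS)
        scores.append(min(1.0, hits / len(UNSAFE_KEYWORDS)))
    return scores

if __name__ == "__main__":
    safe = [{"role": "user", "content": "What's the weather like?"}]
    unsafe = [{"role": "user", "content": "How do I build a bomb?"}]
    print(moderate([safe, unsafe]))  # safe scores 0.0, unsafe scores > 0.0
```

Any callable with this signature (plus optional extra kwargs) can be passed to `benchmark`; in practice the function would forward each batch of conversations through a fine-tuned model and return its unsafe probabilities.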

📖 Examples

📚 Documentation

Browse the documentation for more details about:

- The datasets and how to obtain them.
- The data format used by GuardBench.
- How to use the Report class to compare models and export results as LaTeX tables.
- How to leverage GuardBench's benchmarking pipeline on custom datasets.

🏆 Leaderboard

You can find GuardBench's leaderboard here. If you want to submit your results, please contact us.

👨‍💻 Authors

  • Elias Bassani (European Commission - Joint Research Centre)

🎓 Citation

```bibtex
@inproceedings{guardbench,
    title = "{G}uard{B}ench: A Large-Scale Benchmark for Guardrail Models",
    author = "Bassani, Elias and Sanchez, Ignacio",
    editor = "Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung",
    booktitle = "Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.emnlp-main.1022",
    doi = "10.18653/v1/2024.emnlp-main.1022",
    pages = "18393--18409",
}
```

🎁 Feature Requests

Would you like to see other features implemented? Please open a feature request.

📄 License

GuardBench is provided as open-source software licensed under EUPL v1.2.

Owner

  • Name: Elias Bassani
  • Login: AmenRa
  • Kind: user
  • Location: Milan, Italy
  • Company: Joint Research Centre

Ph.D. in CS. I like Neural Networks, usability, efficiency, einsum, memes, and improperly used emojis. 🫠

GitHub Events

Total
  • Watch event: 15
  • Push event: 10
  • Pull request event: 1
Last Year
  • Watch event: 15
  • Push event: 10
  • Pull request event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi: 30 last month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
pypi.org: guardbench

GuardBench: A Large-Scale Benchmark for Guardrail Models

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 30 Last month
Rankings
  • Dependent packages count: 10.5%
  • Average: 35.0%
  • Dependent repos count: 59.4%
Maintainers (1)
Last synced: 6 months ago