evaluate

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

https://github.com/huggingface/evaluate

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○ CITATION.cff file
  • ✓ codemeta.json file: found codemeta.json file
  • ○ .zenodo.json file
  • ○ DOI references
  • ○ Academic publication links
  • ✓ Committers with academic emails: 3 of 129 committers (2.3%) from academic institutions
  • ○ Institutional organization owner
  • ○ JOSS paper metadata
  • ○ Scientific vocabulary similarity: low similarity (16.5%) to scientific vocabulary

Keywords

evaluation machine-learning

Keywords from Contributors

dataset-hub speech transformer jax pretrained-models vlm speech-recognition qwen pytorch-transformers model-hub
Last synced: 6 months ago

Repository

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

Basic Info
Statistics
  • Stars: 2,308
  • Watchers: 43
  • Forks: 290
  • Open Issues: 246
  • Releases: 11
Topics
evaluation machine-learning
Created almost 4 years ago · Last pushed 6 months ago
Metadata Files
Readme · Contributing · License · Code of conduct · Authors

README.md




Tip: For more recent evaluation approaches, such as evaluating LLMs, we recommend our newer and more actively maintained library LightEval.

🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized.

It currently contains:

  • implementations of dozens of popular metrics: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include dataset-specific metrics. With a simple command like accuracy = load("accuracy"), you get any of these metrics ready to use for evaluating an ML model in any framework (NumPy/Pandas/PyTorch/TensorFlow/JAX).
  • comparisons and measurements: comparisons are used to measure the difference between models, and measurements are tools to evaluate datasets.
  • an easy way of adding new evaluation modules to the 🤗 Hub: you can create new evaluation modules and push them to a dedicated Space on the 🤗 Hub with evaluate-cli create [metric name], which lets you easily compare different metrics and their outputs for the same sets of references and predictions. (A short sketch of all three module types follows this list.)
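As a rough illustration of the three module types, here is a minimal sketch; it assumes the accuracy metric, the mcnemar comparison, and the word_length measurement listed in this repository are available on the Hub (modules are downloaded on first use):

```python
import evaluate

# Metric: scores model predictions against references.
accuracy = evaluate.load("accuracy")
print(accuracy.compute(references=[0, 1, 1, 0], predictions=[0, 1, 0, 0]))  # {'accuracy': 0.75}

# Comparison: contrasts two sets of predictions on the same references.
mcnemar = evaluate.load("mcnemar", module_type="comparison")
print(mcnemar.compute(predictions1=[0, 1, 1], predictions2=[1, 1, 1], references=[1, 1, 1]))

# Measurement: describes properties of a dataset rather than a model.
word_length = evaluate.load("word_length", module_type="measurement")
print(word_length.compute(data=["hello world", "evaluate makes evaluation easier"]))
```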

🎓 Documentation

🔎 Find a metric, comparison, or measurement on the Hub

🌟 Add a new evaluation module

🤗 Evaluate also has lots of useful features like:

  • Type checking: the input types are checked to make sure that you are using the right input format for each metric.
  • Metric cards: each metric comes with a card that describes its values, limitations, and value ranges, as well as examples of its usage and usefulness (a short sketch of inspecting this metadata follows the list).
  • Community metrics: metrics live on the Hugging Face Hub and you can easily add your own metrics for your project or to collaborate with others.
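For instance, the metric-card metadata can be inspected from code. This is a minimal sketch that assumes a loaded module exposes description, citation, and features attributes, as described in the library's documentation:

```python
import evaluate

# Load a metric and inspect the metadata that backs its metric card.
accuracy = evaluate.load("accuracy")
print(accuracy.description)  # what the metric measures and the range of its values
print(accuracy.citation)     # reference for the underlying implementation
print(accuracy.features)     # expected input types, used for input type checking
```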

Installation

With pip

🤗 Evaluate can be installed from PyPI and has to be installed in a virtual environment (venv or conda, for instance):

```bash
pip install evaluate
```

Usage

🤗 Evaluate's main methods are:

  • evaluate.list_evaluation_modules() to list the available metrics, comparisons and measurements
  • evaluate.load(module_name, **kwargs) to instantiate an evaluation module
  • results = module.compute(**kwargs) to compute the result of an evaluation module (see the usage sketch below)
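Put together, a minimal usage sketch of these three methods (the module names are metrics listed in this repository; the printed output is indicative only):

```python
import evaluate

# Discover what is available; module_type can be "metric", "comparison", or "measurement".
metric_names = evaluate.list_evaluation_modules(module_type="metric")
print(len(metric_names), metric_names[:5])

# Instantiate a module and compute a result; inputs are passed as keyword arguments.
f1 = evaluate.load("f1")
results = f1.compute(references=[0, 1, 1, 0], predictions=[0, 1, 0, 0])
print(results)  # roughly {'f1': 0.67}
```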

Adding a new evaluation module

First install the necessary dependencies to create a new metric with the following command:

```bash
pip install evaluate[template]
```

Then you can get started with the following command, which will create a new folder for your metric and display the necessary steps:

```bash
evaluate-cli create "Awesome Metric"
```

See this step-by-step guide in the documentation for detailed instructions.

Credits

Thanks to @marella for letting us use the evaluate namespace on PyPI, previously used by his library.

Owner

  • Name: Hugging Face
  • Login: huggingface
  • Kind: organization
  • Location: NYC + Paris

The AI community building the future.

GitHub Events

Total
  • Create event: 5
  • Release event: 1
  • Issues event: 31
  • Watch event: 281
  • Delete event: 6
  • Member event: 1
  • Issue comment event: 49
  • Push event: 13
  • Pull request review comment event: 5
  • Pull request event: 20
  • Pull request review event: 13
  • Fork event: 37
Last Year
  • Create event: 5
  • Release event: 1
  • Issues event: 31
  • Watch event: 281
  • Delete event: 6
  • Member event: 1
  • Issue comment event: 49
  • Push event: 13
  • Pull request review comment event: 5
  • Pull request event: 20
  • Pull request review event: 13
  • Fork event: 37

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 923
  • Total Committers: 129
  • Avg Commits per committer: 7.155
  • Development Distribution Score (DDS): 0.782
Past Year
  • Commits: 11
  • Committers: 5
  • Avg Commits per committer: 2.2
  • Development Distribution Score (DDS): 0.545
Top Committers
Name Email Commits
Quentin Lhoest 4****q 201
Albert Villanova del Moral 8****a 116
Sasha Luccioni l****s@m****c 103
Leandro von Werra l****a 99
Mario Šaško m****7@g****m 50
Thomas Wolf t****f 29
sashavor a****a@g****m 24
leandro l****a@s****o 24
sashavor s****i@h****o 23
helen 3****n 16
Bram Vanroy B****y@U****e 13
lewtun l****l@g****m 13
Patrick von Platen p****n@g****m 12
mathemakitten h****n@h****o 9
fxmarty 9****y 9
Steven Liu 5****u 6
Sylvain Lesage s****o@r****t 6
emibaylor 2****r 6
meg 9****e 6
Simon Brandeis 3****s 6
Mishig d****g@g****m 4
douwekiela d****a 4
Yacine Jernite y****e 4
Sylvain Gugger 3****r 4
Philipp Schmid 3****d 4
Julien Plu p****n@g****m 4
Steven s****u@g****m 4
Nima Boscarino n****o@g****m 3
Ricardo Rei r****i@u****m 3
Lysandre Debut l****e@h****o 2
and 99 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 200
  • Total pull requests: 178
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 169
  • Total pull request authors: 91
  • Average comments per issue: 1.92
  • Average comments per pull request: 0.95
  • Merged pull requests: 58
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 38
  • Pull requests: 50
  • Average time to close issues: 2 days
  • Average time to close pull requests: 2 days
  • Issue authors: 33
  • Pull request authors: 16
  • Average comments per issue: 0.21
  • Average comments per pull request: 0.26
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • albertvillanova (7)
  • lvwerra (7)
  • daskol (4)
  • FlorinAndrei (3)
  • shivanraptor (3)
  • lewtun (2)
  • jpodivin (2)
  • trajepl (2)
  • BramVanroy (2)
  • lowlypalace (2)
  • AndreaSottana (2)
  • boyleconnor (2)
  • mathemakitten (2)
  • NightMachinery (2)
  • NielsRogge (2)
Pull Request Authors
  • lhoestq (21)
  • albertvillanova (11)
  • MedAhmedKrichen (5)
  • shenxiangzhuang (4)
  • qubvel (4)
  • krishnap25 (4)
  • jpodivin (4)
  • skyil7 (4)
  • nikvaessen (3)
  • lvwerra (3)
  • hazrulakmal (3)
  • Wauplin (3)
  • tybrs (2)
  • tupini07 (2)
  • milistu (2)
Top Labels
Issue Labels
metric request (9) enhancement (2)
Pull Request Labels

Packages

  • Total packages: 4
  • Total downloads:
    • pypi: 3,631,542 last month
  • Total docker downloads: 24,936,750
  • Total dependent packages: 222
    (may contain duplicates)
  • Total dependent repositories: 2,480
    (may contain duplicates)
  • Total versions: 35
  • Total maintainers: 3
pypi.org: evaluate

HuggingFace community-driven open-source library of evaluation

  • Versions: 18
  • Dependent Packages: 222
  • Dependent Repositories: 2,474
  • Downloads: 3,631,542 Last month
  • Docker Downloads: 24,936,750
Rankings
Dependent packages count: 0.2%
Dependent repos count: 0.2%
Downloads: 0.3%
Docker downloads count: 1.0%
Average: 1.2%
Stargazers count: 1.7%
Forks count: 3.8%
Maintainers (3)
Last synced: 6 months ago
proxy.golang.org: github.com/huggingface/evaluate
  • Versions: 13
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 1.7%
Forks count: 2.0%
Average: 6.0%
Dependent packages count: 9.6%
Dependent repos count: 10.8%
Last synced: 6 months ago
conda-forge.org: evaluate
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 3
Rankings
Stargazers count: 11.5%
Forks count: 16.4%
Dependent repos count: 18.0%
Average: 24.4%
Dependent packages count: 51.6%
Last synced: 6 months ago
anaconda.org: evaluate

Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized. It currently contains: - implementations of dozens of popular metrics: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include dataset-specific metrics. With a simple command like `accuracy = load("accuracy")`, get any of these metrics ready to use for evaluating an ML model in any framework (NumPy/Pandas/PyTorch/TensorFlow/JAX). - comparisons and measurements: comparisons are used to measure the difference between models, and measurements are tools to evaluate datasets. - an easy way of adding new evaluation modules to the 🤗 Hub: you can create new evaluation modules and push them to a dedicated Space on the 🤗 Hub with evaluate-cli create [metric name], which lets you easily compare different metrics and their outputs for the same sets of references and predictions.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 3
Rankings
Stargazers count: 21.4%
Forks count: 29.1%
Average: 37.0%
Dependent repos count: 46.4%
Dependent packages count: 51.1%
Last synced: 6 months ago

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/python-release.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/update_spaces.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/hub/requirements.txt pypi
  • huggingface_hub *
additional-tests-requirements.txt pypi
  • gin-config * test
  • unbabel-comet >=1.0.0 test
comparisons/exact_match/requirements.txt pypi
  • scipy *
comparisons/mcnemar/requirements.txt pypi
  • scipy *
comparisons/wilcoxon/requirements.txt pypi
  • datasets *
  • scipy *
measurements/honest/requirements.txt pypi
  • torch *
  • transformers *
  • unidecode ==1.3.4
measurements/label_distribution/requirements.txt pypi
  • scipy *
measurements/perplexity/requirements.txt pypi
  • torch *
  • transformers *
measurements/regard/requirements.txt pypi
  • torch *
  • transformers *
measurements/toxicity/requirements.txt pypi
  • torch *
  • transformers *
measurements/word_count/requirements.txt pypi
  • scikit-learn *
measurements/word_length/requirements.txt pypi
  • nltk *
metrics/accuracy/requirements.txt pypi
  • scikit-learn *
metrics/bertscore/requirements.txt pypi
  • bert_score *
metrics/brier_score/requirements.txt pypi
  • scikit-learn *
metrics/cer/requirements.txt pypi
  • jiwer *
metrics/character/requirements.txt pypi
  • cer >=1.2.0
metrics/charcut_mt/requirements.txt pypi
  • charcut >=1.1.1
metrics/chrf/requirements.txt pypi
  • sacrebleu *
metrics/comet/requirements.txt pypi
  • torch *
  • unbabel-comet *
metrics/f1/requirements.txt pypi
  • scikit-learn *
metrics/frugalscore/requirements.txt pypi
  • torch *
  • transformers *
metrics/glue/requirements.txt pypi
  • scikit-learn *
  • scipy *
metrics/google_bleu/requirements.txt pypi
  • nltk *
metrics/indic_glue/requirements.txt pypi
  • scikit-learn *
  • scipy *
metrics/mae/requirements.txt pypi
  • scikit-learn *
metrics/mape/requirements.txt pypi
  • scikit-learn *
metrics/mase/requirements.txt pypi
  • scikit-learn *
metrics/matthews_correlation/requirements.txt pypi
  • scikit-learn *
metrics/mauve/requirements.txt pypi
  • faiss-cpu *
  • mauve-text *
  • scikit-learn *
metrics/meteor/requirements.txt pypi
  • nltk *
metrics/mse/requirements.txt pypi
  • scikit-learn *
metrics/nist_mt/requirements.txt pypi
  • nltk *
metrics/pearsonr/requirements.txt pypi
  • scipy *
metrics/perplexity/requirements.txt pypi
  • torch *
  • transformers *
metrics/poseval/requirements.txt pypi
  • scikit-learn *
metrics/precision/requirements.txt pypi
  • scikit-learn *
metrics/recall/requirements.txt pypi
  • scikit-learn *
metrics/rl_reliability/requirements.txt pypi
  • gin-config *
  • scipy *
  • tensorflow *
metrics/roc_auc/requirements.txt pypi
  • scikit-learn *
metrics/rouge/requirements.txt pypi
  • absl-py *
  • nltk *
  • rouge_score >=0.1.2
metrics/sacrebleu/requirements.txt pypi
  • sacrebleu *
metrics/sari/requirements.txt pypi
  • sacrebleu *
  • sacremoses *
metrics/seqeval/requirements.txt pypi
  • seqeval *
metrics/smape/requirements.txt pypi
  • scikit-learn *
metrics/spearmanr/requirements.txt pypi
  • scipy *
metrics/super_glue/requirements.txt pypi
  • scikit-learn *
metrics/ter/requirements.txt pypi
  • sacrebleu *
metrics/trec_eval/requirements.txt pypi
  • trectools *
metrics/wer/requirements.txt pypi
  • jiwer *
metrics/wiki_split/requirements.txt pypi
  • sacrebleu *
  • sacremoses *
metrics/xtreme_s/requirements.txt pypi
  • scikit-learn *
.github/workflows/build_documentation.yml actions
.github/workflows/build_pr_documentation.yml actions
.github/workflows/delete_doc_comment.yml actions
measurements/text_duplicates/requirements.txt pypi
metrics/bleu/requirements.txt pypi
metrics/bleurt/requirements.txt pypi
metrics/code_eval/requirements.txt pypi
metrics/competition_math/requirements.txt pypi
metrics/coval/requirements.txt pypi
metrics/cuad/requirements.txt pypi
metrics/exact_match/requirements.txt pypi
metrics/mahalanobis/requirements.txt pypi
metrics/mean_iou/requirements.txt pypi
metrics/r_squared/requirements.txt pypi
metrics/squad/requirements.txt pypi
metrics/squad_v2/requirements.txt pypi
metrics/xnli/requirements.txt pypi
setup.py pypi
templates/{{ cookiecutter.module_slug }}/requirements.txt pypi