euroeval

The robust European language model benchmark.

https://github.com/euroeval/euroeval

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary

Keywords

danish-language dutch-language english-language european evaluation-framework faroese-language finnish-language french-language german-language icelandic-language italian-language llms nlp-machine-learning norwegian-language portuguese-language spanish-language swedish-language
Last synced: 6 months ago

Repository

The robust European language model benchmark.

Basic Info
  • Host: GitHub
  • Owner: EuroEval
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage: https://euroeval.com
  • Size: 94.3 MB
Statistics
  • Stars: 121
  • Watchers: 7
  • Forks: 30
  • Open Issues: 148
  • Releases: 167
Topics
danish-language dutch-language english-language european evaluation-framework faroese-language finnish-language french-language german-language icelandic-language italian-language llms nlp-machine-learning norwegian-language portuguese-language spanish-language swedish-language
Created over 4 years ago · Last pushed 6 months ago
Metadata Files
Readme · Changelog · Contributing · License · Code of conduct · Citation

README.md

The robust European language model benchmark.

(formerly known as ScandEval)



Installation

To install the package, simply run the following command in your favorite terminal:

```shell
$ pip install euroeval[all]
```

This will install the EuroEval package with all extras. You can also install the minimal version by leaving out the [all], in which case the package will let you know when an evaluation requires a certain extra dependency, and how to install it.
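That is, the minimal installation is just:

```shell
$ pip install euroeval
```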

Quickstart

Benchmarking from the Command Line

The easiest way to benchmark pretrained models is via the command-line interface. After having installed the package, you can benchmark your favorite model like so:

```shell
$ euroeval --model <model-id>
```

Here <model-id> is the Hugging Face model ID, which can be found on the Hugging Face Hub. By default this will benchmark the model on all available tasks. If you want to benchmark on a particular task, use the --task argument:

```shell
$ euroeval --model <model-id> --task sentiment-classification
```

We can also narrow down which languages to benchmark on by setting the --language argument. Here we benchmark the model on the Danish sentiment classification task:

```shell
$ euroeval --model <model-id> --task sentiment-classification --language da
```

Multiple models, datasets and/or languages can be specified by simply repeating the corresponding arguments. Here is an example with two models:

```shell
$ euroeval --model <model-id1> --model <model-id2>
```
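The same repetition works for languages. For instance, to benchmark a model on Danish, Norwegian and Swedish in one go (the two-letter language codes for Norwegian and Swedish are assumed here to follow the same scheme as 'da' above):

```shell
$ euroeval --model <model-id> --language da --language no --language sv
```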

The specific model version/revision to use can also be specified by appending '@' and the revision to the model ID:

```shell
$ euroeval --model <model-id>@<commit>
```

The revision can be a branch name, a tag name or a commit ID, and defaults to 'main', i.e., the latest version.
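For example, pinning the evaluation to a (hypothetical) release tag:

```shell
$ euroeval --model <model-id>@v1.0.0
```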

See all the arguments and options available for the euroeval command by typing:

```shell
$ euroeval --help
```

Benchmarking from a Script

In a script, the syntax is similar to the command-line interface. You simply initialise an object of the Benchmarker class and call it with your favorite model:

```python
from euroeval import Benchmarker

benchmark = Benchmarker()
benchmark(model="<model-id>")
```

To benchmark on a specific task and/or language, you simply specify the task or language arguments, shown here with the same example as above:

```python
benchmark(model="<model-id>", task="sentiment-classification", language="da")
```

If you want to benchmark a subset of all the models on the Hugging Face Hub, you can simply leave out the model argument. In this example, we're benchmarking all Danish models on the Danish sentiment classification task:

```python
benchmark(task="sentiment-classification", language="da")
```
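Putting the above together, here is a minimal sketch of a complete benchmarking script (the model IDs are placeholders):

```python
from euroeval import Benchmarker

benchmark = Benchmarker()

# Evaluate two models (placeholder IDs) on Danish sentiment classification.
for model_id in ["<model-id1>", "<model-id2>"]:
    benchmark(model=model_id, task="sentiment-classification", language="da")
```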

Benchmarking from Docker

A Dockerfile is provided in the repo, which can be downloaded and run without needing to clone the repo and install from source. It can be fetched programmatically by running the following:

```shell
$ wget https://raw.githubusercontent.com/EuroEval/EuroEval/main/Dockerfile.cuda
```

Next, to be able to build the Docker image, first ensure that the NVIDIA Container Toolkit is installed and configured, and that the CUDA version stated at the top of the Dockerfile matches the CUDA version installed (which you can check using nvidia-smi). After that, build the image as follows:

```shell
$ docker build --pull -t euroeval -f Dockerfile.cuda .
```
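One way to check the installed CUDA version (assuming nvidia-smi reports it in its header, as recent NVIDIA drivers do):

```shell
$ nvidia-smi | grep "CUDA Version"
```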

With the Docker image built, we can now evaluate any model as follows:

```shell
$ docker run -e args="<euroeval-arguments>" --gpus 1 --name euroeval --rm euroeval
```

Here <euroeval-arguments> consists of the arguments you would otherwise pass to the euroeval CLI, for instance --model <model-id> --task sentiment-classification.
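As a concrete instance, combining the earlier CLI example with the Docker invocation above (the model ID is a placeholder):

```shell
$ docker run -e args="--model <model-id> --task sentiment-classification" --gpus 1 --name euroeval --rm euroeval
```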

Reproducing the datasets

All datasets used in this project are generated using the scripts located in the src/scripts folder. To reproduce a dataset, run the corresponding script with the following command:

```shell
$ uv run src/scripts/<name-of-script>.py
```

Replace <name-of-script> with the specific script you wish to execute, e.g.:

```shell
$ uv run src/scripts/create_allocine.py
```

Contributors :pray:

A huge thank you to all the contributors who have helped make this project a success!

peter-sk · AJDERS · oliverkinch · versae · KennethEnevoldsen · viggo-gascou · mathiasesn · Alkarex · marksverdhei · Mikeriess · ThomasKluiters · BramVanroy · peregilk · Rijgersberg · duarteocarmo · slowwavesleep

Contribute to EuroEval

We welcome contributions to EuroEval! Whether you're fixing bugs, adding features, or contributing new datasets, your help makes this project better for everyone.

  • General contributions: Check out our contribution guidelines for information on how to get started.
  • Adding datasets: If you're interested in adding a new dataset to EuroEval, we have a dedicated guide with step-by-step instructions.

Special Thanks

  • Thanks to Google for sponsoring Gemini credits as part of their Google Cloud for Researchers Program.
  • Thanks @Mikeriess for evaluating many of the larger models on the leaderboards.
  • Thanks to OpenAI for sponsoring OpenAI credits as part of their Researcher Access Program.
  • Thanks to UWV and KU Leuven for sponsoring the Azure OpenAI credits used to evaluate GPT-4-turbo in Dutch.
  • Thanks to Miðeind for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in Icelandic and Faroese.
  • Thanks to CHC for sponsoring the OpenAI credits used to evaluate GPT-4-turbo in German.

Citing EuroEval

If you want to cite the framework then feel free to use this:

```bibtex
@article{smart2024encoder,
  title={Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks},
  author={Smart, Dan Saattrup and Enevoldsen, Kenneth and Schneider-Kamp, Peter},
  journal={arXiv preprint arXiv:2406.13469},
  year={2024}
}

@inproceedings{smart2023scandeval,
  author = {Smart, Dan Saattrup},
  booktitle = {Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)},
  month = may,
  pages = {185--201},
  title = {{ScandEval: A Benchmark for Scandinavian Natural Language Processing}},
  year = {2023}
}
```

Owner

  • Name: EuroEval
  • Login: EuroEval
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: EuroEval
message: If you use this software, please cite it using the metadata from this file.
type: software
authors:
  - given-names: Dan Saattrup
    family-names: Smart
    email: dan.smart@alexandra.dk
    affiliation: Alexandra Institute
    orcid: 'https://orcid.org/0000-0001-9227-1470'
identifiers:
  - type: url
    value: 'https://aclanthology.org/2023.nodalida-1.20'
    description: Paper regarding evaluation of Scandinavian encoders.
repository-code: 'https://github.com/EuroEval/EuroEval'
url: 'https://euroeval.com/'
abstract: Evaluation of language models on mono- or multilingual tasks.
keywords:
  - nlp
  - evaluation
license: MIT
preferred-citation:
  type: conference-paper
  authors:
  - family-names: "Smart"
    given-names: "Dan Saattrup"
    orcid: https://orcid.org/0000-0001-9227-1470
  collection-title: "Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)"
  month: 5
  start: 185 # First page number
  end: 201 # Last page number
  title: "ScandEval: A Benchmark for Scandinavian Natural Language Processing"
  year: 2023

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 210
  • Total pull requests: 129
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Total issue authors: 18
  • Total pull request authors: 10
  • Average comments per issue: 1.13
  • Average comments per pull request: 0.46
  • Merged pull requests: 99
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 206
  • Pull requests: 129
  • Average time to close issues: 24 days
  • Average time to close pull requests: 3 days
  • Issue authors: 18
  • Pull request authors: 10
  • Average comments per issue: 1.13
  • Average comments per pull request: 0.46
  • Merged pull requests: 99
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • saattrupdan (122)
  • mathiasesn (38)
  • EwoutH (19)
  • KennethEnevoldsen (8)
  • Linguistcoder (5)
  • djstrong (3)
  • Mikeriess (2)
  • andersgb1 (2)
  • usarth (2)
  • RobinSmits (1)
  • iceychris (1)
  • Alkarex (1)
  • noahmanu (1)
  • rlrs (1)
  • marksverdhei (1)
Pull Request Authors
  • saattrupdan (96)
  • oliverkinch (12)
  • viggo-gascou (6)
  • slowwavesleep (5)
  • mathiasesn (4)
  • KennethEnevoldsen (2)
  • duarteocarmo (1)
  • Alkarex (1)
  • marksverdhei (1)
  • Rijgersberg (1)
Top Labels
Issue Labels
  • model evaluation request (156)
  • small model (<=8B) (43)
  • large model (>8B) (27)
  • benchmark dataset request (20)
  • help wanted (10)
  • good first issue (7)
  • enhancement (7)
  • bug (6)
  • documentation (3)
  • not-supported-in-vllm-yet (1)
Pull Request Labels
  • documentation (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 1,460 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 193
  • Total maintainers: 1
pypi.org: scandeval

The robust European language model benchmark.

  • Documentation: https://scandeval.readthedocs.io/
  • License: MIT License, Copyright (c) 2022-2025 Dan Saattrup Smart
  • Latest release: 15.16.0
    published 6 months ago
  • Versions: 167
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 888 Last month
Rankings
Dependent packages count: 10.1%
Downloads: 10.1%
Forks count: 11.4%
Stargazers count: 11.9%
Average: 13.0%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 6 months ago
pypi.org: euroeval

The robust European language model benchmark.

  • Documentation: https://euroeval.readthedocs.io/
  • License: MIT License, Copyright (c) 2022-2025 Dan Saattrup Smart (same as above)
  • Latest release: 15.16.0
    published 6 months ago
  • Versions: 26
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 572 Last month
Rankings
Dependent packages count: 9.6%
Average: 31.7%
Dependent repos count: 53.9%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/ci.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • jpetrucciani/black-check master composite
.github/workflows/docs.yaml actions
  • actions/checkout v3 composite
  • actions/deploy-pages v1 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
poetry.lock pypi
  • 112 dependencies
pyproject.toml pypi
  • black ^22.3.0 develop
  • isort ^5.10.1 develop
  • lxml ^4.9.0 develop
  • pdoc ^7.1.1 develop
  • pre-commit ^2.17.0 develop
  • pytest ^6.2.5 develop
  • pytest-cov ^3.0.0 develop
  • pytest-xdist ^2.5.0 develop
  • readme-coverage-badger ^0.1.2 develop
  • requests ^2.28.0 develop
  • click ^8.1.3
  • datasets ^2.7.0
  • evaluate >=0.3.0,<1.0.0
  • flax >=0.6.3,<1.0.0
  • huggingface-hub >=0.7.0,<1.0.0
  • jax >=0.4.1,<1.0.0
  • jaxlib >=0.4.1,<1.0.0
  • numpy ^1.23.0
  • pandas ^1.4.0
  • protobuf >=3.20.0,<3.21.0
  • pyinfer >=0.0.3,<1.0.0
  • python >=3.8,<3.11
  • python-dotenv >=0.20.0,<1.0.0
  • sacremoses >=0.0.53,<1.0.0
  • sentencepiece >=0.1.96,<1.0.0
  • seqeval ^1.2.2
  • termcolor ^1.1.0
  • torch ^1.12.1
  • tqdm ^4.62.0
  • transformers ^4.20.0