supergleber

German Language Understanding Evaluation Benchmark @NAACL24

https://github.com/lsx-uniwue/supergleber

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.7%) to scientific vocabulary

Keywords

benchmark german llm

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: LSX-UniWue
Language: Python
Default Branch: main
Homepage: https://supergleber.professor-x.de/
Size: 142 MB

Statistics

Stars: 10
Watchers: 6
Forks: 2
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme Citation

✨ SuperGLEBer ✨

SuperGLEBer (German Language Understanding Evaluation Benchmark) is a broad Natural Language Understanding benchmark suite for the German language, built to give a clearer picture of the current state of German LLMs. The benchmark consists of 29 tasks spanning several types, including document classification, sequence tagging, sentence similarity, and question answering.

If you use this benchmark in your research, please cite the following paper: https://aclanthology.org/2024.naacl-long.438/

For the current leaderboard and more information, check out the SuperGLEBer website (https://supergleber.professor-x.de/) 🚀

This is the updated branch that contains the new and improved version of the SuperGLEBer benchmark.

Running Experiments

Create all files needed to schedule runs on a k8s/Slurm cluster:

    python src/template_k8s.py

Run a model on a task:

    python src/train.py +model=gbert_base +train_args=a100 +task=news_class

Override config keys via the CLI:

    python src/train.py +model=gbert_base +train_args=a100 +task=news_class train_args.batch_size=1
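The dotted form `train_args.batch_size=1` addresses a key nested inside the `train_args` config. The command syntax appears to follow Hydra's override conventions (hydra-core is pinned in requirements.txt), but the sketch below does not call Hydra. It is a hand-rolled illustration of what a dotted override does to a nested config; the helper `set_dotted` and the default values in `base` are assumptions made up for this example, not values from the repository.

```python
from copy import deepcopy
from typing import Any


def set_dotted(config: dict[str, Any], override: str) -> dict[str, Any]:
    """Return a copy of `config` with one dotted `key=value` override applied."""
    dotted_key, _, raw = override.partition("=")
    result = deepcopy(config)
    node = result
    *parents, leaf = dotted_key.split(".")
    # Walk down to the parent of the final key, creating levels as needed.
    for name in parents:
        node = node.setdefault(name, {})
    # Treat plain digit strings as integers; keep everything else as text.
    node[leaf] = int(raw) if raw.isdigit() else raw
    return result


# Hypothetical stand-in for values loaded from a train_args YAML file.
base = {"train_args": {"batch_size": 8, "epochs": 5}}
overridden = set_dotted(base, "train_args.batch_size=1")
print(overridden["train_args"])  # {'batch_size': 1, 'epochs': 5}
```

Only the named key changes; sibling keys from the selected config file keep their values, which is why a single override is enough to shrink the batch size for a quick test run.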

You can find valid parameters in the provided YAML configs: https://github.com/LSX-UniWue/SuperGLEBer/tree/paper/src/conf
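To evaluate several models on several tasks, the single-run command can be generated in a loop. The following is a minimal sketch under stated assumptions: `build_commands` is a hypothetical helper, not part of the repository, and only `gbert_base`, `a100`, and `news_class` are names taken from the commands above. Any further model or task names would have to match config files in the directory linked above.

```python
import itertools
import shlex
from collections.abc import Sequence


def build_commands(
    models: Sequence[str],
    tasks: Sequence[str],
    train_args: str = "a100",
    extra: Sequence[str] = (),
) -> list[str]:
    """Build one `src/train.py` command line per (model, task) pair."""
    commands = []
    for model, task in itertools.product(models, tasks):
        argv = [
            "python",
            "src/train.py",
            f"+model={model}",
            f"+train_args={train_args}",
            f"+task={task}",
            *extra,  # additional overrides such as train_args.batch_size=1
        ]
        # shlex.join quotes arguments where needed for safe shell use.
        commands.append(shlex.join(argv))
    return commands


commands = build_commands(
    ["gbert_base"], ["news_class"], extra=["train_args.batch_size=1"]
)
for command in commands:
    print(command)
```

The printed lines can be pasted into a shell or written to a job script. For cluster scheduling, the repository's own `src/template_k8s.py` is the documented route.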

Citation

@inproceedings{pfister-hotho-2024-supergleber,
    title = "{S}uper{GLEB}er: {G}erman Language Understanding Evaluation Benchmark",
    author = "Pfister, Jan and Hotho, Andreas",
    editor = "Duh, Kevin and Gomez, Helena and Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.438/",
    doi = "10.18653/v1/2024.naacl-long.438",
    pages = "7904--7923",
    abstract = "We assemble a broad Natural Language Understanding benchmark suite for the German language and consequently evaluate a wide array of existing German-capable models in order to create a better understanding of the current state of German LLMs. Our benchmark consists of 29 different tasks ranging over different types such as document classification, sequence tagging, sentence similarity, and question answering, on which we evaluate 10 different German-pretrained models, thereby charting the landscape of German LLMs. In our comprehensive evaluation we find that encoder models are a good choice for most tasks, but also that the largest encoder model does not necessarily perform best for all tasks. We make our benchmark suite and a leaderboard publically available at https://supergleber.professor-x.de and encourage the community to contribute new tasks and evaluate more models on it (https://github.com/LSX-UniWue/SuperGLEBer)."
}

Owner

Name: Chair of Computer Science X - Data Science
Login: LSX-UniWue
Kind: organization
Location: Germany

Website: professor-x.de
Twitter: datascience_jmu
Repositories: 2
Profile: https://github.com/LSX-UniWue

GitHub Events

Total

Issues event: 5
Watch event: 8
Issue comment event: 9
Push event: 36
Pull request review event: 6
Pull request review comment event: 4
Pull request event: 2
Fork event: 2
Create event: 1

Last Year

Issues event: 5
Watch event: 8
Issue comment event: 9
Push event: 36
Pull request review event: 6
Pull request review comment event: 4
Pull request event: 2
Fork event: 2
Create event: 1

Dependencies

k8s/templates/Dockerfile docker

nvidia/cuda 11.7.1-devel-ubuntu22.04 build

requirements.txt pypi

Babel ==2.13.0
Cython ==0.29.36
Deprecated ==1.2.14
Janome ==0.5.0
Jinja2 ==3.1.2
MarkupSafe ==2.1.3
Pillow ==10.0.1
PySocks ==1.7.1
PyYAML ==6.0.1
Pygments ==2.16.1
QtPy ==2.4.0
Send2Trash ==1.8.2
Wikipedia-API ==0.6.0
accelerate ==0.23.0
aiohttp ==3.8.5
aiosignal ==1.3.1
annotated-types ==0.5.0
antlr4-python3-runtime ==4.9.3
anyio ==4.0.0
argon2-cffi ==23.1.0
argon2-cffi-bindings ==21.2.0
arrow ==1.3.0
asttokens ==2.4.0
async-lru ==2.0.4
async-timeout ==4.0.3
attrs ==23.1.0
backcall ==0.2.0
beautifulsoup4 ==4.12.2
bitsandbytes ==0.41.1
bleach ==6.0.0
blis ==0.7.11
boto3 ==1.28.60
botocore ==1.31.60
bpemb ==0.3.4
catalogue ==2.0.10
certifi ==2023.7.22
cffi ==1.16.0
charset-normalizer ==3.3.0
click ==8.1.7
cloudpathlib ==0.15.1
cloudpickle ==2.2.1
cmake ==3.27.6
comm ==0.1.4
confection ==0.1.3
conllu ==4.5.3
contourpy ==1.1.1
cycler ==0.12.0
cymem ==2.0.8
datasets ==2.14.5
debugpy ==1.8.0
decorator ==5.1.1
defusedxml ==0.7.1
dill ==0.3.7
einops ==0.7.0
evaluate ==0.4.0
exceptiongroup ==1.1.3
executing ==2.0.0
fastjsonschema ==2.18.1
filelock ==3.12.4
flair ==0.12.2
fonttools ==4.43.0
fqdn ==1.5.1
frozenlist ==1.4.0
fsspec ==2023.6.0
ftfy ==6.1.1
future ==0.18.3
gdown ==4.4.0
gensim ==4.3.2
huggingface-hub ==0.16.4
hydra-core ==1.3.2
hyperopt ==0.2.7
idna ==3.4
ipykernel ==6.25.2
ipython ==8.16.1
ipython-genutils ==0.2.0
ipywidgets ==8.1.1
isoduration ==20.11.0
jedi ==0.19.1
jmespath ==1.0.1
joblib ==1.3.2
json5 ==0.9.14
jsonlines ==4.0.0
jsonpointer ==2.4
jsonschema ==4.19.1
jsonschema-specifications ==2023.7.1
jupyter ==1.0.0
jupyter-console ==6.6.3
jupyter-events ==0.7.0
jupyter-lsp ==2.2.0
jupyter_client ==8.3.1
jupyter_core ==5.3.2
jupyter_server ==2.7.3
jupyter_server_terminals ==0.4.4
jupyterlab ==4.0.6
jupyterlab-pygments ==0.2.2
jupyterlab-widgets ==3.0.9
jupyterlab_server ==2.25.0
kiwisolver ==1.4.5
langcodes ==3.3.0
langdetect ==1.0.9
lit ==17.0.2
llvmlite ==0.41.0
loguru ==0.7.2
lxml ==4.9.3
markdown-it-py ==3.0.0
matplotlib ==3.8.0
matplotlib-inline ==0.1.6
mdurl ==0.1.2
mistune ==3.0.2
more-itertools ==10.1.0
mpld3 ==0.3
mpmath ==1.3.0
mteb ==1.1.1
multidict ==6.0.4
multiprocess ==0.70.15
murmurhash ==1.0.10
nbclient ==0.8.0
nbconvert ==7.9.1
nbformat ==5.9.2
nest-asyncio ==1.5.8
networkx ==3.1
nltk ==3.8.1
notebook ==7.0.4
notebook_shim ==0.2.3
numba ==0.58.0
numpy ==1.25.2
omegaconf ==2.3.0
overrides ==7.4.0
packaging ==23.2
pandas ==2.1.1
pandocfilters ==1.5.0
parso ==0.8.3
pathy ==0.10.2
peft ==0.5.0
pexpect ==4.8.0
pickleshare ==0.7.5
platformdirs ==3.11.0
pptree ==3.1
preshed ==3.0.9
prometheus-client ==0.17.1
prompt-toolkit ==3.0.39
protobuf ==4.24.4
psutil ==5.9.5
ptyprocess ==0.7.0
pure-eval ==0.2.2
py4j ==0.10.9.7
pyarrow ==13.0.0
pycparser ==2.21
pydantic ==2.4.2
pydantic_core ==2.10.1
pynndescent ==0.5.10
pyparsing ==3.1.1
python-dateutil ==2.8.2
python-json-logger ==2.0.7
pytorch_revgrad ==0.2.0
pytz ==2023.3.post1
pyzmq ==25.1.1
qtconsole ==5.4.4
referencing ==0.30.2
regex ==2023.10.3
requests ==2.31.0
responses ==0.18.0
rfc3339-validator ==0.1.4
rfc3986-validator ==0.1.1
rich ==13.6.0
river ==0.19.0
rpds-py ==0.10.3
s3transfer ==0.7.0
safetensors ==0.3.3
scikit-learn ==1.3.1
scipy ==1.11.3
segtok ==1.5.11
sentence-transformers ==2.2.2
sentencepiece ==0.1.99
sfst ==1.5.7
six ==1.16.0
smart-open ==6.4.0
sniffio ==1.3.0
soupsieve ==2.5
spacy ==3.7.0
spacy-legacy ==3.0.12
spacy-loggers ==1.0.5
sqlitedict ==2.1.0
srsly ==2.4.8
stack-data ==0.6.3
sympy ==1.12
tabulate ==0.9.0
tbb ==2021.10.0
terminado ==0.17.1
thinc ==8.2.1
threadpoolctl ==3.2.0
tinycss2 ==1.2.1
tokenizers ==0.14.0
tomli ==2.0.1
torch ==2.0.1
torchvision ==0.15.2
tornado ==6.3.3
tqdm ==4.66.1
traitlets ==5.11.2
transformer-smaller-training-vocab ==0.2.4
transformers ==4.34.0
triton ==2.0.0
typer ==0.9.0
types-python-dateutil ==2.8.19.14
typing_extensions ==4.8.0
tzdata ==2023.3
umap-learn ==0.5.4
uri-template ==1.3.0
urllib3 ==1.26.17
wasabi ==1.1.2
wcwidth ==0.2.8
weasel ==0.3.2
webcolors ==1.13
webencodings ==0.5.1
websocket-client ==1.6.3
widgetsnbextension ==4.0.9
wrapt ==1.15.0
xxhash ==3.3.0
yarl ==1.9.2

.github/workflows/dockerimage.yml actions

actions/checkout v4 composite
actions/setup-python v2 composite
docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4 composite
docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1 composite
docker/metadata-action 9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7 composite
