supergleber
German Language Understanding Evaluation Benchmark @NAACL24
Repository
Basic Info
- Host: GitHub
- Owner: LSX-UniWue
- Language: Python
- Default Branch: main
- Homepage: https://supergleber.professor-x.de/
- Size: 142 MB
Statistics
- Stars: 10
- Watchers: 6
- Forks: 2
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
✨ SuperGLEBer ✨
SuperGLEBer (German Language Understanding Evaluation Benchmark) is a broad Natural Language Understanding benchmark suite for the German language, built to give a clearer picture of the current state of German LLMs. The benchmark consists of 29 tasks spanning several types, including document classification, sequence tagging, sentence similarity, and question answering.
If you use this benchmark in your research, please cite the following paper: https://aclanthology.org/2024.naacl-long.438/
For the current leaderboard and more information, check out the SuperGLEBer website (https://supergleber.professor-x.de/) 🚀
This is the updated branch that contains the new and improved version of the SuperGLEBer benchmark.
Running Experiments
Create all files needed to schedule runs on a k8s/Slurm cluster:
```bash
python src/template_k8s.py
```
Run a model on a task:
```bash
python src/train.py +model=gbert_base +train_args=a100 +task=news_class
```
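To run more than one model/task combination, the command above can be generated in a loop. The following is a minimal sketch, not part of the repository: `build_train_command` is a hypothetical helper, and only `gbert_base`, `a100`, and `news_class` are taken from this README. Any other names you add must exist in the repository's config directory.

```python
import itertools
import shlex


def build_train_command(model: str, task: str, train_args: str = "a100") -> list[str]:
    """Return the argv for one `src/train.py` run, mirroring the command above."""
    return [
        "python",
        "src/train.py",
        f"+model={model}",
        f"+train_args={train_args}",
        f"+task={task}",
    ]


# Only these two names appear in this README; extend the lists with
# config names that actually exist in the repository.
models = ["gbert_base"]
tasks = ["news_class"]

commands = [build_train_command(m, t) for m, t in itertools.product(models, tasks)]
for cmd in commands:
    # Printed rather than executed; use subprocess.run(cmd, check=True) to launch.
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them before handing them to a scheduler or to `subprocess.run`.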
Override config keys via the CLI:
```bash
python src/train.py +model=gbert_base +train_args=a100 +task=news_class train_args.batch_size=1
```
You can find the valid parameters in the provided YAML configs: https://github.com/LSX-UniWue/SuperGLEBer/tree/paper/src/conf
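The `train_args.batch_size=1` argument addresses a nested config key by its dotted path. The sketch below illustrates that idea with plain dictionaries. It is a simplified stand-in, not Hydra's implementation: `apply_override` is a hypothetical helper, the starting value `32` is made up, and real Hydra additionally handles the `+` prefix, richer value types, and interpolation.

```python
import copy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    key, _, raw_value = override.partition("=")
    result = copy.deepcopy(config)
    node = result
    *parents, leaf = key.split(".")
    for name in parents:
        # Walk down the nested dicts, creating missing levels on the way.
        node = node.setdefault(name, {})
    # Naive typing: integers are parsed, everything else stays a string.
    node[leaf] = int(raw_value) if raw_value.lstrip("-").isdigit() else raw_value
    return result


# Hypothetical stand-in for a composed config; the value 32 is made up.
config = {"train_args": {"batch_size": 32}}
updated = apply_override(config, "train_args.batch_size=1")
print(updated)  # {'train_args': {'batch_size': 1}}
```

The original `config` is left untouched, so several overrides can be tried against the same base.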
Citation
```bib
@inproceedings{pfister-hotho-2024-supergleber,
    title = "{S}uper{GLEB}er: {G}erman Language Understanding Evaluation Benchmark",
    author = "Pfister, Jan and
      Hotho, Andreas",
    editor = "Duh, Kevin and
      Gomez, Helena and
      Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.438/",
    doi = "10.18653/v1/2024.naacl-long.438",
    pages = "7904--7923",
    abstract = "We assemble a broad Natural Language Understanding benchmark suite for the German language and consequently evaluate a wide array of existing German-capable models in order to create a better understanding of the current state of German LLMs. Our benchmark consists of 29 different tasks ranging over different types such as document classification, sequence tagging, sentence similarity, and question answering, on which we evaluate 10 different German-pretrained models, thereby charting the landscape of German LLMs. In our comprehensive evaluation we find that encoder models are a good choice for most tasks, but also that the largest encoder model does not necessarily perform best for all tasks. We make our benchmark suite and a leaderboard publically available at https://supergleber.professor-x.de and encourage the community to contribute new tasks and evaluate more models on it (https://github.com/LSX-UniWue/SuperGLEBer)."
}
```
Owner
- Name: Chair of Computer Science X - Data Science
- Login: LSX-UniWue
- Kind: organization
- Location: Germany
- Website: professor-x.de
- Twitter: datascience_jmu
- Repositories: 2
- Profile: https://github.com/LSX-UniWue
GitHub Events
Total
- Issues event: 5
- Watch event: 8
- Issue comment event: 9
- Push event: 36
- Pull request review event: 6
- Pull request review comment event: 4
- Pull request event: 2
- Fork event: 2
- Create event: 1
Last Year
- Issues event: 5
- Watch event: 8
- Issue comment event: 9
- Push event: 36
- Pull request review event: 6
- Pull request review comment event: 4
- Pull request event: 2
- Fork event: 2
- Create event: 1
Dependencies
- nvidia/cuda 11.7.1-devel-ubuntu22.04 build
- Babel ==2.13.0
- Cython ==0.29.36
- Deprecated ==1.2.14
- Janome ==0.5.0
- Jinja2 ==3.1.2
- MarkupSafe ==2.1.3
- Pillow ==10.0.1
- PySocks ==1.7.1
- PyYAML ==6.0.1
- Pygments ==2.16.1
- QtPy ==2.4.0
- Send2Trash ==1.8.2
- Wikipedia-API ==0.6.0
- accelerate ==0.23.0
- aiohttp ==3.8.5
- aiosignal ==1.3.1
- annotated-types ==0.5.0
- antlr4-python3-runtime ==4.9.3
- anyio ==4.0.0
- argon2-cffi ==23.1.0
- argon2-cffi-bindings ==21.2.0
- arrow ==1.3.0
- asttokens ==2.4.0
- async-lru ==2.0.4
- async-timeout ==4.0.3
- attrs ==23.1.0
- backcall ==0.2.0
- beautifulsoup4 ==4.12.2
- bitsandbytes ==0.41.1
- bleach ==6.0.0
- blis ==0.7.11
- boto3 ==1.28.60
- botocore ==1.31.60
- bpemb ==0.3.4
- catalogue ==2.0.10
- certifi ==2023.7.22
- cffi ==1.16.0
- charset-normalizer ==3.3.0
- click ==8.1.7
- cloudpathlib ==0.15.1
- cloudpickle ==2.2.1
- cmake ==3.27.6
- comm ==0.1.4
- confection ==0.1.3
- conllu ==4.5.3
- contourpy ==1.1.1
- cycler ==0.12.0
- cymem ==2.0.8
- datasets ==2.14.5
- debugpy ==1.8.0
- decorator ==5.1.1
- defusedxml ==0.7.1
- dill ==0.3.7
- einops ==0.7.0
- evaluate ==0.4.0
- exceptiongroup ==1.1.3
- executing ==2.0.0
- fastjsonschema ==2.18.1
- filelock ==3.12.4
- flair ==0.12.2
- fonttools ==4.43.0
- fqdn ==1.5.1
- frozenlist ==1.4.0
- fsspec ==2023.6.0
- ftfy ==6.1.1
- future ==0.18.3
- gdown ==4.4.0
- gensim ==4.3.2
- huggingface-hub ==0.16.4
- hydra-core ==1.3.2
- hyperopt ==0.2.7
- idna ==3.4
- ipykernel ==6.25.2
- ipython ==8.16.1
- ipython-genutils ==0.2.0
- ipywidgets ==8.1.1
- isoduration ==20.11.0
- jedi ==0.19.1
- jmespath ==1.0.1
- joblib ==1.3.2
- json5 ==0.9.14
- jsonlines ==4.0.0
- jsonpointer ==2.4
- jsonschema ==4.19.1
- jsonschema-specifications ==2023.7.1
- jupyter ==1.0.0
- jupyter-console ==6.6.3
- jupyter-events ==0.7.0
- jupyter-lsp ==2.2.0
- jupyter_client ==8.3.1
- jupyter_core ==5.3.2
- jupyter_server ==2.7.3
- jupyter_server_terminals ==0.4.4
- jupyterlab ==4.0.6
- jupyterlab-pygments ==0.2.2
- jupyterlab-widgets ==3.0.9
- jupyterlab_server ==2.25.0
- kiwisolver ==1.4.5
- langcodes ==3.3.0
- langdetect ==1.0.9
- lit ==17.0.2
- llvmlite ==0.41.0
- loguru ==0.7.2
- lxml ==4.9.3
- markdown-it-py ==3.0.0
- matplotlib ==3.8.0
- matplotlib-inline ==0.1.6
- mdurl ==0.1.2
- mistune ==3.0.2
- more-itertools ==10.1.0
- mpld3 ==0.3
- mpmath ==1.3.0
- mteb ==1.1.1
- multidict ==6.0.4
- multiprocess ==0.70.15
- murmurhash ==1.0.10
- nbclient ==0.8.0
- nbconvert ==7.9.1
- nbformat ==5.9.2
- nest-asyncio ==1.5.8
- networkx ==3.1
- nltk ==3.8.1
- notebook ==7.0.4
- notebook_shim ==0.2.3
- numba ==0.58.0
- numpy ==1.25.2
- omegaconf ==2.3.0
- overrides ==7.4.0
- packaging ==23.2
- pandas ==2.1.1
- pandocfilters ==1.5.0
- parso ==0.8.3
- pathy ==0.10.2
- peft ==0.5.0
- pexpect ==4.8.0
- pickleshare ==0.7.5
- platformdirs ==3.11.0
- pptree ==3.1
- preshed ==3.0.9
- prometheus-client ==0.17.1
- prompt-toolkit ==3.0.39
- protobuf ==4.24.4
- psutil ==5.9.5
- ptyprocess ==0.7.0
- pure-eval ==0.2.2
- py4j ==0.10.9.7
- pyarrow ==13.0.0
- pycparser ==2.21
- pydantic ==2.4.2
- pydantic_core ==2.10.1
- pynndescent ==0.5.10
- pyparsing ==3.1.1
- python-dateutil ==2.8.2
- python-json-logger ==2.0.7
- pytorch_revgrad ==0.2.0
- pytz ==2023.3.post1
- pyzmq ==25.1.1
- qtconsole ==5.4.4
- referencing ==0.30.2
- regex ==2023.10.3
- requests ==2.31.0
- responses ==0.18.0
- rfc3339-validator ==0.1.4
- rfc3986-validator ==0.1.1
- rich ==13.6.0
- river ==0.19.0
- rpds-py ==0.10.3
- s3transfer ==0.7.0
- safetensors ==0.3.3
- scikit-learn ==1.3.1
- scipy ==1.11.3
- segtok ==1.5.11
- sentence-transformers ==2.2.2
- sentencepiece ==0.1.99
- sfst ==1.5.7
- six ==1.16.0
- smart-open ==6.4.0
- sniffio ==1.3.0
- soupsieve ==2.5
- spacy ==3.7.0
- spacy-legacy ==3.0.12
- spacy-loggers ==1.0.5
- sqlitedict ==2.1.0
- srsly ==2.4.8
- stack-data ==0.6.3
- sympy ==1.12
- tabulate ==0.9.0
- tbb ==2021.10.0
- terminado ==0.17.1
- thinc ==8.2.1
- threadpoolctl ==3.2.0
- tinycss2 ==1.2.1
- tokenizers ==0.14.0
- tomli ==2.0.1
- torch ==2.0.1
- torchvision ==0.15.2
- tornado ==6.3.3
- tqdm ==4.66.1
- traitlets ==5.11.2
- transformer-smaller-training-vocab ==0.2.4
- transformers ==4.34.0
- triton ==2.0.0
- typer ==0.9.0
- types-python-dateutil ==2.8.19.14
- typing_extensions ==4.8.0
- tzdata ==2023.3
- umap-learn ==0.5.4
- uri-template ==1.3.0
- urllib3 ==1.26.17
- wasabi ==1.1.2
- wcwidth ==0.2.8
- weasel ==0.3.2
- webcolors ==1.13
- webencodings ==0.5.1
- websocket-client ==1.6.3
- widgetsnbextension ==4.0.9
- wrapt ==1.15.0
- xxhash ==3.3.0
- yarl ==1.9.2
- actions/checkout v4 composite
- actions/setup-python v2 composite
- docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4 composite
- docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1 composite
- docker/metadata-action 9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7 composite