supergleber

German Language Understanding Evaluation Benchmark @NAACL24

https://github.com/lsx-uniwue/supergleber

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Keywords

benchmark, german, llm
Last synced: 7 months ago

Repository

Statistics
  • Stars: 10
  • Watchers: 6
  • Forks: 2
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed 9 months ago
Metadata Files
  • Readme
  • Citation

README.md

✨SuperGLEBer ✨

SuperGLEBer (German Language Understanding Evaluation Benchmark) is a broad Natural Language Understanding benchmark suite for the German language, built to give a clearer picture of the current state of German LLMs. The benchmark consists of 29 tasks spanning several task types, such as document classification, sequence tagging, sentence similarity, and question answering.

If you use this benchmark in your research, please cite the following paper: https://aclanthology.org/2024.naacl-long.438/

For the current leaderboard and more information, see the SuperGLEBer website (https://supergleber.professor-x.de) 🚀

This is the updated branch that contains the new and improved version of the SuperGLEBer benchmark.

Running Experiments

Create all the files needed to schedule runs on a Kubernetes (k8s) or Slurm cluster:

    python src/template_k8s.py

Run a model on a task:

    python src/train.py +model=gbert_base +train_args=a100 +task=news_class
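To run several model/task pairs, the same invocation can be generated in a loop. The following is a minimal sketch and not part of the repository: `build_commands` is a hypothetical helper that assumes only the `+model=`, `+train_args=`, `+task=` argument pattern shown above. The names `gbert_base`, `a100`, and `news_class` come from this README; any other model or task name you pass in has to match a config file that actually exists in the repository.

```python
import itertools
import shlex


def build_commands(models, tasks, train_args="a100"):
    """Return one `src/train.py` command line per (model, task) pair.

    Hypothetical helper for illustration; it only formats strings and
    does not check that the given names exist as config files.
    """
    commands = []
    for model, task in itertools.product(models, tasks):
        argv = [
            "python",
            "src/train.py",
            f"+model={model}",
            f"+train_args={train_args}",
            f"+task={task}",
        ]
        # shlex.join quotes arguments where needed so the line is shell-safe.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Only names that appear in this README are used here.
    for command in build_commands(["gbert_base"], ["news_class"]):
        print(command)
```

Building the argument list first and joining it with `shlex.join` keeps the commands safe to paste into a shell script or a job file, even if a config name ever contains unusual characters.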

Override individual config keys on the command line:

    python src/train.py +model=gbert_base +train_args=a100 +task=news_class train_args.batch_size=1
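The dotted key in `train_args.batch_size=1` addresses a nested entry in the composed configuration. The sketch below illustrates that idea with plain dictionaries. It is a simplified stand-in, not the override logic of the configuration library itself: `apply_override` and `parse_value` are hypothetical helpers, the value parsing handles only integers, floats, and strings, and the starting batch size of 8 is an invented placeholder rather than a value from the repository.

```python
import copy


def parse_value(raw):
    """Convert a raw CLI string to int or float when possible (simplified)."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_override(config, override):
    """Return a copy of `config` with one `a.b.c=value` override applied.

    Assumption: like a plain (non-`+`) override, the key must already
    exist, so a typo in the key raises instead of silently adding an entry.
    """
    dotted_key, separator, raw_value = override.partition("=")
    if not separator:
        raise ValueError(f"expected key=value, got {override!r}")

    result = copy.deepcopy(config)  # leave the caller's config untouched
    *parents, leaf = dotted_key.split(".")
    node = result
    for part in parents:
        node = node[part]  # KeyError if an intermediate key is missing
    if leaf not in node:
        raise KeyError(f"unknown config key: {dotted_key}")
    node[leaf] = parse_value(raw_value)
    return result


if __name__ == "__main__":
    # Placeholder config; the real defaults live in the YAML files.
    base = {"train_args": {"batch_size": 8}}
    print(apply_override(base, "train_args.batch_size=1"))
```

Rejecting unknown keys mirrors why it helps to check the YAML configs first: an override only has an effect if it names a key the configuration actually defines.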

Valid parameters are listed in the provided YAML configs: https://github.com/LSX-UniWue/SuperGLEBer/tree/paper/src/conf

Citation

@inproceedings{pfister-hotho-2024-supergleber,
    title = "{S}uper{GLEB}er: {G}erman Language Understanding Evaluation Benchmark",
    author = "Pfister, Jan and Hotho, Andreas",
    editor = "Duh, Kevin and Gomez, Helena and Bethard, Steven",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.438/",
    doi = "10.18653/v1/2024.naacl-long.438",
    pages = "7904--7923",
    abstract = "We assemble a broad Natural Language Understanding benchmark suite for the German language and consequently evaluate a wide array of existing German-capable models in order to create a better understanding of the current state of German LLMs. Our benchmark consists of 29 different tasks ranging over different types such as document classification, sequence tagging, sentence similarity, and question answering, on which we evaluate 10 different German-pretrained models, thereby charting the landscape of German LLMs. In our comprehensive evaluation we find that encoder models are a good choice for most tasks, but also that the largest encoder model does not necessarily perform best for all tasks. We make our benchmark suite and a leaderboard publically available at https://supergleber.professor-x.de and encourage the community to contribute new tasks and evaluate more models on it (https://github.com/LSX-UniWue/SuperGLEBer)."
}

Owner

  • Name: Chair of Computer Science X - Data Science
  • Login: LSX-UniWue
  • Kind: organization
  • Location: Germany

GitHub Events

Total
  • Issues event: 5
  • Watch event: 8
  • Issue comment event: 9
  • Push event: 36
  • Pull request review event: 6
  • Pull request review comment event: 4
  • Pull request event: 2
  • Fork event: 2
  • Create event: 1
Last Year
  • Issues event: 5
  • Watch event: 8
  • Issue comment event: 9
  • Push event: 36
  • Pull request review event: 6
  • Pull request review comment event: 4
  • Pull request event: 2
  • Fork event: 2
  • Create event: 1

Dependencies

k8s/templates/Dockerfile docker
  • nvidia/cuda 11.7.1-devel-ubuntu22.04 build
requirements.txt pypi
  • Babel ==2.13.0
  • Cython ==0.29.36
  • Deprecated ==1.2.14
  • Janome ==0.5.0
  • Jinja2 ==3.1.2
  • MarkupSafe ==2.1.3
  • Pillow ==10.0.1
  • PySocks ==1.7.1
  • PyYAML ==6.0.1
  • Pygments ==2.16.1
  • QtPy ==2.4.0
  • Send2Trash ==1.8.2
  • Wikipedia-API ==0.6.0
  • accelerate ==0.23.0
  • aiohttp ==3.8.5
  • aiosignal ==1.3.1
  • annotated-types ==0.5.0
  • antlr4-python3-runtime ==4.9.3
  • anyio ==4.0.0
  • argon2-cffi ==23.1.0
  • argon2-cffi-bindings ==21.2.0
  • arrow ==1.3.0
  • asttokens ==2.4.0
  • async-lru ==2.0.4
  • async-timeout ==4.0.3
  • attrs ==23.1.0
  • backcall ==0.2.0
  • beautifulsoup4 ==4.12.2
  • bitsandbytes ==0.41.1
  • bleach ==6.0.0
  • blis ==0.7.11
  • boto3 ==1.28.60
  • botocore ==1.31.60
  • bpemb ==0.3.4
  • catalogue ==2.0.10
  • certifi ==2023.7.22
  • cffi ==1.16.0
  • charset-normalizer ==3.3.0
  • click ==8.1.7
  • cloudpathlib ==0.15.1
  • cloudpickle ==2.2.1
  • cmake ==3.27.6
  • comm ==0.1.4
  • confection ==0.1.3
  • conllu ==4.5.3
  • contourpy ==1.1.1
  • cycler ==0.12.0
  • cymem ==2.0.8
  • datasets ==2.14.5
  • debugpy ==1.8.0
  • decorator ==5.1.1
  • defusedxml ==0.7.1
  • dill ==0.3.7
  • einops ==0.7.0
  • evaluate ==0.4.0
  • exceptiongroup ==1.1.3
  • executing ==2.0.0
  • fastjsonschema ==2.18.1
  • filelock ==3.12.4
  • flair ==0.12.2
  • fonttools ==4.43.0
  • fqdn ==1.5.1
  • frozenlist ==1.4.0
  • fsspec ==2023.6.0
  • ftfy ==6.1.1
  • future ==0.18.3
  • gdown ==4.4.0
  • gensim ==4.3.2
  • huggingface-hub ==0.16.4
  • hydra-core ==1.3.2
  • hyperopt ==0.2.7
  • idna ==3.4
  • ipykernel ==6.25.2
  • ipython ==8.16.1
  • ipython-genutils ==0.2.0
  • ipywidgets ==8.1.1
  • isoduration ==20.11.0
  • jedi ==0.19.1
  • jmespath ==1.0.1
  • joblib ==1.3.2
  • json5 ==0.9.14
  • jsonlines ==4.0.0
  • jsonpointer ==2.4
  • jsonschema ==4.19.1
  • jsonschema-specifications ==2023.7.1
  • jupyter ==1.0.0
  • jupyter-console ==6.6.3
  • jupyter-events ==0.7.0
  • jupyter-lsp ==2.2.0
  • jupyter_client ==8.3.1
  • jupyter_core ==5.3.2
  • jupyter_server ==2.7.3
  • jupyter_server_terminals ==0.4.4
  • jupyterlab ==4.0.6
  • jupyterlab-pygments ==0.2.2
  • jupyterlab-widgets ==3.0.9
  • jupyterlab_server ==2.25.0
  • kiwisolver ==1.4.5
  • langcodes ==3.3.0
  • langdetect ==1.0.9
  • lit ==17.0.2
  • llvmlite ==0.41.0
  • loguru ==0.7.2
  • lxml ==4.9.3
  • markdown-it-py ==3.0.0
  • matplotlib ==3.8.0
  • matplotlib-inline ==0.1.6
  • mdurl ==0.1.2
  • mistune ==3.0.2
  • more-itertools ==10.1.0
  • mpld3 ==0.3
  • mpmath ==1.3.0
  • mteb ==1.1.1
  • multidict ==6.0.4
  • multiprocess ==0.70.15
  • murmurhash ==1.0.10
  • nbclient ==0.8.0
  • nbconvert ==7.9.1
  • nbformat ==5.9.2
  • nest-asyncio ==1.5.8
  • networkx ==3.1
  • nltk ==3.8.1
  • notebook ==7.0.4
  • notebook_shim ==0.2.3
  • numba ==0.58.0
  • numpy ==1.25.2
  • omegaconf ==2.3.0
  • overrides ==7.4.0
  • packaging ==23.2
  • pandas ==2.1.1
  • pandocfilters ==1.5.0
  • parso ==0.8.3
  • pathy ==0.10.2
  • peft ==0.5.0
  • pexpect ==4.8.0
  • pickleshare ==0.7.5
  • platformdirs ==3.11.0
  • pptree ==3.1
  • preshed ==3.0.9
  • prometheus-client ==0.17.1
  • prompt-toolkit ==3.0.39
  • protobuf ==4.24.4
  • psutil ==5.9.5
  • ptyprocess ==0.7.0
  • pure-eval ==0.2.2
  • py4j ==0.10.9.7
  • pyarrow ==13.0.0
  • pycparser ==2.21
  • pydantic ==2.4.2
  • pydantic_core ==2.10.1
  • pynndescent ==0.5.10
  • pyparsing ==3.1.1
  • python-dateutil ==2.8.2
  • python-json-logger ==2.0.7
  • pytorch_revgrad ==0.2.0
  • pytz ==2023.3.post1
  • pyzmq ==25.1.1
  • qtconsole ==5.4.4
  • referencing ==0.30.2
  • regex ==2023.10.3
  • requests ==2.31.0
  • responses ==0.18.0
  • rfc3339-validator ==0.1.4
  • rfc3986-validator ==0.1.1
  • rich ==13.6.0
  • river ==0.19.0
  • rpds-py ==0.10.3
  • s3transfer ==0.7.0
  • safetensors ==0.3.3
  • scikit-learn ==1.3.1
  • scipy ==1.11.3
  • segtok ==1.5.11
  • sentence-transformers ==2.2.2
  • sentencepiece ==0.1.99
  • sfst ==1.5.7
  • six ==1.16.0
  • smart-open ==6.4.0
  • sniffio ==1.3.0
  • soupsieve ==2.5
  • spacy ==3.7.0
  • spacy-legacy ==3.0.12
  • spacy-loggers ==1.0.5
  • sqlitedict ==2.1.0
  • srsly ==2.4.8
  • stack-data ==0.6.3
  • sympy ==1.12
  • tabulate ==0.9.0
  • tbb ==2021.10.0
  • terminado ==0.17.1
  • thinc ==8.2.1
  • threadpoolctl ==3.2.0
  • tinycss2 ==1.2.1
  • tokenizers ==0.14.0
  • tomli ==2.0.1
  • torch ==2.0.1
  • torchvision ==0.15.2
  • tornado ==6.3.3
  • tqdm ==4.66.1
  • traitlets ==5.11.2
  • transformer-smaller-training-vocab ==0.2.4
  • transformers ==4.34.0
  • triton ==2.0.0
  • typer ==0.9.0
  • types-python-dateutil ==2.8.19.14
  • typing_extensions ==4.8.0
  • tzdata ==2023.3
  • umap-learn ==0.5.4
  • uri-template ==1.3.0
  • urllib3 ==1.26.17
  • wasabi ==1.1.2
  • wcwidth ==0.2.8
  • weasel ==0.3.2
  • webcolors ==1.13
  • webencodings ==0.5.1
  • websocket-client ==1.6.3
  • widgetsnbextension ==4.0.9
  • wrapt ==1.15.0
  • xxhash ==3.3.0
  • yarl ==1.9.2
.github/workflows/dockerimage.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v2 composite
  • docker/build-push-action f2a1d5e99d037542a71f64918e516c093c6f3fc4 composite
  • docker/login-action 65b78e6e13532edd9afa3aa52ac7964289d1a9c1 composite
  • docker/metadata-action 9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7 composite