https://github.com/aksw/llm-kg-bench
LLM-KG-Bench is a Framework and task collection for automated benchmarking of Large Language Models (LLMs) on Knowledge Graph (KG) related tasks.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ✓ DOI references (found 25 DOI reference(s) in README)
- ✓ Academic publication links (links to: arxiv.org, zenodo.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 13.5%, to scientific vocabulary)
Statistics
- Stars: 42
- Watchers: 26
- Forks: 5
- Open Issues: 9
- Releases: 7
Metadata Files
README.md
LLM-KG-Bench
Framework and task collection for automated benchmarking of Large Language Models (LLMs) on Knowledge Graph (KG) related tasks.
Architecture diagram for the benchmark suite:

The architecture is based on and roughly compatible with Big Bench. We added some additional features, such as iterations, task parameters, and a prompt-answer-evaluate loop with the new task API; see doc/mainConcepts.md for an introduction to the main concepts.
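The following minimal Python sketch illustrates the prompt-answer-evaluate loop idea. All names in it (`run_task`, `task.first_prompt`, `task.evaluate`, `model.ask`) are illustrative assumptions, not the actual LLM-KG-Bench task API; doc/mainConcepts.md describes the real concepts.

```python
# Illustrative sketch of a prompt-answer-evaluate loop. The names below are
# ASSUMED for illustration and do not match the real LLM-KG-Bench task API.

def run_task(task, model, iterations=3):
    """Run one benchmark task against one model for several iterations."""
    scores = []
    for _ in range(iterations):
        prompt = task.first_prompt()      # the task builds the initial prompt
        score = None
        while prompt is not None:
            answer = model.ask(prompt)    # send prompt, receive model answer
            # The task evaluates the answer and may return a follow-up prompt,
            # e.g. asking the model to repair a syntax error it produced.
            score, prompt = task.evaluate(answer)
        scores.append(score)
    return scores
```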
Requirements, Installation and execution
Prerequisites
- at least Python 3.8
- the API keys required for the LLM models used must be defined as environment variables (a quick sanity check is sketched below this list):
  - for ChatGPT from OpenAI define `OPENAI_API_KEY`, e.g. via `export OPENAI_API_KEY=$(cat ./tmp/openai.txt); echo $OPENAI_API_KEY | cut -c 1-4`
  - for Claude from Anthropic define `CLAUDE_API_KEY`, e.g. via `export CLAUDE_API_KEY=$(cat ./tmp/claude.txt); echo $CLAUDE_API_KEY | cut -c 1-4`
  - for Gemini from Google define `GOOGLE_API_KEY`, e.g. via `export GOOGLE_API_KEY=$(cat ./tmp/google.txt); echo $GOOGLE_API_KEY | cut -c 1-4`
- GPT4All for related models
- vLLM for related models
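As a quick sanity check before a run, the following Python snippet (using only the key names listed above) verifies that the keys are set:

```python
import os

# Warn about missing API keys before starting a benchmark run.
# Key names are taken from the list above; only models whose key is
# missing will be affected.
for key in ("OPENAI_API_KEY", "CLAUDE_API_KEY", "GOOGLE_API_KEY"):
    if not os.environ.get(key):
        print(f"warning: {key} is not set; the corresponding models will fail")
```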
The Python dependencies are managed by Poetry. See doc/dependencies.md for an explanation of the Python dependencies.
Installation
If you have Poetry installed (see the Poetry documentation), run

```shell
$ poetry install
```

Otherwise check doc/execution.md.
Configure and execute the benchmark
Copy the configuration file from LlmKgBench/configuration.dist.yml to LlmKgBench/configuration.yml, then check it and adjust it to your needs.
In the configuration file you can define which tests to run on which models for which sizes with how many iterations each. The configuration schema is described in doc/configuration.md.
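Before running, a small Python check (assuming nothing beyond the file path above and the YAML format) can confirm the adjusted file still parses; which top-level keys appear depends on the schema in doc/configuration.md:

```python
import yaml

# Confirm that the adjusted configuration file still parses as valid YAML.
# The schema itself is documented in doc/configuration.md.
with open("LlmKgBench/configuration.yml") as f:
    config = yaml.safe_load(f)
print(sorted(config))  # show the configured top-level sections
```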
Then execute the benchmark with the current configuration:
```shell
$ poetry run LlmKgBench
```
The available benchmark tasks can be found in the folder LlmKgBench/tasks/.
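For a quick overview, the tasks folder can be listed directly; a minimal sketch, assuming only the folder path named above:

```python
from pathlib import Path

# Print the contents of the tasks folder to get an overview of the
# available benchmark tasks.
for entry in sorted(Path("LlmKgBench/tasks").iterdir()):
    print(entry.name)
```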
Result files generated
Results and logs are stored in the folder runs. The generated filenames include the date and time of program start in the form [YYYY-mm-DD_HH-MM-ss].
- result files generated, in different serialization formats containing the same information:
  - llm-kg-bench_run-[YYYY-mm-DD_HH-MM-ss]_result.json
  - llm-kg-bench_run-[YYYY-mm-DD_HH-MM-ss]_result.yaml
  - llm-kg-bench_run-[YYYY-mm-DD_HH-MM-ss]_result.txt
- model log:
  - llm-kg-bench_run-[YYYY-mm-DD_HH-MM-ss]_modelLog.jsonl
- debug log:
  - llm-kg-bench_run-[YYYY-mm-DD_HH-MM-ss]_debug-log.log
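For downstream analysis, the JSON result file can be loaded directly. A minimal sketch, relying only on the runs folder and the filename pattern documented above:

```python
import json
from pathlib import Path

# Load the newest result file from the runs folder. With the zero-padded
# timestamp format above, lexicographic order equals chronological order.
latest = max(Path("runs").glob("llm-kg-bench_run-*_result.json"))
with latest.open() as f:
    results = json.load(f)
print(f"loaded {latest.name}")
```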
Some results have already been published, as listed in the publications section below.
Reevaluation of given result files
The LLM-KG-Bench framework supports the reevaluation of given result files via the --reeval parameter; see doc/execution.md.
Plans and contribution possibilities
LLM-KG-Bench is published open source under the Mozilla Public License Version 2.0, and we look forward to your contributions via pull requests or issues. We are especially interested in:
- bug fixes and improvements
- additional KG related benchmark tasks
- support for additional model connectors
We are planning to start a public leaderboard soon. Stay tuned.
Test dataset, please do not use for training
The benchmarks collected here are meant for testing LLMs. Please do not use them for training LLMs. If you are interested in training data, please contact us, either via email or by opening an issue in the GitHub repository.
Publications on LLM-KG-Bench and generated results
Published results are collected in our results repository: https://github.com/AKSW/LLM-KG-Bench-Results
"Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering" Lars-Peter Meyer et al. 2023, in Poster Proceedings of Semantics-23, Leipzig: Article (copy at arXiv:2308.16622, local pdf), Poster.
Results: GitHub

```bibtex
@inproceedings{Meyer2023DevelopingScalableBenchmark,
  author    = {Meyer, Lars-Peter and Frey, Johannes and Junghanns, Kurt and Brei, Felix and Bulert, Kirill and Gründer-Fahrer, Sabine and Martin, Michael},
  title     = {Developing a Scalable Benchmark for Assessing Large Language Models in Knowledge Graph Engineering},
  year      = {2023},
  booktitle = {Proceedings of Poster Track of Semantics 2023},
  doi       = {10.48550/ARXIV.2308.16622},
  url       = {https://ceur-ws.org/Vol-3526/paper-04.pdf},
}
```
"Benchmarking the Abilities of Large Language Models for RDF Knowledge Graph Creation and Comprehension: How Well Do LLMs Speak Turtle?" Johannes Frey et al. 2023, in Workshop Proceedings of DL4KG@ISWC-23: Article (copy at arXiv:2309.17122, local pdf).
Results: GitHub

```bibtex
@inproceedings{Frey2023BenchmarkingAbilitiesLarge,
  author    = {Frey, Johannes and Meyer, Lars-Peter and Arndt, Natanael and Brei, Felix and Bulert, Kirill},
  title     = {Benchmarking the Abilities of Large Language Models for {RDF} Knowledge Graph Creation and Comprehension: How Well Do {LLMs} Speak Turtle?},
  year      = {2023},
  booktitle = {Proceedings of Workshop Deep Learning for Knowledge Graphs (DL4KG) @ ISWC23},
  doi       = {10.48550/ARXIV.2309.17122},
  url       = {https://ceur-ws.org/Vol-3559/paper-3.pdf},
}
```
"Assessing the Evolution of LLM capabilities for Knowledge Graph Engineering in 2023" Johannes Frey et al. 2024, in Proceedings of ESWC 2024 Special Track on LLMs for KE: Article (copy at ESWC24, local pdf).
Results: GitHub

```bibtex
@inproceedings{Frey2024AssessingEvolutionLLM,
  author    = {Frey, Johannes and Meyer, Lars-Peter and Brei, Felix and Gründer-Fahrer, Sabine and Martin, Michael},
  title     = {Assessing the Evolution of {LLM} capabilities for Knowledge Graph Engineering in 2023},
  year      = {2025},
  booktitle = {The Semantic Web: {ESWC} 2024 Satellite Events},
  publisher = {Springer Nature Switzerland},
  issn      = {1611-3349},
  pages     = {51--60},
  doi       = {10.1007/978-3-031-78952-6_5},
}
```
"Assessing SPARQL capabilities of Large Language Models" Lars-Peter Meyer et al. 2024, in Proceedings of Workshop NLP4KGC@SEMANTICS 2024: Article (copy at (arXiv:2409.05925), local pdf).
Results: GitHub

```bibtex
@inproceedings{Meyer2024AssessingSparqlCapabilititesLLM,
  author    = {Meyer, Lars-Peter and Frey, Johannes and Brei, Felix and Arndt, Natanael},
  title     = {Assessing {SPARQL} capabilities of Large Language Models},
  booktitle = {Proceedings of the 3rd International Workshop on Natural Language Processing for Knowledge Graph Creation co-located with 20th International Conference on Semantic Systems ({SEMANTiCS} 2024)},
  year      = {2024},
  editor    = {Edlira Vakaj and Sima Iranmanesh and Rizou Stamartina and Nandana Mihindukulasooriya and Sanju Tiwari and Fernando Ortiz-Rodríguez and Ryan Mcgranaghan},
  url       = {https://ceur-ws.org/Vol-3874/paper3.pdf},
}
```
"LLM-KG-Bench 3.0: A Compass for Semantic Technology Capabilities in the Ocean of LLMs" Lars-Peter Meyer et al. 2025, to appear in Proceedings of ESWC 2025 resources track: local pdf(preprint)
Results: GitHub

```bibtex
@InProceedings{Meyer2025LLMKGBench3,
  author  = {Lars-Peter Meyer and Johannes Frey and Desiree Heim and Felix Brei and Claus Stadler and Kurt Junghanns and Michael Martin},
  title   = {{LLM-KG-Bench} 3.0: A Compass for Semantic Technology Capabilities in the Ocean of {LLMs}},
  year    = {2025},
  comment = {to appear in {ESWC25} Resource Track Proceedings},
}
```
"How do Scaling Laws Apply to Knowledge Graph Engineering Tasks? The Impact of Model Size on Large Language Model Performance" Desiree Heim et al. 2025, to appear in Proceedings of workshop ELMKE @ ESWC 2025.
```bibtex
@InProceedings{Heim2025ScalingLawsKgeTasks,
  author  = {Desiree Heim and Lars-Peter Meyer and Markus Schröder and Johannes Frey and Andreas Dengel},
  title   = {How do Scaling Laws Apply to Knowledge Graph Engineering Tasks? The Impact of Model Size on Large Language Model Performance},
  year    = {2025},
  comment = {to appear in the proceedings of workshop {ELMKE} @ {ESWC} 2025},
}
```

"Evaluating Large Language Models for RDF Knowledge Graph Related Tasks - The LLM-KG-Bench-Framework 3" Lars-Peter Meyer et al. 2025, submitted for review at Semantic Web Journal: Article.
Owner
- Name: AKSW Research Group @ University of Leipzig
- Login: AKSW
- Kind: organization
- Location: Leipzig
- Website: http://aksw.org
- Repositories: 358
- Profile: https://github.com/AKSW
GitHub Events
Total
- Create event: 5
- Release event: 3
- Issues event: 14
- Watch event: 22
- Delete event: 1
- Issue comment event: 9
- Push event: 143
- Pull request event: 10
- Fork event: 2
Last Year
- Create event: 5
- Release event: 3
- Issues event: 14
- Watch event: 22
- Delete event: 1
- Issue comment event: 9
- Push event: 143
- Pull request event: 10
- Fork event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 2
- Average time to close issues: 7 months
- Average time to close pull requests: 2 minutes
- Total issue authors: 3
- Total pull request authors: 1
- Average comments per issue: 0.75
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 2
- Average time to close issues: 2 days
- Average time to close pull requests: 2 minutes
- Issue authors: 2
- Pull request authors: 1
- Average comments per issue: 0.67
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lpmeyer (7)
- nishadi (1)
- JJ-Author (1)
- white-gecko (1)
Pull Request Authors
- lpmeyer (7)
- deheim (1)
Dependencies
Resolved versions:
- aiohttp 3.8.5
- aiosignal 1.3.1
- anthropic 0.3.7
- anyio 3.7.1
- async-timeout 4.0.2
- attrs 23.1.0
- backoff 2.2.1
- certifi 2023.7.22
- charset-normalizer 3.2.0
- colorama 0.4.6
- contourpy 1.1.0
- cycler 0.11.0
- distro 1.8.0
- exceptiongroup 1.1.2
- fonttools 4.41.1
- frozenlist 1.4.0
- gitdb 4.0.10
- gitpython 3.1.32
- gpt4all 1.0.8
- h11 0.14.0
- httpcore 0.17.3
- httpx 0.24.1
- idna 3.4
- importlib-resources 6.0.0
- isodate 0.6.1
- jinja2 3.1.2
- kiwisolver 1.4.4
- markupsafe 2.1.3
- matplotlib 3.7.2
- multidict 6.0.4
- numpy 1.25.2
- openai 0.27.8
- packaging 23.1
- pandas 2.0.3
- pillow 10.0.0
- pydantic 1.10.12
- pyparsing 3.0.9
- python-dateutil 2.8.2
- pytz 2023.3
- pyyaml 6.0.1
- rdflib 6.3.2
- requests 2.31.0
- seaborn 0.12.2
- six 1.16.0
- smmap 5.0.0
- sniffio 1.3.0
- tokenizers 0.13.3
- tqdm 4.65.0
- typing-extensions 4.7.1
- tzdata 2023.3
- urllib3 2.0.4
- yarl 1.9.2
- zipp 3.16.2
Declared version constraints:
- anthropic ^0.3.6
- backoff ^2.2.1
- gitpython ^3.1.32
- gpt4all ^1.0.7
- jinja2 ^3.0.1
- matplotlib ^3.7.2
- openai ^0.27.8
- pandas ^2.0.3
- python ^3.9
- pyyaml ^6.0.1
- rdflib ^6.3.2
- seaborn ^0.12.2