llm4ke
Repository for Large Language Models for Knowledge Engineering
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: D2KLab
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 1.27 MB
Statistics
- Stars: 6
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
llm4ke
Repository for Large Language Models for Knowledge Engineering (LLM4KE).
Objectives
Original idea:
How much can LLMs contribute to the knowledge engineering process alongside our usual methodology (competency questions, ontology reuse, authoring tests, etc.)?
Set of questions we could investigate:
- Could an LLM reverse engineer an ontology and find out what good competency questions (CQs) could be derived?
- Could an LLM take the CQs as input and generate parts of the ontology?
- Could an LLM take the CQs as input and extend an existing ontology?
- Could an LLM take the CQs as input and generate abstract patterns?
- Could an LLM write an authoring test (a SPARQL query) given the ontology and the CQs?
- Given a dataset and an ontology, is an LLM able to generate an adequate set of RML rules for data ingestion?
The content of this code repository accompanies the research project explained in the following papers:
```bibtex
@inproceedings{llm4ke-2024,
  title = {{Can LLMs Generate Competency Questions?}},
  author = {{Youssra Rebboud} and {Lionel Tailhardat} and {Pasquale Lisena} and {Rapha\"el Troncy}},
  booktitle = {Semantic Web - 21st International Conference (ESWC), LLMs for KE track, Hersonissos, Crete, Greece, May 26 - 30, 2024},
  year = {2024}
}

@inproceedings{llm4ke-bench-2024,
  title = {{Benchmarking LLM-based Ontology Conceptualization: A Proposal}},
  author = {{Youssra Rebboud} and {Pasquale Lisena} and {Lionel Tailhardat} and {Rapha\"el Troncy}},
  booktitle = {ISWC 2024, 23rd International Semantic Web Conference, 11-15 November 2024, Baltimore, USA},
  year = {2024}
}
```
Usage
The repository is structured as follows:
llm4ke
├───data <Reference data models with their related components>
│ └─[DataModelName]
│ ├─dm <data model implementation>
│ ├─rq <set of queries>
│ └─...
├───src <Processing pipeline code>
└───...
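As a rough illustration of the layout above, the following sketch walks a `data/`-style directory and reports, for each data model, whether the `dm` and `rq` sub-directories are present. The function name `list_data_models` is hypothetical and not part of the repository; the demo builds a temporary directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path


def list_data_models(data_dir):
    """Map each sub-directory of data_dir to the presence of its dm/ and rq/ folders."""
    models = {}
    for entry in sorted(Path(data_dir).iterdir()):
        if entry.is_dir():
            models[entry.name] = {
                "dm": (entry / "dm").is_dir(),  # data model implementation
                "rq": (entry / "rq").is_dir(),  # set of queries
            }
    return models


# Demo on a throwaway directory that mimics data/[DataModelName]/{dm,rq}.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Odeuropa" / "dm").mkdir(parents=True)
    (Path(tmp) / "Odeuropa" / "rq").mkdir()
    print(list_data_models(tmp))
```

From the repository root, calling `list_data_models("data")` would give the same kind of overview for the actual reference data models.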
Generating Competency Questions
This section addresses the first research question above: could an LLM reverse engineer an ontology and identify potential competency questions?
The pipeline is built on LangChain and runs the models through Ollama.
- Install Ollama from its website.
- Install the requirements:
  ```shell
  pip install -r requirements.txt
  ```
- Download the desired LLM (full list of available LLMs):
  ```shell
  ollama pull llama2
  ```
- Run the pipeline to generate Competency Questions for a given ontology:
  ```shell
  # Canonical form:
  # python src/main.py --name --input --llm
  # Basic example for the Odeuropa ontology:
  python src/main.py all_classes --name Odeuropa --input ./data/Odeuropa/ --llm llama2
  ```
  Then browse the results in the `out/Odeuropa/` directory.
  You can get the full list of available parameters with `python src/main.py --help`.
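To make the idea of generating competency questions from an ontology more concrete, here is a minimal sketch of how a prompt could be assembled from a list of class labels. This is not the repository's implementation: the function `build_cq_prompt`, the prompt wording, and the class labels are all hypothetical and used for illustration only.

```python
def build_cq_prompt(ontology_name, class_labels, n_questions=5):
    """Assemble a plain-text prompt asking an LLM for competency questions."""
    if not class_labels:
        raise ValueError("class_labels must not be empty")
    listing = "\n".join(f"- {label}" for label in class_labels)
    return (
        f"You are a knowledge engineer reviewing the {ontology_name} ontology.\n"
        f"It defines the following classes:\n{listing}\n"
        f"Write {n_questions} competency questions that this ontology "
        "should be able to answer, one per line."
    )


# Hypothetical class labels; a real run would read them from the ontology file.
prompt = build_cq_prompt("Odeuropa", ["ExampleClassA", "ExampleClassB"], n_questions=3)
print(prompt)
```

The resulting string could then be sent to a locally served model, for instance through the Ollama wrapper shipped with `langchain_community`; how the actual pipeline builds and sends its prompts is defined in `src/main.py`.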
Evaluating the LLM's Competency Questions
Using the output of the Generating Competency Questions step above:
- Run the evaluation pipeline to compute similarity scores for all ontologies or for a given ontology:
  ```shell
  # Canonical form:
  # python src/eval.py
  # Basic example for the Odeuropa ontology with a 0.8 similarity threshold and verbose logging:
  python3 ./src/eval.py Odeuropa -t 0.8 --log 10
  ```
  Then browse the results in the ./results_
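The role of a similarity threshold such as `-t 0.8` can be sketched as follows. The example assumes that generated and reference competency questions have already been turned into embedding vectors and that cosine similarity is the measure; both are assumptions for illustration, as is the `match_rate` helper, and the actual metric is the one implemented in `src/eval.py`.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_rate(generated, reference, threshold=0.8):
    """Fraction of reference vectors that have a generated vector at or above the threshold."""
    if not reference:
        return 0.0
    matched = sum(
        1
        for ref in reference
        if any(cosine_similarity(ref, gen) >= threshold for gen in generated)
    )
    return matched / len(reference)


# Toy 2-D vectors standing in for sentence embeddings (hypothetical values).
reference = [[1.0, 0.0], [0.0, 1.0]]
generated = [[0.9, 0.1], [0.5, 0.5]]
print(match_rate(generated, reference, threshold=0.8))  # prints 0.5
```

Here only the first reference vector has a generated counterpart above 0.8, so half of the reference questions count as covered. Raising the threshold makes the match stricter, lowering it more permissive.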
Copyright
Copyright (c) 2023-2024, EURECOM. All rights reserved.
License
Maintainer
Owner
- Name: D2K Lab
- Login: D2KLab
- Kind: organization
- Email: d2klab-admin@eurecom.fr
- Location: Turin, Sophia Antipolis
- Repositories: 42
- Profile: https://github.com/D2KLab
Data to Knowledge Virtual Lab (LINKS Foundation - EURECOM)
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  llm4ke: Large Language Models for Knowledge Engineering
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - orcid: 'https://orcid.org/0000-0003-3507-5646'
    affiliation: EURECOM
    given-names: Rebboud
    family-names: Youssra
  - orcid: 'https://orcid.org/0000-0001-5887-899X'
    affiliation: Orange
    given-names: Lionel
    family-names: Tailhardat
  - orcid: 'https://orcid.org/0000-0003-3094-5585'
    affiliation: EURECOM
    given-names: Pasquale
    family-names: Lisena
  - orcid: 'https://orcid.org/0000-0003-0457-1436'
    affiliation: EURECOM
    given-names: Raphaël
    family-names: Troncy
repository-code: 'https://github.com/D2KLab/llm4ke'
url: 'https://semantics.eurecom.fr/'
abstract: >-
  llm4ke: Large Language Models for Knowledge Engineering
keywords:
  - ontology
  - LLM
  - competency questions
license: Apache
version: v0.0.1
date-released: '2023-01-05'
preferred-citation:
  type: conference-paper
  authors:
    - orcid: 'https://orcid.org/0000-0003-3507-5646'
      affiliation: EURECOM
      given-names: Rebboud
      family-names: Youssra
    - orcid: 'https://orcid.org/0000-0001-5887-899X'
      affiliation: Orange
      given-names: Lionel
      family-names: Tailhardat
    - orcid: 'https://orcid.org/0000-0003-3094-5585'
      affiliation: EURECOM
      given-names: Pasquale
      family-names: Lisena
    - orcid: 'https://orcid.org/0000-0003-0457-1436'
      affiliation: EURECOM
      given-names: Raphaël
      family-names: Troncy
  journal: "Semantic Web - 21st International Conference (ESWC), LLMs for KE track, Hersonissos, Crete, Greece, May 26 - 30, 2024"
  title: "Can LLMs Generate Competency Questions?"
  year: 2024
GitHub Events
Total
- Watch event: 4
- Push event: 1
Last Year
- Watch event: 4
- Push event: 1
Dependencies
- argparse *
- langchain *
- langchain_community *
- openai *
- pyyaml *
- rdflib *
- sentence_transformers *
- transformers *
- wandb *
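A quick way to check whether the dependencies listed above are importable in the current environment is sketched below. The helper `missing_modules` is hypothetical, and the import names are assumptions derived from the package list (for example, the `pyyaml` package is imported as `yaml`).

```python
from importlib.util import find_spec

# Import names assumed for the listed packages; "pyyaml" is imported as "yaml".
IMPORT_NAMES = [
    "argparse",
    "langchain",
    "langchain_community",
    "openai",
    "yaml",
    "rdflib",
    "sentence_transformers",
    "transformers",
    "wandb",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


print("Missing modules:", missing_modules(IMPORT_NAMES))
```

An empty list suggests that `pip install -r requirements.txt` completed for all listed packages; it does not verify their versions.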