llm4ke

Repository for Large Language Models for Knowledge Engineering

https://github.com/d2klab/llm4ke

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (13.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Repository for Large Language Models for Knowledge Engineering

Basic Info

Host: GitHub
Owner: D2KLab
License: apache-2.0
Language: Python
Default Branch: main
Size: 1.27 MB

Statistics

Stars: 6
Watchers: 4
Forks: 0
Open Issues: 0
Releases: 1

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme, License, Citation

llm4ke

Repository for Large Language Models for Knowledge Engineering (LLM4KE).

Objectives

Original idea:

How much could LLMs contribute to the knowledge engineering process alongside our usual methodology (competency questions, ontology reuse, authoring tests, etc.)?

Set of questions we could investigate:

1. Could an LLM reverse engineer an ontology and find out what good competency questions (CQs) could be derived from it?
2. Could an LLM take the CQs as input and generate parts of the ontology?
3. Could an LLM take the CQs as input and extend an existing ontology?
4. Could an LLM take the CQs as input and generate abstract patterns?
5. Could an LLM write an authoring test (a SPARQL query) given the ontology and the CQs?
6. Given a dataset and an ontology, is an LLM able to generate an adequate set of RML rules for data ingestion?

The content of this code repository accompanies the research project described in the following papers:

```bibtex
@inproceedings{llm4ke-2024,
  title     = {{Can LLMs Generate Competency Questions?}},
  author    = {{Youssra Rebboud} and {Lionel Tailhardat} and {Pasquale Lisena} and {Rapha\"el Troncy}},
  booktitle = {Semantic Web - 21st International Conference (ESWC), LLMs for KE track, Hersonissos, Crete, Greece, May 26 - 30, 2024},
  year      = {2024}
}

@inproceedings{llm4ke-bench-2024,
  title     = {{Benchmarking LLM-based Ontology Conceptualization: A Proposal}},
  author    = {{Youssra Rebboud} and {Pasquale Lisena} and {Lionel Tailhardat} and {Rapha\"el Troncy}},
  booktitle = {ISWC 2024, 23rd International Semantic Web Conference, 11-15 November 2024, Baltimore, USA},
  year      = {2024}
}
```

Usage

The repository is organized as follows:

llm4ke
├───data   <Reference data models with their related components>
│   └─[DataModelName]
│     ├─dm   <data model implementation>
│     ├─rq   <set of queries>
│     └─...
├───src    <Processing pipeline code>
└───...
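To locate the reference material programmatically, a small helper can walk this layout. This is a minimal sketch, not part of `src/`: `list_data_models` is a hypothetical name, and the only assumption is the `data/<DataModelName>/dm` and `data/<DataModelName>/rq` folder structure shown in the tree. The README does not say which file types these folders hold, so every regular file is listed.

```python
from pathlib import Path


def list_data_models(data_dir: str = "data") -> dict[str, dict[str, list[str]]]:
    """Map each data model folder in `data_dir` to the file names in its dm/ and rq/ subfolders.

    Hypothetical helper: assumes the data/<DataModelName>/{dm,rq} layout of the repository tree.
    A missing `data_dir` gives an empty mapping; a missing dm/ or rq/ gives an empty list.
    """
    root = Path(data_dir)
    models: dict[str, dict[str, list[str]]] = {}
    if not root.is_dir():
        return models
    # Only subdirectories of data/ are treated as data models; stray files are ignored.
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        entry: dict[str, list[str]] = {}
        for sub in ("dm", "rq"):
            sub_dir = model_dir / sub
            entry[sub] = (
                sorted(f.name for f in sub_dir.iterdir() if f.is_file())
                if sub_dir.is_dir()
                else []
            )
        models[model_dir.name] = entry
    return models


if __name__ == "__main__":
    # Run from the repository root; prints nothing if there is no data/ folder.
    for name, parts in list_data_models().items():
        print(f"{name}: {len(parts['dm'])} file(s) in dm/, {len(parts['rq'])} file(s) in rq/")
```

Run from the repository root, it would print one line per data model (for example `Odeuropa`) with the number of files found in its `dm/` and `rq/` folders.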

Generating Competency Questions

This section addresses research question 1 above: "Could an LLM reverse engineer an ontology and identify potential competency questions?"

The pipeline is built on LangChain and, in particular, uses its Ollama integration to run LLMs locally.

1. Install Ollama from its website.
2. Install the requirements:
   ```shell
   pip install -r requirements.txt
   ```
3. Download the desired LLM (full list of available LLMs):
   ```shell
   ollama pull llama2
   ```
4. Run the pipeline to generate Competency Questions for a given ontology:
   ```shell
   # Canonical form:
   # python src/main.py --name --input --llm

   # Basic example for the Odeuropa ontology:
   python src/main.py all_classes --name Odeuropa --input ./data/Odeuropa/ --llm llama2
   ```
   Then browse the results in the `out/Odeuropa/` directory. You can get the full list of available parameters with `python src/main.py --help`.
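The generation step can be pictured roughly as: describe the ontology's classes in a prompt, send the prompt to a local model served by Ollama, and keep the lines of the answer that look like questions. This is a minimal sketch under stated assumptions, not the code of `src/main.py`: the prompt wording, the `(IRI, label, comment)` input format and the helper names `build_cq_prompt`, `parse_questions` and `generate_cqs` are all hypothetical. In practice the class descriptions could be extracted from the ontology with rdflib, which `requirements.txt` lists. `Ollama` from `langchain_community.llms` is LangChain's community wrapper around an Ollama server (recent LangChain releases also ship it in the separate `langchain-ollama` package); it is imported inside `generate_cqs` so that the two text helpers work without LangChain or a running Ollama server.

```python
import re

# Hypothetical helpers sketching the generation step; not the repository's implementation.

# Leading list markers such as "1.", "2)", "-" or "*" that LLMs often prepend to each line.
_LIST_MARKER = re.compile(r"^\s*(?:[-*]|\d+[.)])\s*")


def build_cq_prompt(classes: list[tuple[str, str, str]], max_questions: int = 10) -> str:
    """Describe ontology classes, given as (IRI, label, comment) triples, and ask for CQs."""
    lines = []
    for iri, label, comment in classes:
        line = f"- {label} ({iri})"
        if comment:
            line += f": {comment}"
        lines.append(line)
    return (
        "You are a knowledge engineer. The following classes come from an ontology:\n"
        + "\n".join(lines)
        + f"\n\nWrite up to {max_questions} competency questions that this ontology "
        "should be able to answer. Write one question per line."
    )


def parse_questions(text: str) -> list[str]:
    """Keep the distinct lines that end with '?', stripped of their list markers."""
    questions: list[str] = []
    for raw in text.splitlines():
        question = _LIST_MARKER.sub("", raw).strip()
        if question.endswith("?") and question not in questions:
            questions.append(question)
    return questions


def generate_cqs(classes: list[tuple[str, str, str]], model: str = "llama2") -> list[str]:
    """Ask a local Ollama model for CQs through LangChain (requires a running Ollama server)."""
    # Imported lazily so the helpers above can be used without LangChain installed.
    from langchain_community.llms import Ollama

    llm = Ollama(model=model)
    return parse_questions(llm.invoke(build_cq_prompt(classes)))
```

For example, `generate_cqs([("http://example.org/Smell", "Smell", "An olfactory event")])` would return the questions produced by a local `llama2` model, assuming `ollama pull llama2` has been run and the Ollama server is up. Keeping prompt construction and output parsing as plain functions makes them easy to test without calling a model.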

Evaluating the LLM's Competency Questions

Using the output of the Generating Competency Questions step above, run the evaluation pipeline to compute similarity scores for all ontologies or for a given one:

```shell
# Canonical form:
# python src/eval.py

# Basic example for the Odeuropa ontology with a 0.8 similarity threshold and verbose logging:
python3 ./src/eval.py Odeuropa -t 0.8 --log 10
```

Then browse the results in the `./results_.json/` file.
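The README does not spell out how the similarity threshold (`-t 0.8`) turns scores into results. As a hedged sketch only, one plausible scheme counts a reference CQ as recovered when at least one generated CQ reaches the threshold, and vice versa. The names `match_cqs` and `string_similarity` and the precision/recall definitions are assumptions, not the logic of `src/eval.py`. The repository lists `sentence_transformers` among its dependencies, which suggests embedding-based similarity; to stay self-contained, this sketch swaps in `difflib.SequenceMatcher` from the standard library, and the similarity function is a parameter so an embedding cosine similarity can be plugged in instead.

```python
from difflib import SequenceMatcher
from typing import Callable


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for embedding cosine similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_cqs(
    generated: list[str],
    reference: list[str],
    threshold: float = 0.8,
    similarity: Callable[[str, str], float] = string_similarity,
) -> dict[str, float]:
    """Hypothetical scoring: share of CQs on each side with a counterpart scoring >= threshold."""
    if not generated or not reference:
        return {"precision": 0.0, "recall": 0.0}
    # scores[i][j] is the similarity between generated[i] and reference[j].
    scores = [[similarity(g, r) for r in reference] for g in generated]
    matched_generated = sum(1 for row in scores if max(row) >= threshold)
    matched_reference = sum(
        1 for j in range(len(reference)) if max(row[j] for row in scores) >= threshold
    )
    return {
        "precision": matched_generated / len(generated),
        "recall": matched_reference / len(reference),
    }
```

With an embedding model, `similarity` would become a function returning the cosine similarity of the two sentence embeddings; the matching logic stays the same.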

Copyright

License

Apache License 2.0.

Maintainer

Owner

Name: D2K Lab
Login: D2KLab
Kind: organization
Email: d2klab-admin@eurecom.fr
Location: Turin, Sophia Antipolis

Repositories: 42
Profile: https://github.com/D2KLab

Data to Knowledge Virtual Lab (LINKS Foundation - EURECOM)

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  llm4ke: Large Language Models for Knowledge Engineering
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - orcid: 'https://orcid.org/0000-0003-3507-5646'
    affiliation: EURECOM
    given-names: Youssra
    family-names: Rebboud
  - orcid: 'https://orcid.org/0000-0001-5887-899X'
    affiliation: Orange
    given-names: Lionel
    family-names: Tailhardat
  - orcid: 'https://orcid.org/0000-0003-3094-5585'
    affiliation: EURECOM
    given-names: Pasquale
    family-names: Lisena
  - orcid: 'https://orcid.org/0000-0003-0457-1436'
    affiliation: EURECOM
    given-names: Raphaël
    family-names: Troncy
repository-code: 'https://github.com/D2KLab/llm4ke'
url: 'https://semantics.eurecom.fr/'
abstract: >-
  llm4ke: Large Language Models for Knowledge Engineering
keywords:
  - ontology
  - LLM
  - competency questions
license: Apache-2.0
version: v0.0.1
date-released: '2023-01-05'
preferred-citation:
  type: conference-paper
  authors:
  - orcid: 'https://orcid.org/0000-0003-3507-5646'
    affiliation: EURECOM
    given-names: Youssra
    family-names: Rebboud
  - orcid: 'https://orcid.org/0000-0001-5887-899X'
    affiliation: Orange
    given-names: Lionel
    family-names: Tailhardat
  - orcid: 'https://orcid.org/0000-0003-3094-5585'
    affiliation: EURECOM
    given-names: Pasquale
    family-names: Lisena
  - orcid: 'https://orcid.org/0000-0003-0457-1436'
    affiliation: EURECOM
    given-names: Raphaël
    family-names: Troncy
  collection-title: "Semantic Web - 21st International Conference (ESWC), LLMs for KE track, Hersonissos, Crete, Greece, May 26 - 30, 2024"
  title: "Can LLMs Generate Competency Questions?"
  year: 2024

GitHub Events

Total

Watch event: 4
Push event: 1

Last Year

Watch event: 4
Push event: 1

Dependencies

requirements.txt pypi

argparse *
langchain *
langchain_community *
openai *
pyyaml *
rdflib *
sentence_transformers *
transformers *
wandb *
