llm4ke

Repository for Large Language Models for Knowledge Engineering

https://github.com/d2klab/llm4ke

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: D2KLab
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 1.27 MB
Statistics
  • Stars: 6
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

llm4ke

Repository for Large Language Models for Knowledge Engineering (LLM4KE).

Objectives

Original idea:

How much could LLMs contribute to the knowledge engineering process alongside our usual methodology (competency questions, ontology reuse, authoring tests, etc.)?

Set of questions we could investigate:

  1. Could an LLM reverse engineer an ontology and find out what good competency questions (CQs) could be derived?
  2. Could an LLM take the CQs as input and generate parts of the ontology?
  3. Could an LLM take the CQs as input and extend an existing ontology?
  4. Could an LLM take the CQs as input and generate abstract patterns?
  5. Could an LLM write an authoring test (a SPARQL query) given the ontology and the CQs?
  6. Given a dataset and an ontology, is an LLM able to generate an adequate set of RML rules for data ingestion?

The content of this code repository accompanies the research project explained in the following papers:

```bibtex
@inproceedings{llm4ke-2024,
  title = {{Can LLMs Generate Competency Questions?}},
  author = {{Youssra Rebboud} and {Lionel Tailhardat} and {Pasquale Lisena} and {Rapha\"el Troncy}},
  booktitle = {Semantic Web - 21st International Conference (ESWC), LLMs for KE track, Hersonissos, Crete, Greece, May 26 - 30, 2024},
  year = {2024}
}

@inproceedings{llm4ke-bench-2024,
  title = {{Benchmarking LLM-based Ontology Conceptualization: A Proposal}},
  author = {{Youssra Rebboud} and {Pasquale Lisena} and {Lionel Tailhardat} and {Rapha\"el Troncy}},
  booktitle = {ISWC 2024, 23rd International Semantic Web Conference, 11-15 November 2024, Baltimore, USA},
  year = {2024}
}
```

Usage

The repository is organized as follows:

    llm4ke
    ├───data                 <Reference data models with their related components>
    │   └─[DataModelName]
    │     ├─dm               <data model implementation>
    │     ├─rq               <set of queries>
    │     └─...
    ├───src                  <Processing pipeline code>
    └───...
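As a rough illustration of this layout, the sketch below lists each data model folder under `data/` and reports whether it contains the `dm` and `rq` subfolders. The helper name `list_data_models` is hypothetical and is not part of the repository; only the folder names are taken from the structure above.

```python
from pathlib import Path


def list_data_models(data_dir):
    """Map each data model folder to the presence of its dm/ and rq/ subfolders.

    Hypothetical helper: assumes the layout data/[DataModelName]/{dm,rq}.
    """
    models = {}
    for entry in sorted(Path(data_dir).iterdir()):
        if not entry.is_dir():
            continue  # skip stray files at the top level of data/
        models[entry.name] = {
            "dm": (entry / "dm").is_dir(),  # data model implementation
            "rq": (entry / "rq").is_dir(),  # set of queries
        }
    return models


if __name__ == "__main__":
    # Only runs when a data/ folder exists in the current directory.
    if Path("data").is_dir():
        for name, parts in list_data_models("data").items():
            print(name, parts)
```

Run from the repository root, this prints one line per data model, which can help spot a model that is missing its queries or its implementation.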

Generating Competency Questions

This section addresses research question 1 above: "Could an LLM reverse engineer an ontology and find out what good competency questions could be derived?"

The pipeline uses LangChain, in particular its Ollama integration.

  • Install Ollama from its website.
  • Install the requirements:

```shell
pip install -r requirements.txt
```

  • Download the desired LLM (full list of available LLMs):

```shell
ollama pull llama2
```

  • Run the pipeline to generate Competency Questions for a given ontology:

```shell
# Canonical form:
# python src/main.py --name --input --llm

# Basic example for the Odeuropa ontology:
python src/main.py all_classes --name Odeuropa --input ./data/Odeuropa/ --llm llama2
```

Then browse the results in the `out/Odeuropa/` directory. You can get the full list of available parameters with `python src/main.py --help`.
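To give an idea of what such a pipeline step could look like, here is a minimal sketch that builds a prompt from a list of class labels and sends it to a local Ollama model through LangChain. It is not the code in `src/main.py`: the function names, the prompt wording, and the placeholder class labels are all assumptions made for illustration. The only library call used is the `Ollama` wrapper from `langchain_community.llms`.

```python
def build_cq_prompt(ontology_name, class_labels):
    """Build a prompt asking an LLM for competency questions about some classes.

    Hypothetical helper: the wording is illustrative, not the repository's prompt.
    """
    listing = "\n".join(f"- {label}" for label in class_labels)
    return (
        f"You are a knowledge engineer reviewing the {ontology_name} ontology.\n"
        f"It defines the following classes:\n{listing}\n"
        "List competency questions that this ontology should be able to answer, "
        "one per line."
    )


def generate_cqs(prompt, model="llama2"):
    """Send the prompt to a local Ollama model and return one question per line.

    LangChain is imported lazily so the prompt builder works without it.
    Requires a running Ollama server with the model already pulled.
    """
    from langchain_community.llms import Ollama

    llm = Ollama(model=model)
    answer = llm.invoke(prompt)
    # Keep non-empty lines only; each one is treated as a candidate question.
    return [line.strip() for line in answer.splitlines() if line.strip()]


if __name__ == "__main__":
    # "ClassA" and "ClassB" are placeholders, not real Odeuropa classes.
    print(build_cq_prompt("Odeuropa", ["ClassA", "ClassB"]))
```

Keeping prompt construction separate from the model call makes it possible to inspect or version the prompt without having Ollama running.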

Evaluating the LLM's Competency Questions

With the output data from the above Generating Competency Questions step,

  • Run the evaluation pipeline to compute similarity scores for all ontologies or a given ontology:

```shell
# Canonical form:
# python src/eval.py

# Basic example for the Odeuropa ontology with a 0.8 similarity threshold and verbose logging:
python3 ./src/eval.py Odeuropa -t 0.8 --log 10
```

Then browse the results in the `./results_.json/` file.
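The role of the similarity threshold can be sketched as follows, assuming that each generated question is compared with a set of reference questions through the cosine similarity of their embedding vectors, and counts as matched when its best score reaches the threshold. This is an assumption about the general approach, not a description of `src/eval.py`. The function names are hypothetical, and the vectors are toy values standing in for real sentence embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if one is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_questions(generated, reference, threshold=0.8):
    """For each generated embedding, find its closest reference embedding.

    Returns a list of (best_reference_index, best_score, matched) tuples,
    where matched is True when best_score >= threshold.
    """
    if not reference:
        raise ValueError("at least one reference embedding is required")
    results = []
    for g in generated:
        scores = [cosine(g, r) for r in reference]
        best = max(range(len(scores)), key=scores.__getitem__)
        results.append((best, scores[best], scores[best] >= threshold))
    return results


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for sentence embeddings.
    generated = [[1.0, 0.0], [1.0, 1.0]]
    reference = [[1.0, 0.0], [0.0, 1.0]]
    for index, score, matched in match_questions(generated, reference):
        print(index, round(score, 3), matched)
```

With a higher threshold, fewer generated questions count as matches, so the value passed with `-t` trades strictness against coverage.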

Copyright

Copyright (c) 2023-2024, EURECOM. All rights reserved.

License

Apache License 2.0.

Maintainer

Owner

  • Name: D2K Lab
  • Login: D2KLab
  • Kind: organization
  • Email: d2klab-admin@eurecom.fr
  • Location: Turin, Sophia Antipolis

Data to Knowledge Virtual Lab (LINKS Foundation - EURECOM)

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  llm4ke: Large Language Models for Knowledge Engineering
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - orcid: 'https://orcid.org/0000-0003-3507-5646'
    affiliation: EURECOM
    given-names: Youssra
    family-names: Rebboud
  - orcid: 'https://orcid.org/0000-0001-5887-899X'
    affiliation: Orange
    given-names: Lionel
    family-names: Tailhardat
  - orcid: 'https://orcid.org/0000-0003-3094-5585'
    affiliation: EURECOM
    given-names: Pasquale
    family-names: Lisena
  - orcid: 'https://orcid.org/0000-0003-0457-1436'
    affiliation: EURECOM
    given-names: Raphaël
    family-names: Troncy
repository-code: 'https://github.com/D2KLab/llm4ke'
url: 'https://semantics.eurecom.fr/'
abstract: >-
  llm4ke: Large Language Models for Knowledge Engineering
keywords:
  - ontology
  - LLM
  - competency questions
license: Apache
version: v0.0.1
date-released: '2023-01-05'
preferred-citation:
  type: conference-paper
  authors:
  - orcid: 'https://orcid.org/0000-0003-3507-5646'
    affiliation: EURECOM
    given-names: Youssra
    family-names: Rebboud
  - orcid: 'https://orcid.org/0000-0001-5887-899X'
    affiliation: Orange
    given-names: Lionel
    family-names: Tailhardat
  - orcid: 'https://orcid.org/0000-0003-3094-5585'
    affiliation: EURECOM
    given-names: Pasquale
    family-names: Lisena
  - orcid: 'https://orcid.org/0000-0003-0457-1436'
    affiliation: EURECOM
    given-names: Raphaël
    family-names: Troncy
  journal: "Semantic Web - 21st International Conference (ESWC), LLMs for KE track, Hersonissos, Crete, Greece, May 26 - 30, 2024"
  title: "Can LLMs Generate Competency Questions?"
  year: 2024

GitHub Events

Total
  • Watch event: 4
  • Push event: 1
Last Year
  • Watch event: 4
  • Push event: 1

Dependencies

requirements.txt pypi
  • argparse *
  • langchain *
  • langchain_community *
  • openai *
  • pyyaml *
  • rdflib *
  • sentence_transformers *
  • transformers *
  • wandb *