ontogenix
A semi-automated system based on LLMs to generate ontologies from datasets
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.9%) to scientific vocabulary
Repository
A semi-automated system based on LLMs to generate ontologies from datasets
Basic Info
- Host: GitHub
- Owner: tecnomod-um
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 88.8 MB
Statistics
- Stars: 10
- Watchers: 3
- Forks: 3
- Open Issues: 7
- Releases: 2
Metadata Files
README.md
OntoGenix
OntoGenix is a semi-automatic system that uses the OpenAI GPT-4 model to generate OWL ontologies and RML mappings from CSV datasets.
Important: Access to GPT-4 is required, so you need to create an OpenAI account and request access to this model.
GUI

Installation
```bash
git clone repository_URL  # URL not included due to anonymization
cd Ontogenix
pip install -r requirements.txt
# Create a .env file inside the GUI directory to hold the OpenAI API key
touch ./GUI/.env
# Write your OpenAI API key in the .env file
echo 'OPENAIAPIKEY="your-api-key"' >> ./GUI/.env
```
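As a rough illustration of how the key stored in `./GUI/.env` might be read at startup, here is a stdlib-only sketch. The project lists python-dotenv among its dependencies, which is the more usual tool for this job; the helper name `read_env_file` is hypothetical and not part of OntoGenix, and the variable name is taken exactly as written above.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY="value" lines from a .env file (hypothetical helper)."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip("\"'")
    return values
```

With this helper, `read_env_file("./GUI/.env").get("OPENAIAPIKEY")` would return the stored key, or `None` if the line is missing.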
Execution
```bash
python -m GUI
```
GUI Instructions
Check out the video tutorial: OntoGenix.mp4
These are the main steps involved in generating an ontology design within OntoGenix.
1. LOAD CSV
- Purpose: This step focuses on importing the CSV dataset and subsequently producing a comprehensive statistical analysis of the dataset.
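To give a feel for the kind of per-column statistics such a step could produce, here is a minimal sketch. It uses only the standard library so it runs anywhere; the project lists pandas as a dependency, which would be the more usual choice. The function name `summarize_csv` and the sample data are assumptions for illustration, not OntoGenix code.

```python
import csv
import io
import statistics


def summarize_csv(text):
    """Return per-column summary statistics for CSV text (illustrative only)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    summary = {}
    for column in (rows[0].keys() if rows else []):
        # Ignore empty or missing cells when counting values.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        info = {"count": len(values), "unique": len(set(values))}
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            numbers = None  # at least one value is not numeric
        if numbers:
            info.update(min=min(numbers), max=max(numbers), mean=statistics.mean(numbers))
        summary[column] = info
    return summary


# Hypothetical sample dataset with one missing price.
sample = "id,name,price\n1,apple,0.5\n2,pear,1.5\n3,apple,\n"
for column, info in summarize_csv(sample).items():
    print(column, info)
```

Numeric columns get `min`, `max`, and `mean`; other columns only report how many values and distinct values they hold.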
2. PROMPT CRAFTING
- Purpose: Craft the prompt that guides the model toward the intended high-level structural description of the ontology.
3. ONTOLOGY GENERATION
- Purpose: The main objective here is to construct the ontology. This will be done through a sequence of structured sub-steps:
3.1. DESCRIPTION
Purpose: Two primary tasks will be carried out:
- Provide a concise summary of the dataset using natural language.
- Offer a high-level summary of the suggested ontology design architecture.
Options:
- Allow the model to suggest a structure.
- Manually provide a prompt to specify the ontology's overarching structure.
3.2. ONTOLOGY
Purpose: This step is pivotal for creating the ontology's OWL definition, with a spotlight on classes and their associated object and data properties.
Note: This process is automatic and doesn't require a prompt.
3.3. ONTOLOGY_ENTITY
Purpose: Tailored to formulating the ontology's OWL definition for specific entities.
Important: A carefully crafted prompt that vividly describes the task is essential for this step.
Examples for Crafting Prompts:
Example 1: Improving a Class
prompt:
Ensure you meticulously follow the guidelines laid out in the data description.
Explicitly reference all object and data properties linked to the entity in question.
Object property restrictions? Always use the "onClass" parameter.
Data type property restrictions? The "onDataRange" parameter is your go-to.
All the above-mentioned tasks should be executed for the entity: {EntityName}
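The "onClass" and "onDataRange" parameters in this prompt presumably refer to `owl:onClass` and `owl:onDataRange`, which OWL 2 uses in qualified cardinality restrictions. The sketch below builds such restrictions as Turtle text to show the shape the prompt asks for. The function name, the `:Order` entity, and its properties are hypothetical and are not OntoGenix output.

```python
def qualified_restriction(prop, filler, is_object_property, min_count=1):
    """Build a Turtle blank node for an OWL 2 qualified cardinality restriction.

    Uses owl:onClass for object properties and owl:onDataRange for data
    properties. All names passed in are illustrative placeholders.
    """
    qualifier = "owl:onClass" if is_object_property else "owl:onDataRange"
    return (
        "[ a owl:Restriction ;\n"
        f"    owl:onProperty {prop} ;\n"
        f'    owl:minQualifiedCardinality "{min_count}"^^xsd:nonNegativeInteger ;\n'
        f"    {qualifier} {filler} ]"
    )


# Hypothetical entity with one object property and one data property.
print(":Order rdfs:subClassOf")
print(qualified_restriction(":hasCustomer", ":Customer", is_object_property=True), ",")
print(qualified_restriction(":orderDate", "xsd:date", is_object_property=False), ".")
```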
Example 2: Enhancing Object/Data Type Properties
prompt:
Enrich the entity with metadata and annotations to provide context, trace origin, and deliver deeper insights.
Retain the foundational name of the entity.
Include a descriptive field. Consider suggesting an alternative name and provide up to five alternative labels for enhanced clarity and versatility.
If there are any known equivalent properties in the ontology, be sure to define them. Steer clear of inventing fictional ones.
All these tasks are to be done for the entity: {EntityName}
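Both example prompts end with an `{EntityName}` placeholder. One simple way to reuse a prompt across several entities is plain string formatting, sketched below. The helper `build_prompts` and the entity names are assumptions for illustration; how the GUI itself substitutes the entity name is not described here.

```python
# Template text copied from Example 1; {EntityName} is filled in per entity.
CLASS_PROMPT = (
    "Ensure you meticulously follow the guidelines laid out in the data description.\n"
    "Explicitly reference all object and data properties linked to the entity in question.\n"
    'Object property restrictions? Always use the "onClass" parameter.\n'
    'Data type property restrictions? The "onDataRange" parameter is your go-to.\n'
    "All the above-mentioned tasks should be executed for the entity: {EntityName}"
)


def build_prompts(template, entity_names):
    """Return one filled-in prompt per entity name (hypothetical helper)."""
    return {name: template.format(EntityName=name) for name in entity_names}


# Hypothetical entity names taken from an imagined ontology.
for name, prompt in build_prompts(CLASS_PROMPT, ["Customer", "Order"]).items():
    print(f"--- {name} ---")
    print(prompt)
```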
4. MAPPING
Purpose: In this phase, a mapping is created by making use of the ontology in conjunction with the original CSV dataset.
Note: This process is automatic and doesn't require a prompt.
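To make the mapping step more concrete, the sketch below assembles a minimal RML triples map in Turtle for a single CSV file. The function name, the `http://example.org/` namespace, and the column names are hypothetical; the mappings OntoGenix actually generates may look quite different. The project lists morph-kgc as a dependency, which could presumably process a mapping of this kind, but the sketch does not call it.

```python
def rml_triples_map(csv_path, class_iri, id_column, columns, base="http://example.org/"):
    """Return a minimal RML triples map (Turtle) for one CSV file."""
    lines = [
        "@prefix rr: <http://www.w3.org/ns/r2rml#> .",
        "@prefix rml: <http://semweb.mmlab.be/ns/rml#> .",
        "@prefix ql: <http://semweb.mmlab.be/ns/ql#> .",
        "",
        f"<{base}mapping#TriplesMap> a rr:TriplesMap ;",
        f'    rml:logicalSource [ rml:source "{csv_path}" ; rml:referenceFormulation ql:CSV ] ;',
        # The subject IRI is built from the identifier column of each row.
        f'    rr:subjectMap [ rr:template "{base}{{{id_column}}}" ; rr:class <{class_iri}> ]',
    ]
    for column in columns:
        lines[-1] += " ;"
        # Each listed column becomes one predicate whose object is the cell value.
        lines.append(
            f"    rr:predicateObjectMap [ rr:predicate <{base}{column}> ; "
            f'rr:objectMap [ rml:reference "{column}" ] ]'
        )
    lines[-1] += " ."
    return "\n".join(lines)


# Hypothetical CSV with columns id, name, and price.
print(rml_triples_map("data.csv", "http://example.org/Product", "id", ["name", "price"]))
```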
GUI Docs
For more information on the project's design, see the Design Documentation.
Experimental evaluation
We ran an experiment comparing the ontologies generated by OntoGenix with ontologies developed by humans. The results are available in the OntoGenix Evaluation repository.
Owner
- Name: tecnomod-um
- Login: tecnomod-um
- Kind: organization
- Repositories: 1
- Profile: https://github.com/tecnomod-um
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: OntoGenix
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Mikel
    family-names: Val Calvo
    email: mikel1982mail@gmail.com
    affiliation: University of Murcia
  - given-names: Mikel
    family-names: Egaña Aranguren
    email: mikel.egana@ehu.eus
    affiliation: University of Basque Country (UPV/EHU)
    orcid: 'https://orcid.org/0000-0001-8081-1839'
  - given-names: Jesualdo Tomas
    family-names: Fernandez Breis
    email: jfernand@um.es
    affiliation: University of Murcia
identifiers:
  - type: doi
    value: 10.5281/zenodo.13468051
    description: Zenodo DOI
repository-code: 'https://github.com/tecnomod-um/OntoGenix'
abstract: >-
  The project utilizes the OpenAI GPT-4 model to develop a
  semi-automatic system that generates OWL ontologies and
  RML mappings from CSV datasets using LLMs.
keywords:
  - OWL
  - LLM
  - RDF
  - RML
license: GPL-3.0
commit: 9bf3c435656e4bf3a50e89604df7ab2ca2b8abca
version: '1.2'
date-released: '2024-08-29'
GitHub Events
Total
- Issues event: 2
- Watch event: 11
- Issue comment event: 2
- Push event: 2
- Fork event: 1
Last Year
- Issues event: 2
- Watch event: 11
- Issue comment event: 2
- Push event: 2
- Fork event: 1
Dependencies
- PyQt5 *
- lxml *
- markdown *
- morph-kgc *
- numpy *
- openai *
- pandas *
- python-dotenv *
- pyyaml *
- rdflib *
- sentence_transformers *
- sklearn *