https://github.com/cefriel/procedural-kg-llm

Prompt-based pipeline for extracting procedural knowledge graphs from text with LLMs

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.4%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: cefriel
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Size: 215 KB

Statistics

Stars: 5
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme License

Procedural Knowledge Graph extraction from Text with Large Language Models

We propose a prompt-based pipeline for extracting procedural knowledge graphs from text with LLMs.

This pipeline extracts steps, actions, objects, equipment and temporal information from a textual procedure, in order to populate a Procedural KG according to a pre-defined ontology.
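As a rough illustration of the kind of output described above, the sketch below models one extracted step and flattens it into subject-predicate-object triples. The class, the field names, and the predicate names (`hasStep`, `hasAction`, and so on) are hypothetical and are not taken from the repository's ontology; the example step is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Step:
    """One procedure step. Field names are illustrative, not the ontology's terms."""
    index: int
    text: str
    actions: List[str] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)
    equipment: List[str] = field(default_factory=list)
    duration: Optional[str] = None  # temporal information, if the text states any


def to_triples(procedure_id: str, steps: List[Step]) -> List[Tuple[str, str, str]]:
    """Flatten steps into (subject, predicate, object) triples with hypothetical predicates."""
    triples = []
    for step in steps:
        node = f"{procedure_id}/step/{step.index}"
        triples.append((procedure_id, "hasStep", node))
        triples.append((node, "description", step.text))
        triples.extend((node, "hasAction", a) for a in step.actions)
        triples.extend((node, "hasObject", o) for o in step.objects)
        triples.extend((node, "usesEquipment", e) for e in step.equipment)
        if step.duration is not None:
            triples.append((node, "hasDuration", step.duration))
    return triples


# Made-up example step, not taken from the dataset.
example = Step(
    index=1,
    text="Wipe the screen with a microfiber cloth",
    actions=["wipe"],
    objects=["screen"],
    equipment=["microfiber cloth"],
)
triples = to_triples("proc:clean-monitor", [example])
```

In practice the triples would be expressed with the ontology's own classes and properties, which can be found in the data-results/ontology folder.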

Experimental setting

For our experiments, we:
- used the GPT-4o model
- set the temperature parameter to 0
- relied on the LangChain framework

The procedures used in the prompt engineering refinement process and in the evaluation were selected from WikiHow.

We reuse this JSON dataset available on GitHub

How to navigate this repository

pkg-extraction / notebooks

This folder contains:
- pkg-extraction.ipynb, the notebook with the two-prompt pipeline
- a subfolder, preliminary-experiments, containing the notebooks from our preliminary experiments

The repository defines a docker-compose.yml file to run the Jupyter notebooks as containers via Docker. The containers can be run all at once or separately.

To execute the notebooks, start the container from the folder containing the docker-compose.yml file with the command:

docker-compose up --force-recreate

A credentials.json file should be provided in the main folder with a valid key for the OpenAI API.

{ "OPENAI_API_KEY": "PUT_HERE_YOUR_KEY" }
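One way such a file could be consumed is sketched below: the key is read from credentials.json and exported as the `OPENAI_API_KEY` environment variable, which the OpenAI client libraries read by default. The function name `load_openai_key` and the placeholder check are assumptions made for illustration; the notebooks may load the file differently.

```python
import json
import os
from pathlib import Path


def load_openai_key(path="credentials.json"):
    """Read the API key from a credentials file and export it as an environment variable.

    Hypothetical helper: the name and behavior are illustrative, not the repository's code.
    """
    creds = json.loads(Path(path).read_text(encoding="utf-8"))
    key = creds["OPENAI_API_KEY"]
    # Fail early if the file still contains the placeholder from the template.
    if not key or key == "PUT_HERE_YOUR_KEY":
        raise ValueError(f"Replace the placeholder in {path} with a valid key")
    os.environ["OPENAI_API_KEY"] = key
    return key
```

Calling `load_openai_key()` once at the top of a notebook would make the key available to any client created afterwards, without hard-coding it in a cell.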

data-results

- ontology: this folder contains the procedural ontology used as reference in the experiments
- clean-flat-panel-monitor, fix-rubbing-door, cook-honey-glazed-parsnips, plant-bare-root-tree: these folders contain input and output data for the 4 procedures
- preliminary-experiments: this folder contains the results of earlier experiments from the prompt engineering refinement process

human-assessment

This folder contains:
- materials and results from the human assessment of the LLM results
- a subfolder, preliminary-experiments, containing the materials and results from the human assessment of our preliminary experiments

Contributing

Before contributing, please carefully read, complete, and sign our Contributor Licence Agreement.

When contributing to this repository, please first discuss the change you wish to make via issue or any other available method with the repository's owners.

Owner

Name: Cefriel
Login: cefriel
Kind: organization
Email: info@cefriel.com
Location: Milano, Italy

Website: https://www.cefriel.com
Repositories: 15
Profile: https://github.com/cefriel

GitHub Events

Total

Watch event: 7
Member event: 1
Fork event: 1

Last Year

Watch event: 7
Member event: 1
Fork event: 1

Dependencies

pkg-extraction/docker-compose.yml docker

jupyter/datascience-notebook latest
