sphm4kg
Repository for paper "Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs"
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.1%) to scientific vocabulary
Repository
Repository for paper "Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs"
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 4
Metadata Files
README.md
Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs
Folder Structure
📁 data -> Ontologies (.owl) and their targets (.nt)
📁 onto
📁 onto_target
📁 raw -> Ontologies raw files, metadata, intermediate caches, preprocessing pipelines
📁 out -> Pickled output of model metrics on each target
📁 docs -> Documentation and additional experiment results
📁 experimental_results -> Additional experimental results on larger datasets
📁 statistical_evaluation -> Friedman and Nemenyi post-hoc tests on all datasets, problems, and models
📁 src -> Source code
📁 HBM -> Code regarding models, rule extraction and wrappers
Source Code Structure
```
📁 src/HBM/models
  📄 HB.py -> Hierarchical Bayesian model with Variational Bayes, EM, and Gradient Descent under a unified abstraction
  📄 MBNB.py -> Multivariate Bernoulli Naive Bayes model
  📄 rule_helpers.py -> Model wrappers for axiom extraction
📁 src/HBM
  📄 ontology.py -> Ontology loading utility: feature matrix construction, mapping from symbolic to vectorized formats
📁 src/
  📄 target_generator.py -> Artificial disjunctive problem generator
  📄 train_evaluate.py -> Train and evaluation script (all models and all targets of a selected dataset)
  📄 axiom_compare.py -> Utility to compare target and extracted axioms
  📄 statistical_test.py -> Execute the Friedman-Nemenyi test on cached results files
  📄 utils.py -> Logging and printing utilities
```
Project Details
Requirements
The project was developed with Python 3.12.8; a pip freeze of the packages used is available in the requirements.txt file.
How to run experiments?
The train_evaluate.py script can be run with arguments specifying the dataset to be used. To ensure correct execution, run the file from the src directory:
⚠ The dataset loading utility uses Owlready2 to automatically load the .owl files and run the Pellet reasoner. Owlready2 requires the path to the system Java executable (JAVA_EXE) to be specified. This is handled automatically for macOS and Linux users (the systems used during the experiments), assuming the Java executable is in its standard location; Windows users should check whether `src/HBM/ontology.py` (`_set_java()`) needs modification.
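A minimal sketch of the kind of lookup such a helper performs is shown below; `find_java_exe` is illustrative, not the repository's function, and the Windows path in the final comment is a placeholder.

```python
import os
import platform
import shutil

def find_java_exe():
    """Locate a Java executable (illustrative helper, not the repo's own)."""
    # Prefer an explicit JAVA_HOME if it is set
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        name = "java.exe" if platform.system() == "Windows" else "java"
        candidate = os.path.join(java_home, "bin", name)
        if os.path.isfile(candidate):
            return candidate
    # Otherwise fall back to whatever java is on PATH
    return shutil.which("java")

# Owlready2 launches Pellet with the executable stored in owlready2.JAVA_EXE,
# so on Windows you would assign it before loading any ontology, e.g.:
#   import owlready2
#   owlready2.JAVA_EXE = find_java_exe() or r"C:\path\to\java.exe"
```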
```bash
cd /path_to_project/sphm4kg/src
python3 train_evaluate.py --onto ("lubm", "financial", "ntnames", "krkrzeroone") # Choose one
```
Target Concept Ontologies
The onto_name.nt files in data/onto_target contain the artificially created disjunctive targets generated for each ontology in order to assess the probabilistic models' prediction capabilities. These files contain two types of targets:
- SimpleClassX: target defined as a disjunction of simple concepts
- HardClassX: target defined as a disjunction of conjunctions of simple concepts
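The two target types can be illustrated on a toy boolean membership matrix (this is not the project's generator, just a sketch of the semantics, with one column per simple concept):

```python
import numpy as np

# Toy membership matrix: 4 individuals x 4 simple concepts C0..C3
X = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
], dtype=bool)

# SimpleClass-style target: disjunction of simple concepts, e.g. C0 or C2
simple_target = X[:, 0] | X[:, 2]

# HardClass-style target: disjunction of conjunctions,
# e.g. (C0 and C1) or (C2 and C3)
hard_target = (X[:, 0] & X[:, 1]) | (X[:, 2] & X[:, 3])

print(simple_target.astype(int))  # [1 0 0 1]
print(hard_target.astype(int))    # [0 0 0 1]
```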
To load them without using the provided utility, this code can be used (lubm in this example):
```python
from owlready2 import *

target_onto = get_ontology("./data/onto_target/lubm-t.nt").load()

simple_targets = set(target_onto.search(iri='#Simple_Class'))
hard_targets = set(target_onto.search(iri='#Hard_Class'))
```
Feature Names
When working with the provided utility, classes can appear as:
- `namespace.ClassName`: refers to a class defined in the ontology
- `namespace.Some_relationName_range`: refers to an existential restriction on the relation range, formally defined as:

$$ \texttt{Some\_relationName\_range} \equiv \exists\, \texttt{relationName}.\texttt{Range}(\texttt{relationName}) $$
Hyperparameters Settings
| Model | Parameters and Ranges |
| - | - |
| $\texttt{MBNB}$ | `{}` |
| $\texttt{HB}_{\texttt{VB}}$ | `{n_components=5, n_init=10, n_iter=200}` |
| $\texttt{HB}_{\texttt{EM}}$ | `{n_components=5, max_iter=200, tol=1e-3}` |
| $\texttt{HB}_{\texttt{GD}}$ | `{n_components=5, max_iter=200, tol=1e-3, learning_rate=1e-4}` |
| $\texttt{Tree}$ | `{max_depth=3, min_samples_leaf=10, criterion="log_loss"}` |
| $\texttt{LogReg}$ | `{C=0.01, penalty='l1', solver='saga', max_iter=200}` |
| $\texttt{HLogReg}$ | `{n_components=5, n_init=10, n_iter=200, C=0.01, penalty='l1', solver='saga', max_iter=200}` |
| $\texttt{AxiomWrapper}$ | `{theta_u: (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), theta_p: (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)}` |
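The AxiomWrapper threshold sweep over the nine listed values for each parameter amounts to a simple Cartesian product (parameter names follow the table; the grid construction below is illustrative, not the repository's code):

```python
from itertools import product

# The nine threshold values from the table, shared by both parameters
theta_values = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 ... 0.9

# Every (theta_u, theta_p) combination evaluated in the sweep
grid = list(product(theta_values, theta_values))
print(len(grid))  # 81
```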
Larger Ontologies Preprocessing
Larger ontologies required ad-hoc preprocessing in order to generate a single OWL file adhering to Owlready2's internal structure and limitations (for example, DBpedia contains individuals that are also classes, which breaks Owlready2's internal class inheritance scheme). For this reason, the data/raw folder provides both the raw ontology files and the preprocessing pipelines:
- DBpedia50K: refined version of the triples-only version of this benchmark dataset; the preprocessing file creates a single OWL file with entities, classes, object properties, domain and range axioms, class membership axioms, and subclass axioms.
- YAGO4-20: please download `yago-wd-facts.nt` and put it in `data/raw/yago4-20` to properly execute all preprocessing steps. Execute `individuals_uri.py` first and then the preprocessing pipelines to generate the YAGO OWL file.

All preprocessed datasets and artificial problems are already available in the project repository.
Owner
- Name: Ivan Diliso
- Login: ivandiliso
- Kind: user
- Location: Bari, Italy
- Company: University of Bari Aldo Moro
- Website: ivandiliso.github.io
- Repositories: 2
- Profile: https://github.com/ivandiliso
PhD Student @ University of Bari Aldo Moro
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Diliso"
    given-names: "Ivan"
    orcid: "https://orcid.org/0009-0007-2942-202X"
  - family-names: "Fanizzi"
    given-names: "Nicola"
    orcid: "https://orcid.org/0000-0001-5319-7933"
  - family-names: "d'Amato"
    given-names: "Claudia"
    orcid: "https://orcid.org/0000-0002-3385-987X"
title: "Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs"
version: 0.0.0.1
doi: https://doi.org/10.5281/zenodo.15708405
date-released:
url: "https://github.com/ivandiliso/sphm4kg"
```
GitHub Events
Total
- Release event: 2
- Push event: 12
- Public event: 1
- Create event: 2
Last Year
- Release event: 2
- Push event: 12
- Public event: 1
- Create event: 2
Dependencies
- Deprecated ==1.2.18
- Flask ==3.1.0
- GitPython ==3.1.44
- Jinja2 ==3.1.5
- Mako ==1.3.8
- Markdown ==3.7
- MarkupSafe ==3.0.2
- PyYAML ==6.0.2
- Pygments ==2.19.1
- SQLAlchemy ==2.0.37
- Werkzeug ==3.1.3
- aiohappyeyeballs ==2.4.4
- aiohttp ==3.11.11
- aiosignal ==1.3.2
- alembic ==1.14.1
- annotated-types ==0.7.0
- appnope ==0.1.4
- asttokens ==3.0.0
- attrs ==25.1.0
- blinker ==1.9.0
- cachetools ==5.5.2
- certifi ==2025.1.31
- charset-normalizer ==3.4.1
- class_resolver ==0.5.4
- click ==8.1.8
- click-default-group ==1.2.4
- cloudpickle ==3.1.1
- colorlog ==6.9.0
- comm ==0.2.2
- contourpy ==1.3.1
- cycler ==0.12.1
- databricks-sdk ==0.44.1
- dataclasses-json ==0.6.7
- debugpy ==1.8.12
- decorator ==5.1.1
- docdata ==0.0.4
- docker ==7.1.0
- executing ==2.2.0
- filelock ==3.17.0
- fonttools ==4.55.8
- frozenlist ==1.5.0
- fsspec ==2025.2.0
- gitdb ==4.0.12
- google-auth ==2.38.0
- graphene ==3.4.3
- graphql-core ==3.2.6
- graphql-relay ==3.2.0
- gunicorn ==23.0.0
- idna ==3.10
- imbalanced-learn ==0.13.0
- imblearn ==0.0
- importlib_metadata ==8.5.0
- ipykernel ==6.29.5
- ipython ==8.32.0
- ipywidgets ==8.1.5
- itsdangerous ==2.2.0
- jedi ==0.19.2
- joblib ==1.4.2
- jupyter_client ==8.6.3
- jupyter_core ==5.7.2
- jupyterlab_widgets ==3.0.13
- kiwisolver ==1.4.8
- marshmallow ==3.26.1
- matplotlib ==3.10.0
- matplotlib-inline ==0.1.7
- mlflow ==2.20.3
- mlflow-skinny ==2.20.3
- more-click ==0.1.2
- more-itertools ==10.6.0
- mpmath ==1.3.0
- multidict ==6.1.0
- mypy-extensions ==1.0.0
- nest-asyncio ==1.6.0
- networkx ==3.4.2
- numpy ==2.2.2
- opentelemetry-api ==1.30.0
- opentelemetry-sdk ==1.30.0
- opentelemetry-semantic-conventions ==0.51b0
- optuna ==4.2.0
- owlready2 ==0.47
- packaging ==24.2
- pandas ==2.2.3
- parso ==0.8.4
- patsy ==1.0.1
- pexpect ==4.9.0
- pillow ==11.1.0
- platformdirs ==4.3.6
- prompt_toolkit ==3.0.50
- propcache ==0.2.1
- protobuf ==5.29.3
- psutil ==6.1.1
- ptyprocess ==0.7.0
- pure_eval ==0.2.3
- pyarrow ==19.0.1
- pyasn1 ==0.6.1
- pyasn1_modules ==0.4.1
- pydantic ==2.10.6
- pydantic_core ==2.27.2
- pykeen ==1.11.0
- pyparsing ==3.2.1
- pystow ==0.7.0
- python-dateutil ==2.9.0.post0
- pytz ==2025.1
- pyzmq ==26.2.1
- rdflib ==7.1.3
- requests ==2.32.3
- rsa ==4.9
- scikit-learn ==1.6.1
- scikit-posthocs ==0.11.4
- scipy ==1.15.1
- seaborn ==0.13.2
- setuptools ==75.8.0
- six ==1.17.0
- sklearn-compat ==0.1.3
- smmap ==5.0.2
- sqlparse ==0.5.3
- stack-data ==0.6.3
- statsmodels ==0.14.4
- sympy ==1.13.1
- tabulate ==0.9.0
- threadpoolctl ==3.5.0
- torch ==2.5.1
- torch-geometric ==2.6.1
- torch-max-mem ==0.1.3
- torch-ppr ==0.0.8
- torchaudio ==2.5.1
- torchvision ==0.20.1
- tornado ==6.4.2
- tqdm ==4.67.1
- traitlets ==5.14.3
- typing-inspect ==0.9.0
- typing_extensions ==4.12.2
- tzdata ==2025.1
- urllib3 ==2.3.0
- wcwidth ==0.2.13
- widgetsnbextension ==4.0.13
- wittgenstein ==0.3.4
- wrapt ==1.17.2
- yarl ==1.18.3
- zipp ==3.21.0