sphm4kg

Repository for paper "Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs"

https://github.com/ivandiliso/sphm4kg

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: ivandiliso
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 30.3 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 4
Created 9 months ago · Last pushed 7 months ago
Metadata Files
Readme Citation

README.md

Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs


Folder Structure

```
📁 data                   -> Ontologies (.owl) and their targets (.nt)
   📁 onto
   📁 onto_target
   📁 raw                 -> Ontologies raw files, metadata, intermediate caches, preprocessing pipelines
📁 out                    -> Pickled output of model metrics on each target
📁 docs                   -> Documentation and additional experiment results
📁 experimental_results   -> Additional experimental results on larger datasets
📁 statistical_evaluation -> Friedman and Nemenyi post-hoc tests on all datasets, problems and models
📁 src                    -> Source code
   📁 MBM                 -> Code regarding models, rule extraction and wrappers
```

Source Code Structure

```
📁 src/HBM/models
   📄 HB.py           -> Hierarchical model with Variational Bayes, EM and Gradient Descent under a unified abstraction
   📄 MBNB.py         -> Multivariate Bernoulli Naive Bayes model
   📄 rule_helpers.py -> Model wrappers for axiom extraction

📁 src/HBM
   📄 ontology.py     -> Ontology loading utility; feature matrix construction; mapping from symbolic to vectorized formats

📁 src/
   📄 target_generator.py -> Artificial disjunctive problem generator
   📄 train_evaluate.py   -> Train and evaluation script (all models and all targets of a selected dataset)
   📄 axiom_compare.py    -> Utility to compare target and extracted axioms
   📄 statistical_test.py -> Execute the Friedman-Nemenyi test on cached result files
   📄 utils.py            -> Logging and printing utilities
```

Project Details

Requirements

The project was developed using Python 3.12.8; a pip freeze of the packages used is available in the requirements.txt file.

How to run experiments?

The train_evaluate.py script can be run with arguments specifying the dataset to be used. To ensure correct execution, run the file from the src directory:

⚠ The dataset loading utility uses Owlready2 to automatically load the .owl files and run the Pellet reasoner. Owlready2 requires the path to the system Java executable. This is handled automatically for macOS and Linux users (the systems used during the experiments), assuming the standard location of the Java executable. Windows users should check whether some modification of `_set_java()` in `src/MBN/ontology.py` is necessary.
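A stdlib-only sketch of what such a Java-lookup helper might do (the function name and the Windows fallback path below are hypothetical, not the repository's actual code):

```python
import platform
import shutil

def find_java_exe():
    """Best-effort lookup of the Java executable for the current OS.

    On macOS/Linux `java` is usually on PATH; on Windows it often is not,
    so the result may be None and the path must then be set manually.
    """
    name = "java.exe" if platform.system() == "Windows" else "java"
    return shutil.which(name)

# Owlready2 reads the Java path from its module-level JAVA_EXE variable:
#   import owlready2
#   owlready2.JAVA_EXE = find_java_exe() or r"C:\Program Files\Java\bin\java.exe"
```

`owlready2.JAVA_EXE` is the documented hook Owlready2 consults before launching the Pellet reasoner.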

```bash
cd /path_to_project/sphm4kg/src
python3 train_evaluate.py --onto lubm   # choose one of: lubm, financial, ntnames, krkrzeroone
```

Target Concept Ontologies

The onto_name.nt files in data/onto_target contain the artificial disjunctive targets created for each ontology in order to assess the probabilistic models' prediction capabilities. These files contain two types of targets:

  • SimpleClassX: Target defined as the disjunction of simple concepts
  • HardClassX: Target defined as the disjunction of conjunctions of simple concepts
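As an illustration with hypothetical concept names $A, B, C, D$, the two target types differ only in the shape of the disjuncts:

$$ \texttt{SimpleClassX} \equiv A \sqcup B \sqcup C \qquad \texttt{HardClassX} \equiv (A \sqcap B) \sqcup (C \sqcap D) $$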

To load them without using the provided utility, this code can be used (lubm in this example):

```python
from owlready2 import *

target_onto = get_ontology("./data/onto_target/lubm-t.nt").load()

simple_targets = set(target_onto.search(iri="*#Simple_Class*"))
hard_targets = set(target_onto.search(iri="*#Hard_Class*"))
```

Feature Names

When working with the provided utility, you can see classes defined as:

  • namespace.ClassName: Refers to a class defined in the ontology
  • namespace.Some_relationName_range: Refers to an existential restriction on the relation range, formally defined as:

$$ \texttt{Some\_relationName\_range} \equiv \exists\, \texttt{relationName}.\texttt{Range}(\texttt{relationName}) $$
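A minimal, self-contained sketch (all individual, class and relation names below are hypothetical) of how such Boolean features can be assembled into a feature matrix:

```python
# Hypothetical individuals with their asserted classes and relation fillers
individuals = {
    "ind1": {"classes": {"Person"}, "worksFor": {"org1"}},
    "ind2": {"classes": {"Organization"}, "worksFor": set()},
}

def feature_row(ind, class_names, relations):
    # One Boolean feature per named class membership ...
    row = [int(c in ind["classes"]) for c in class_names]
    # ... plus one Some_relation_range feature per relation: 1 iff the
    # individual has at least one filler for that relation
    row += [int(len(ind[r]) > 0) for r in relations]
    return row

X = [feature_row(individuals[name], ["Person", "Organization"], ["worksFor"])
     for name in sorted(individuals)]
```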

Hyperparameters Settings

| Model | Parameters and Ranges |
| - | - |
| $\texttt{MBNB}$ | `{}` |
| $\texttt{HB}_{\texttt{VB}}$ | `{n_components=5, n_init=10, n_iter=200}` |
| $\texttt{HB}_{\texttt{EM}}$ | `{n_components=5, max_iter=200, tol=1e-3}` |
| $\texttt{HB}_{\texttt{GD}}$ | `{n_components=5, max_iter=200, tol=1e-3, learning_rate=1e-4}` |
| $\texttt{Tree}$ | `{max_depth=3, min_samples_leaf=10, criterion="log_loss"}` |
| $\texttt{LogReg}$ | `{C=0.01, penalty='l1', solver='saga', max_iter=200}` |
| $\texttt{HLogReg}$ | `{n_components=5, n_init=10, n_iter=200, C=0.01, penalty='l1', solver='saga', max_iter=200}` |
| $\texttt{AxiomWrapper}$ | `{theta_u: (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), theta_p: (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)}` |
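For example, the AxiomWrapper thresholds listed in the table amount to a 9 × 9 search grid; a sketch of how that grid could be enumerated (variable names are illustrative):

```python
from itertools import product

# Candidate thresholds for the axiom-extraction wrapper, as in the table
theta_u_values = tuple(round(0.1 * k, 1) for k in range(1, 10))  # 0.1 .. 0.9
theta_p_values = theta_u_values

# Every (theta_u, theta_p) pair a grid search over the wrapper would try
grid = list(product(theta_u_values, theta_p_values))
```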

Larger Ontologies Preprocessing

Larger ontologies required ad-hoc preprocessing in order to generate a single OWL file adhering to Owlready2's internal structure and limitations (for example, DBpedia contains individuals that are also classes, which breaks the Owlready2 internal class inheritance scheme). For this reason, both the raw ontology files and the preprocessing pipelines are provided in the data/raw folder:

  • DBpedia50K: Refined version of the triples-only version of this benchmark dataset; the preprocessing file creates a single OWL file with entities, classes, object properties, domain and range axioms, class membership axioms and subclass axioms.
  • YAGO4-20: Please download yago-wd-facts.nt and put it in data/raw/yago4-20 to properly execute all preprocessing steps. Execute individuals_uri.py first and then the preprocessing pipelines to generate the YAGO OWL file.

All preprocessed datasets and artificial problems are already available in the project repository.
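As a rough illustration of the kind of line-level prefiltering such preprocessing pipelines perform on large N-Triples dumps (the predicate IRIs below are made up; this is not the repository's actual pipeline code):

```python
def filter_nt_lines(lines, keep_predicates):
    """Keep only N-Triples lines whose predicate IRI is in keep_predicates.

    A cheap textual prefilter like this trims huge dumps (e.g. yago-wd-facts.nt)
    before any real RDF parsing or OWL file construction takes place.
    """
    kept = []
    for line in lines:
        parts = line.split(maxsplit=2)  # subject, predicate, rest of triple
        if len(parts) == 3 and parts[1].strip("<>") in keep_predicates:
            kept.append(line)
    return kept

sample = [
    '<http://x/a> <http://x/worksFor> <http://x/b> .',
    '<http://x/a> <http://x/comment> "free text" .',
]
filtered = filter_nt_lines(sample, {"http://x/worksFor"})
```

For a full parse, rdflib (already in requirements.txt) would be the natural next step after such a prefilter.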

Owner

  • Name: Ivan Diliso
  • Login: ivandiliso
  • Kind: user
  • Location: Italy, Bari
  • Company: University of Bari Aldo Moro

PhD Student @ University of Bari Aldo Moro

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Diliso"
  given-names: "Ivan"
  orcid: "https://orcid.org/0009-0007-2942-202X"
- family-names: "Fanizzi"
  given-names: "Nicola"
  orcid: "https://orcid.org/0000-0001-5319-7933"
- family-names: "d'Amato"
  given-names: "Claudia"
  orcid: "https://orcid.org/0000-0002-3385-987X"
title: "Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs"
version: 0.0.0.1
doi: https://doi.org/10.5281/zenodo.15708405
date-released: 
url: https://github.com/ivandiliso/sphm4kg

GitHub Events

Total
  • Release event: 2
  • Push event: 12
  • Public event: 1
  • Create event: 2
Last Year
  • Release event: 2
  • Push event: 12
  • Public event: 1
  • Create event: 2

Dependencies

requirements.txt pypi
  • Deprecated ==1.2.18
  • Flask ==3.1.0
  • GitPython ==3.1.44
  • Jinja2 ==3.1.5
  • Mako ==1.3.8
  • Markdown ==3.7
  • MarkupSafe ==3.0.2
  • PyYAML ==6.0.2
  • Pygments ==2.19.1
  • SQLAlchemy ==2.0.37
  • Werkzeug ==3.1.3
  • aiohappyeyeballs ==2.4.4
  • aiohttp ==3.11.11
  • aiosignal ==1.3.2
  • alembic ==1.14.1
  • annotated-types ==0.7.0
  • appnope ==0.1.4
  • asttokens ==3.0.0
  • attrs ==25.1.0
  • blinker ==1.9.0
  • cachetools ==5.5.2
  • certifi ==2025.1.31
  • charset-normalizer ==3.4.1
  • class_resolver ==0.5.4
  • click ==8.1.8
  • click-default-group ==1.2.4
  • cloudpickle ==3.1.1
  • colorlog ==6.9.0
  • comm ==0.2.2
  • contourpy ==1.3.1
  • cycler ==0.12.1
  • databricks-sdk ==0.44.1
  • dataclasses-json ==0.6.7
  • debugpy ==1.8.12
  • decorator ==5.1.1
  • docdata ==0.0.4
  • docker ==7.1.0
  • executing ==2.2.0
  • filelock ==3.17.0
  • fonttools ==4.55.8
  • frozenlist ==1.5.0
  • fsspec ==2025.2.0
  • gitdb ==4.0.12
  • google-auth ==2.38.0
  • graphene ==3.4.3
  • graphql-core ==3.2.6
  • graphql-relay ==3.2.0
  • gunicorn ==23.0.0
  • idna ==3.10
  • imbalanced-learn ==0.13.0
  • imblearn ==0.0
  • importlib_metadata ==8.5.0
  • ipykernel ==6.29.5
  • ipython ==8.32.0
  • ipywidgets ==8.1.5
  • itsdangerous ==2.2.0
  • jedi ==0.19.2
  • joblib ==1.4.2
  • jupyter_client ==8.6.3
  • jupyter_core ==5.7.2
  • jupyterlab_widgets ==3.0.13
  • kiwisolver ==1.4.8
  • marshmallow ==3.26.1
  • matplotlib ==3.10.0
  • matplotlib-inline ==0.1.7
  • mlflow ==2.20.3
  • mlflow-skinny ==2.20.3
  • more-click ==0.1.2
  • more-itertools ==10.6.0
  • mpmath ==1.3.0
  • multidict ==6.1.0
  • mypy-extensions ==1.0.0
  • nest-asyncio ==1.6.0
  • networkx ==3.4.2
  • numpy ==2.2.2
  • opentelemetry-api ==1.30.0
  • opentelemetry-sdk ==1.30.0
  • opentelemetry-semantic-conventions ==0.51b0
  • optuna ==4.2.0
  • owlready2 ==0.47
  • packaging ==24.2
  • pandas ==2.2.3
  • parso ==0.8.4
  • patsy ==1.0.1
  • pexpect ==4.9.0
  • pillow ==11.1.0
  • platformdirs ==4.3.6
  • prompt_toolkit ==3.0.50
  • propcache ==0.2.1
  • protobuf ==5.29.3
  • psutil ==6.1.1
  • ptyprocess ==0.7.0
  • pure_eval ==0.2.3
  • pyarrow ==19.0.1
  • pyasn1 ==0.6.1
  • pyasn1_modules ==0.4.1
  • pydantic ==2.10.6
  • pydantic_core ==2.27.2
  • pykeen ==1.11.0
  • pyparsing ==3.2.1
  • pystow ==0.7.0
  • python-dateutil ==2.9.0.post0
  • pytz ==2025.1
  • pyzmq ==26.2.1
  • rdflib ==7.1.3
  • requests ==2.32.3
  • rsa ==4.9
  • scikit-learn ==1.6.1
  • scikit-posthocs ==0.11.4
  • scipy ==1.15.1
  • seaborn ==0.13.2
  • setuptools ==75.8.0
  • six ==1.17.0
  • sklearn-compat ==0.1.3
  • smmap ==5.0.2
  • sqlparse ==0.5.3
  • stack-data ==0.6.3
  • statsmodels ==0.14.4
  • sympy ==1.13.1
  • tabulate ==0.9.0
  • threadpoolctl ==3.5.0
  • torch ==2.5.1
  • torch-geometric ==2.6.1
  • torch-max-mem ==0.1.3
  • torch-ppr ==0.0.8
  • torchaudio ==2.5.1
  • torchvision ==0.20.1
  • tornado ==6.4.2
  • tqdm ==4.67.1
  • traitlets ==5.14.3
  • typing-inspect ==0.9.0
  • typing_extensions ==4.12.2
  • tzdata ==2025.1
  • urllib3 ==2.3.0
  • wcwidth ==0.2.13
  • widgetsnbextension ==4.0.13
  • wittgenstein ==0.3.4
  • wrapt ==1.17.2
  • yarl ==1.18.3
  • zipp ==3.21.0