sphm4kg
Repository for paper "Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs"
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.1%) to scientific vocabulary
Repository
Repository for paper "Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs"
Basic Info
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 4
Metadata Files
README.md
Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs
Folder Structure
📁 data -> Ontologies (.owl) and their targets (.nt)
📁 onto
📁 onto_target
📁 raw -> Ontologies raw files, metadata, intermediate caches, preprocessing pipelines
📁 out -> Pickled output of model metrics on each target
📁 docs -> Documentation and additional experiment results
📁 experimental_results -> Additional experimental results on larger datasets
📁 statistical_evaluation -> Friedman and Nemenyi post-hoc tests on all datasets, problems, and models
📁 src -> Source code
📁 HBM -> Code regarding models, rule extraction and wrappers
Source Code Structure
```
📁 src/HBM/models
  📄 HB.py -> Hierarchical Bayesian model with Variational Bayes, EM, and Gradient Descent under a unified abstraction
  📄 MBNB.py -> Multivariate Bernoulli Naive Bayes model
  📄 rule_helpers.py -> Model wrappers for axiom extraction
📁 src/HBM
  📄 ontology.py -> Ontology loading utility: feature matrix construction, mapping from symbolic to vectorized formats
📁 src/
  📄 target_generator.py -> Artificial disjunctive problem generator
  📄 train_evaluate.py -> Train and evaluation script (all models and all targets of a selected dataset)
  📄 axiom_compare.py -> Utility to compare target and extracted axioms
  📄 statistical_test.py -> Execute the Friedman-Nemenyi test on cached results files
  📄 utils.py -> Logging and printing utilities
```
Project Details
Requirements
The project was developed with Python 3.12.8; a pip freeze of the packages used is available in the requirements.txt file.
How to run experiments?
The train_evaluate.py script can be run with arguments specifying the dataset to be used. To ensure correct execution, run the file from the src directory:
⚠ The dataset loading utility uses Owlready2 to automatically load the .owl files and run the Pellet reasoner. Owlready2 requires the path to the system Java executable (JAVA_EXE) to be specified. This is handled automatically for macOS and Linux users (the systems used during the experiments), assuming the Java executable is in its standard location; Windows users should check whether `src/HBM/ontology.py` (`_set_java()`) needs modification.
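A minimal sketch of the kind of lookup such a helper performs is shown below; `find_java_exe` is illustrative, not the repository's function, and the Windows path in the final comment is a placeholder.

```python
import os
import platform
import shutil

def find_java_exe():
    """Locate a Java executable (illustrative helper, not the repo's own)."""
    # Prefer an explicit JAVA_HOME if it is set
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        name = "java.exe" if platform.system() == "Windows" else "java"
        candidate = os.path.join(java_home, "bin", name)
        if os.path.isfile(candidate):
            return candidate
    # Otherwise fall back to whatever java is on PATH
    return shutil.which("java")

# Owlready2 launches Pellet with the executable stored in owlready2.JAVA_EXE,
# so on Windows you would assign it before loading any ontology, e.g.:
#   import owlready2
#   owlready2.JAVA_EXE = find_java_exe() or r"C:\path\to\java.exe"
```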
```bash
cd /path_to_project/sphm4kg/src
python3 train_evaluate.py --onto ("lubm", "financial", "ntnames", "krkrzeroone") # Choose one
```
Target Concept Ontologies
The onto_name.nt files in data/onto_target contain the artificially created disjunctive targets generated for each ontology in order to assess the probabilistic models' prediction capabilities. These files contain two types of targets:
- SimpleClassX: target defined as a disjunction of simple concepts
- HardClassX: target defined as a disjunction of conjunctions of simple concepts
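The two target types can be illustrated on a toy boolean membership matrix (this is not the project's generator, just a sketch of the semantics, with one column per simple concept):

```python
import numpy as np

# Toy membership matrix: 4 individuals x 4 simple concepts C0..C3
X = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
], dtype=bool)

# SimpleClass-style target: disjunction of simple concepts, e.g. C0 or C2
simple_target = X[:, 0] | X[:, 2]

# HardClass-style target: disjunction of conjunctions,
# e.g. (C0 and C1) or (C2 and C3)
hard_target = (X[:, 0] & X[:, 1]) | (X[:, 2] & X[:, 3])

print(simple_target.astype(int))  # [1 0 0 1]
print(hard_target.astype(int))    # [0 0 0 1]
```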
To load them without using the provided utility, this code can be used (lubm in this example):
```python
from owlready2 import *

target_onto = get_ontology("./data/onto_target/lubm-t.nt").load()

simple_targets = set(target_onto.search(iri='#Simple_Class'))
hard_targets = set(target_onto.search(iri='#Hard_Class'))
```
Feature Names
When working with the provided utility, classes can appear as:
- `namespace.ClassName`: refers to a class defined in the ontology
- `namespace.Some_relationName_range`: refers to an existential restriction on the relation range, formally defined as:

$$ \texttt{Some\_relationName\_range} \equiv \exists\, \texttt{relationName}.\texttt{Range}(\texttt{relationName}) $$
Hyperparameters Settings
| Model | Parameters and Ranges |
| - | - |
| $\texttt{MBNB}$ | `{}` |
| $\texttt{HB}_{\texttt{VB}}$ | `{n_components=5, n_init=10, n_iter=200}` |
| $\texttt{HB}_{\texttt{EM}}$ | `{n_components=5, max_iter=200, tol=1e-3}` |
| $\texttt{HB}_{\texttt{GD}}$ | `{n_components=5, max_iter=200, tol=1e-3, learning_rate=1e-4}` |
| $\texttt{Tree}$ | `{max_depth=3, min_samples_leaf=10, criterion="log_loss"}` |
| $\texttt{LogReg}$ | `{C=0.01, penalty='l1', solver='saga', max_iter=200}` |
| $\texttt{HLogReg}$ | `{n_components=5, n_init=10, n_iter=200, C=0.01, penalty='l1', solver='saga', max_iter=200}` |
| $\texttt{AxiomWrapper}$ | `{theta_u: (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9), theta_p: (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)}` |
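The AxiomWrapper threshold sweep over the nine listed values for each parameter amounts to a simple Cartesian product (parameter names follow the table; the grid construction below is illustrative, not the repository's code):

```python
from itertools import product

# The nine threshold values from the table, shared by both parameters
theta_values = [round(0.1 * i, 1) for i in range(1, 10)]  # 0.1 ... 0.9

# Every (theta_u, theta_p) combination evaluated in the sweep
grid = list(product(theta_values, theta_values))
print(len(grid))  # 81
```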
Larger Ontologies Preprocessing
Larger ontologies required ad-hoc preprocessing in order to generate a single OWL file adhering to Owlready2's internal structure and limitations (for example, DBpedia contains individuals that are also classes, which breaks Owlready2's internal class inheritance scheme). For this reason, the data/raw folder provides both the raw ontology files and the preprocessing pipelines:
- DBpedia50K: refined version of the triples-only version of this benchmark dataset; the preprocessing file creates a single OWL file with entities, classes, object properties, domain and range axioms, class membership axioms, and subclass axioms.
- YAGO4-20: please download `yago-wd-facts.nt` and put it in `data/raw/yago4-20` to properly execute all preprocessing steps. Execute `individuals_uri.py` first and then the preprocessing pipelines to generate the YAGO OWL file.

All preprocessed datasets and artificial problems are already available in the project repository.
Owner
- Name: Ivan Diliso
- Login: ivandiliso
- Kind: user
- Location: Bari, Italy
- Company: University of Bari Aldo Moro
- Website: ivandiliso.github.io
- Repositories: 2
- Profile: https://github.com/ivandiliso
PhD Student @ University of Bari Aldo Moro
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Diliso"
    given-names: "Ivan"
    orcid: "https://orcid.org/0009-0007-2942-202X"
  - family-names: "Fanizzi"
    given-names: "Nicola"
    orcid: "https://orcid.org/0000-0001-5319-7933"
  - family-names: "d'Amato"
    given-names: "Claudia"
    orcid: "https://orcid.org/0000-0002-3385-987X"
title: "Learning Interpretable Probabilistic Models and Schema Axioms for Knowledge Graphs"
version: 0.0.0.1
doi: https://doi.org/10.5281/zenodo.15708405
date-released:
url: "https://github.com/ivandiliso/sphm4kg"
```
GitHub Events
Total
- Release event: 2
- Push event: 12
- Public event: 1
- Create event: 2
Last Year
- Release event: 2
- Push event: 12
- Public event: 1
- Create event: 2
Dependencies
- Deprecated ==1.2.18
- Flask ==3.1.0
- GitPython ==3.1.44
- Jinja2 ==3.1.5
- Mako ==1.3.8
- Markdown ==3.7
- MarkupSafe ==3.0.2
- PyYAML ==6.0.2
- Pygments ==2.19.1
- SQLAlchemy ==2.0.37
- Werkzeug ==3.1.3
- aiohappyeyeballs ==2.4.4
- aiohttp ==3.11.11
- aiosignal ==1.3.2
- alembic ==1.14.1
- annotated-types ==0.7.0
- appnope ==0.1.4
- asttokens ==3.0.0
- attrs ==25.1.0
- blinker ==1.9.0
- cachetools ==5.5.2
- certifi ==2025.1.31
- charset-normalizer ==3.4.1
- class_resolver ==0.5.4
- click ==8.1.8
- click-default-group ==1.2.4
- cloudpickle ==3.1.1
- colorlog ==6.9.0
- comm ==0.2.2
- contourpy ==1.3.1
- cycler ==0.12.1
- databricks-sdk ==0.44.1
- dataclasses-json ==0.6.7
- debugpy ==1.8.12
- decorator ==5.1.1
- docdata ==0.0.4
- docker ==7.1.0
- executing ==2.2.0
- filelock ==3.17.0
- fonttools ==4.55.8
- frozenlist ==1.5.0
- fsspec ==2025.2.0
- gitdb ==4.0.12
- google-auth ==2.38.0
- graphene ==3.4.3
- graphql-core ==3.2.6
- graphql-relay ==3.2.0
- gunicorn ==23.0.0
- idna ==3.10
- imbalanced-learn ==0.13.0
- imblearn ==0.0
- importlib_metadata ==8.5.0
- ipykernel ==6.29.5
- ipython ==8.32.0
- ipywidgets ==8.1.5
- itsdangerous ==2.2.0
- jedi ==0.19.2
- joblib ==1.4.2
- jupyter_client ==8.6.3
- jupyter_core ==5.7.2
- jupyterlab_widgets ==3.0.13
- kiwisolver ==1.4.8
- marshmallow ==3.26.1
- matplotlib ==3.10.0
- matplotlib-inline ==0.1.7
- mlflow ==2.20.3
- mlflow-skinny ==2.20.3
- more-click ==0.1.2
- more-itertools ==10.6.0
- mpmath ==1.3.0
- multidict ==6.1.0
- mypy-extensions ==1.0.0
- nest-asyncio ==1.6.0
- networkx ==3.4.2
- numpy ==2.2.2
- opentelemetry-api ==1.30.0
- opentelemetry-sdk ==1.30.0
- opentelemetry-semantic-conventions ==0.51b0
- optuna ==4.2.0
- owlready2 ==0.47
- packaging ==24.2
- pandas ==2.2.3
- parso ==0.8.4
- patsy ==1.0.1
- pexpect ==4.9.0
- pillow ==11.1.0
- platformdirs ==4.3.6
- prompt_toolkit ==3.0.50
- propcache ==0.2.1
- protobuf ==5.29.3
- psutil ==6.1.1
- ptyprocess ==0.7.0
- pure_eval ==0.2.3
- pyarrow ==19.0.1
- pyasn1 ==0.6.1
- pyasn1_modules ==0.4.1
- pydantic ==2.10.6
- pydantic_core ==2.27.2
- pykeen ==1.11.0
- pyparsing ==3.2.1
- pystow ==0.7.0
- python-dateutil ==2.9.0.post0
- pytz ==2025.1
- pyzmq ==26.2.1
- rdflib ==7.1.3
- requests ==2.32.3
- rsa ==4.9
- scikit-learn ==1.6.1
- scikit-posthocs ==0.11.4
- scipy ==1.15.1
- seaborn ==0.13.2
- setuptools ==75.8.0
- six ==1.17.0
- sklearn-compat ==0.1.3
- smmap ==5.0.2
- sqlparse ==0.5.3
- stack-data ==0.6.3
- statsmodels ==0.14.4
- sympy ==1.13.1
- tabulate ==0.9.0
- threadpoolctl ==3.5.0
- torch ==2.5.1
- torch-geometric ==2.6.1
- torch-max-mem ==0.1.3
- torch-ppr ==0.0.8
- torchaudio ==2.5.1
- torchvision ==0.20.1
- tornado ==6.4.2
- tqdm ==4.67.1
- traitlets ==5.14.3
- typing-inspect ==0.9.0
- typing_extensions ==4.12.2
- tzdata ==2025.1
- urllib3 ==2.3.0
- wcwidth ==0.2.13
- widgetsnbextension ==4.0.13
- wittgenstein ==0.3.4
- wrapt ==1.17.2
- yarl ==1.18.3
- zipp ==3.21.0