refactor-negative-sampler
Repository for paper "Enhancing PyKeen with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models"
Enhancing PyKeen with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models
Documentation
In-depth documentation and tutorials are available on the project's GitHub Pages site: https://ivandiliso.github.io/refactor-negative-sampler/
Folder Structure
- data -> Datasets used during training, validation and testing
  - YAGO4-20
  - FB15K
  - WN18
  - DB50K
- doc -> Documentation and logs
- cached -> Cached negative sampler subsets for faster computation
- model
  - embedding -> Embedding model checkpoints
  - sampling -> Checkpoints for models used in dynamic sampling
- experiments -> Experiment results after the HPO pipeline
- script -> Single execution files, settings, etc.
- src -> Source code
  - extension -> Extensions of PyKEEN classes for negative sampling
  - utils -> Utility files, libraries, logging
- notebooks -> Testing, single execution and code evaluation notebooks
- temp -> Temporary files
Dataset Structure
Each dataset is provided with the following folder structure:
- dataset_name
  - mapping
    - entity_to_id.json -> Dictionary mapping entity names (string) to IDs (integer)
    - relation_to_id.json -> Dictionary mapping relation names (string) to IDs (integer)
  - metadata
    - entity_classes.json -> Dictionary mapping entity names (string) to classes (list of strings)
    - relation_domain_range.json -> Dictionary mapping relation names (string) to domain and range classes (string)
  - owl -> Additional schema-level information in OWL format
  - train.txt -> Training split triples in TSV format (using string names)
  - test.txt -> Testing split triples in TSV format (using string names)
  - valid.txt -> Validation split triples in TSV format (using string names)
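As an illustration of how this layout can be consumed, the following minimal sketch reads the two mapping files and converts one split into integer ID triples. The helper name `load_split` is hypothetical and not part of the library. It assumes that each line of a split file holds a head, a relation and a tail separated by tabs, and that every name in the split appears in the mappings.

```python
import json
from pathlib import Path


def load_split(dataset_dir, split="train"):
    """Read one split and map its string triples to (head, relation, tail) IDs.

    Hypothetical helper: it only relies on the folder layout described above.
    """
    root = Path(dataset_dir)
    with open(root / "mapping" / "entity_to_id.json", encoding="utf-8") as f:
        entity_to_id = json.load(f)
    with open(root / "mapping" / "relation_to_id.json", encoding="utf-8") as f:
        relation_to_id = json.load(f)

    triples = []
    with open(root / f"{split}.txt", encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Assumption: exactly three tab-separated string names per line.
            head, relation, tail = line.split("\t")
            triples.append(
                (entity_to_id[head], relation_to_id[relation], entity_to_id[tail])
            )
    return triples
```

Calling `load_split("data/FB15K", "valid")` would then return a list of ID triples for the validation split, provided the archive has been unzipped into that location.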
Extension Structure
```
src/extension
    constants.py -> Constant variables used across the whole library
    dataset.py -> Implementation of OnMemoryDataset
    filtering.py -> Implementation of NullPytonSetFilterer
    sampling.py -> Implementation of SubsetNegativeSampler and all the specific sampling strategies
    utils.py -> Utility functions
```
Instructions
A fully detailed tutorial is provided in src/tutorial.ipynb. Detailed instructions are available at https://ivandiliso.github.io/refactor-negative-sampler/
- Unzip the dataset files
- Install the dependencies found in the requirements.txt file
- Manually run the example python files, or use one of the provided scripts in the scripts folder
The library is fully integrated into the PyKEEN ecosystem. If you need a quick start with the library, follow this guide: three example files can be used to run, in order, an HPO pipeline, a standard pipeline, and the negative sampler evaluation. If you want to directly run an example configuration, you can find
hpo_pipeline.py
Runs a hyperparameter optimization pipeline for the chosen model. It can be run using CLI arguments:

    python src/hpo_pipeline.py \
        --dataset dataset_name \
        --model model_name \
        --sampler sampler_name \
        --negatives number_negatives
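For readers who want to see how such a command line could be handled, here is a minimal `argparse` sketch for the four options listed above. It is not the repository's actual parser. The function name `build_parser`, the integer type of `--negatives`, the choice to make every option required, and the sampler name `random` in the usage lines are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the four HPO options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="HPO pipeline options (sketch)")
    parser.add_argument("--dataset", required=True, help="dataset name")
    parser.add_argument("--model", required=True, help="embedding model name")
    parser.add_argument("--sampler", required=True, help="negative sampler name")
    # Assumption: the number of negatives is an integer count per positive triple.
    parser.add_argument("--negatives", type=int, required=True,
                        help="number of negatives per positive")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--dataset", "FB15K", "--model", "TransE", "--sampler", "random", "--negatives", "5"]
)
print(args.dataset, args.model, args.sampler, args.negatives)
```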
pipeline.py
Runs a pipeline for the chosen model with statically defined parameters. It can be run using CLI arguments:

    python src/pipeline.py \
        --dataset dataset_name \
        --model model_name \
        --sampler sampler_name \
        --negatives number_negatives \
        --l2 regularizer_weight \
        --lr learning_rate \
        --margin loss_margin
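The sketch below shows one way these values could be translated into keyword arguments for `pykeen.pipeline.pipeline`. It is an illustration under stated assumptions, not the script's actual implementation: the helper name `build_pipeline_kwargs` is hypothetical, `--l2` is assumed to weight an Lp regularizer with `p=2`, and `--margin` is assumed to configure a margin ranking loss. The dataset is left out because the repository loads it through its own dataset class.

```python
def build_pipeline_kwargs(model, sampler, negatives, l2, lr, margin):
    """Map CLI values onto PyKEEN pipeline keyword arguments (hypothetical helper)."""
    return {
        "model": model,
        "negative_sampler": sampler,
        # num_negs_per_pos is the PyKEEN negative sampler argument for negatives per positive.
        "negative_sampler_kwargs": {"num_negs_per_pos": negatives},
        # Assumption: --l2 is the weight of an Lp regularizer with p = 2.
        "regularizer": "LP",
        "regularizer_kwargs": {"weight": l2, "p": 2.0},
        "optimizer_kwargs": {"lr": lr},
        # Assumption: --margin belongs to a margin ranking loss.
        "loss": "MarginRankingLoss",
        "loss_kwargs": {"margin": margin},
    }


kwargs = build_pipeline_kwargs("TransE", "basic", 5, 1e-4, 1e-3, 1.0)
print(kwargs)
```

With PyKEEN installed, the resulting dictionary can be unpacked into the pipeline call together with a dataset or explicit training, validation and testing triples factories.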
negative_evaluation.py
Example code showing how to compute the negative sampler statistics for a specific dataset. This file also contains usage examples of dynamic sampling with a pretrained TransE model on YAGO4-20, and it provides pre-written prediction functions that work with the provided model.
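This README does not define which statistics the script computes. Purely as an illustration, the sketch below measures one quantity often inspected for negative samplers: the fraction of generated negatives that are in fact known true triples (false negatives). Both function names are hypothetical, and the sampler shown is plain uniform tail corruption, not one of the library's strategies.

```python
import random


def corrupt_tails(triples, num_entities, negatives_per_positive, seed=0):
    """Replace the tail of each positive triple with a uniformly drawn entity ID."""
    rng = random.Random(seed)
    return [
        (head, relation, rng.randrange(num_entities))
        for (head, relation, _tail) in triples
        for _ in range(negatives_per_positive)
    ]


def false_negative_rate(negatives, known_triples):
    """Fraction of sampled negatives that coincide with a known positive triple."""
    if not negatives:
        return 0.0
    known = set(known_triples)
    return sum(1 for triple in negatives if triple in known) / len(negatives)


positives = [(0, 0, 1), (1, 0, 2), (2, 1, 0)]
negatives = corrupt_tails(positives, num_entities=3, negatives_per_positive=4)
print(len(negatives), false_negative_rate(negatives, positives))
```

Because uniform corruption can redraw the original tail or hit another known triple, this rate is generally above zero on small graphs.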
Cite our paper

    @misc{damato2025enhancingpykeenmultiplenegative,
        title={Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models},
        author={Claudia d'Amato and Ivan Diliso and Nicola Fanizzi and Zafar Saeed},
        year={2025},
        eprint={2508.05587},
        archivePrefix={arXiv},
        primaryClass={cs.LG},
        url={https://arxiv.org/abs/2508.05587},
    }
Owner
- Name: Ivan Diliso
- Login: ivandiliso
- Kind: user
- Location: Italy, Bari
- Company: University of Bari Aldo Moro
- Website: ivandiliso.github.io
- Repositories: 2
- Profile: https://github.com/ivandiliso
PhD Student @ University of Bari Aldo Moro
Dependencies
- Jinja2 ==3.1.6
- Mako ==1.3.10
- MarkupSafe ==3.0.2
- PySocks ==1.7.1
- PyYAML ==6.0.2
- Pygments ==2.19.1
- SQLAlchemy ==2.0.40
- alembic ==1.15.2
- asttokens ==3.0.0
- beautifulsoup4 ==4.13.4
- certifi ==2025.4.26
- charset-normalizer ==3.4.1
- class-resolver ==0.6.0
- click ==8.1.8
- click-default-group ==1.2.4
- colorlog ==6.9.0
- comm ==0.2.2
- dataclasses-json ==0.6.7
- debugpy ==1.8.14
- decorator ==5.2.1
- docdata ==0.0.5
- executing ==2.2.0
- filelock ==3.18.0
- fsspec ==2025.3.2
- gdown ==5.2.0
- greenlet ==3.2.1
- idna ==3.10
- ipykernel ==6.29.5
- ipython ==9.2.0
- ipython_pygments_lexers ==1.1.1
- jedi ==0.19.2
- joblib ==1.4.2
- jupyter_client ==8.6.3
- jupyter_core ==5.7.2
- marshmallow ==3.26.1
- matplotlib-inline ==0.1.7
- more-click ==0.1.2
- more-itertools ==10.7.0
- mpmath ==1.3.0
- mypy_extensions ==1.1.0
- nest-asyncio ==1.6.0
- networkx ==3.4.2
- numpy ==2.2.5
- nvidia-cublas-cu12 ==12.6.4.1
- nvidia-cuda-cupti-cu12 ==12.6.80
- nvidia-cuda-nvrtc-cu12 ==12.6.77
- nvidia-cuda-runtime-cu12 ==12.6.77
- nvidia-cudnn-cu12 ==9.5.1.17
- nvidia-cufft-cu12 ==11.3.0.4
- nvidia-cufile-cu12 ==1.11.1.6
- nvidia-curand-cu12 ==10.3.7.77
- nvidia-cusolver-cu12 ==11.7.1.2
- nvidia-cusparse-cu12 ==12.5.4.2
- nvidia-cusparselt-cu12 ==0.6.3
- nvidia-nccl-cu12 ==2.26.2
- nvidia-nvjitlink-cu12 ==12.6.85
- nvidia-nvtx-cu12 ==12.6.77
- optuna ==4.3.0
- packaging ==25.0
- pandas ==2.2.3
- parso ==0.8.4
- pexpect ==4.9.0
- platformdirs ==4.3.8
- prompt_toolkit ==3.0.51
- psutil ==7.0.0
- ptyprocess ==0.7.0
- pure_eval ==0.2.3
- pykeen ==1.11.1
- pystow ==0.7.0
- python-dateutil ==2.9.0.post0
- pytz ==2025.2
- pyzmq ==26.4.0
- requests ==2.32.3
- scikit-learn ==1.6.1
- scipy ==1.15.2
- setuptools ==80.0.1
- six ==1.17.0
- soupsieve ==2.7
- stack-data ==0.6.3
- sympy ==1.14.0
- tabulate ==0.9.0
- threadpoolctl ==3.6.0
- torch ==2.7.0
- torch-max-mem ==0.1.4
- torch-ppr ==0.0.8
- tornado ==6.4.2
- tqdm ==4.67.1
- traitlets ==5.14.3
- triton ==3.3.0
- typing-inspect ==0.9.0
- typing_extensions ==4.13.2
- tzdata ==2025.2
- urllib3 ==2.4.0
- wcwidth ==0.2.13