refactor-negative-sampler

Repository for paper "Enhancing PyKeen with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models"

https://github.com/ivandiliso/refactor-negative-sampler

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○
CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 2 DOI reference(s) in README
✓
Academic publication links
Links to: arxiv.org, zenodo.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (13.3%) to scientific vocabulary


Repository

Basic Info

Host: GitHub
Owner: ivandiliso
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 552 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created about 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

Enhancing PyKeen with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models

Documentation

In-depth documentation and tutorials are available on the dedicated GitHub Pages site: https://ivandiliso.github.io/refactor-negative-sampler/

Folder Structure

data              -> Datasets used during training, validation and testing
    YAGO4-20
    FB15K
    WN18
    DB50K
doc               -> Documentation and logs
cached            -> Cached Negative Sampler subsets for faster computation
model
    embedding     -> Embedding model checkpoints
    sampling      -> Checkpoints for models used in dynamic sampling
experiments       -> Experiment results after HPO pipeline
script            -> Single execution files, settings etc.
src               -> Source code
    extension     -> Extensions of PyKEEN classes for negative sampling
    utils         -> Utility files, libraries, logging
    notebooks     -> Testing, single execution and code evaluation notebooks
temp              -> Temporary files

Dataset Structure

Each dataset is provided with the following folder structure

dataset_name
    mapping
        entity_to_id.json            -> Dictionary mapping entity names (string) to IDs (integer)
        relation_to_id.json          -> Dictionary mapping relation names (string) to IDs (integer)
    metadata
        entity_classes.json          -> Dictionary mapping entity names (string) to classes (list of strings)
        relation_domain_range.json   -> Dictionary mapping relation names (string) to domain and range classes (string)
    owl                              -> Additional schema-level information in OWL format
    train.txt                        -> Training Split Triples in TSV format (using string names)
    test.txt                         -> Testing Split Triples in TSV format (using string names)
    valid.txt                        -> Validation Split Triples in TSV format (using string names)
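The layout above can be read with the Python standard library alone. The following is a minimal sketch, not the repository's own loader: the function names `load_dataset` and `encode` are hypothetical, and the column order (head, relation, tail) in the TSV files is an assumption based on the usual PyKEEN convention.

```python
import csv
import json
from pathlib import Path


def load_dataset(root):
    """Load the ID mappings and the three triple splits from a dataset folder."""
    root = Path(root)

    # mapping/entity_to_id.json and mapping/relation_to_id.json: name -> integer ID
    mappings = {}
    for name in ("entity_to_id", "relation_to_id"):
        with open(root / "mapping" / f"{name}.json", encoding="utf-8") as fh:
            mappings[name] = json.load(fh)

    # train.txt / test.txt / valid.txt: one tab-separated triple of names per line
    splits = {}
    for split in ("train", "test", "valid"):
        with open(root / f"{split}.txt", encoding="utf-8", newline="") as fh:
            reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
            splits[split] = [tuple(row) for row in reader if row]
    return mappings, splits


def encode(triples, entity_to_id, relation_to_id):
    """Translate name triples into integer ID triples.

    Assumes each row is (head, relation, tail); adjust if the files differ.
    """
    return [
        (entity_to_id[h], relation_to_id[r], entity_to_id[t])
        for h, r, t in triples
    ]
```

A call such as `load_dataset("data/WN18")` would then return the two mappings and the three splits, assuming the dataset has been unzipped into that folder.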

Extension Structure

```
src/extension
    constants.py    -> Constant variables used across the whole library
    dataset.py      -> Implementation of OnMemoryDataset
    filtering.py    -> Implementation of NullPytonSetFilterer
    sampling.py     -> Implementation of SubsetNegativeSampler and all the specific sampling strategies
    utils.py        -> Utility functions
```

Instructions

A fully detailed tutorial is provided in src/tutorial.ipynb. Detailed instructions are available at https://ivandiliso.github.io/refactor-negative-sampler/

Unzip the dataset files
Install the dependencies found in the requirements.txt file
Manually run the example python files, or use one of the provided scripts in the scripts folder
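The first step can also be scripted. The sketch below assumes the datasets ship as `.zip` archives placed directly under the `data` folder; the archive names and locations are not stated here, so treat the glob pattern and the hypothetical helper `unzip_datasets` as illustrative only.

```python
import zipfile
from pathlib import Path


def unzip_datasets(data_dir="data"):
    """Extract every .zip archive found directly under data_dir, next to the archive."""
    extracted = []
    for archive in sorted(Path(data_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            # Unpack alongside the archive so each dataset folder ends up under data_dir
            zf.extractall(archive.parent)
        extracted.append(archive.name)
    return extracted
```

Calling `unzip_datasets()` from the repository root returns the list of archives it unpacked, which is an easy way to confirm that something was found.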

The library is fully integrated into the PyKEEN ecosystem. For a quick start, follow this guide: three example files are provided to run, in order, an HPO pipeline, a standard pipeline, and the negative sampler evaluation. If you want to directly run an example configuration, you can find

hpo_pipeline.py

Runs a hyperparameter optimization pipeline for the chosen model. It can be run using CLI arguments:

python src/hpo_pipeline.py --dataset dataset_name --model model_name --sampler sampler_name --negatives number_negatives
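To illustrate how the four flags above might be parsed, here is a minimal `argparse` sketch. It is not taken from `hpo_pipeline.py`: the argument types, the `required` settings, and the example values are assumptions, and `sampler_name` is left as a placeholder because the accepted sampler names are not listed here.

```python
import argparse


def build_parser():
    """Build a parser for the four flags shown in the command above (illustrative)."""
    parser = argparse.ArgumentParser(description="HPO pipeline (illustrative sketch)")
    parser.add_argument("--dataset", required=True, help="dataset name")
    parser.add_argument("--model", required=True, help="embedding model name")
    parser.add_argument("--sampler", required=True, help="negative sampler name")
    # Assumed to be an integer count of negatives
    parser.add_argument("--negatives", type=int, required=True, help="number of negatives")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["--dataset", "WN18", "--model", "TransE", "--sampler", "sampler_name", "--negatives", "10"]
)
print(args.dataset, args.model, args.sampler, args.negatives)
```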

pipeline.py

Runs a pipeline for the chosen model with statically defined parameters. It can be run using CLI arguments:

python src/pipeline.py --dataset dataset_name --model model_name --sampler sampler_name --negatives number_negatives --l2 regularizer_weight --lr learning_rate --margin loss_margin
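As a rough guide to what these flags control, the sketch below maps them onto keyword arguments of PyKEEN's `pykeen.pipeline.pipeline` function. The helper `pipeline_kwargs` is hypothetical and the mapping is an assumption (an L2 `lp` regularizer and a margin ranking loss); the repository's own script may wire things differently, for example by passing its custom dataset and sampler classes rather than names.

```python
def pipeline_kwargs(dataset, model, sampler, negatives, l2, lr, margin):
    """Translate CLI-style values into keyword arguments for pykeen.pipeline.pipeline.

    Assumed mapping: --l2 is an L2 regularizer weight, --margin is the margin
    of a margin ranking loss, --negatives is the negatives per positive.
    """
    return dict(
        dataset=dataset,  # a string only works for datasets built into PyKEEN
        model=model,
        negative_sampler=sampler,  # a built-in sampler name or a sampler class
        negative_sampler_kwargs=dict(num_negs_per_pos=negatives),
        regularizer="lp",
        regularizer_kwargs=dict(weight=l2, p=2.0),
        optimizer_kwargs=dict(lr=lr),
        loss="marginranking",
        loss_kwargs=dict(margin=margin),
    )


kwargs = pipeline_kwargs("WN18", "TransE", "basic", 5, l2=1e-4, lr=1e-3, margin=1.0)

# With PyKEEN installed, training could then be started with:
#   from pykeen.pipeline import pipeline
#   result = pipeline(**kwargs)
```

Keeping the mapping in a plain function makes it easy to inspect or log the exact configuration before any training starts.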

negative_evaluation.py

Example code showing how to compute negative sampler statistics for a specific dataset. This file also contains usage examples of Dynamic Sampling with a pretrained TransE model on YAGO4-20, and it provides pre-written prediction functions that work with the provided model.
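The statistics computed by `negative_evaluation.py` are not listed here. As one plausible example of such a statistic, the sketch below estimates how often uniformly corrupted tails collide with known true triples (false negatives). The function names and the uniform tail corruption are assumptions for illustration, not the repository's code.

```python
import random


def corrupt_tails(triples, num_entities, num_negs, seed=0):
    """Create num_negs negatives per positive by replacing the tail uniformly at random."""
    rng = random.Random(seed)
    return [
        (h, r, rng.randrange(num_entities))
        for h, r, _t in triples
        for _ in range(num_negs)
    ]


def false_negative_rate(negatives, known_triples):
    """Fraction of sampled negatives that are actually known positive triples."""
    known = set(known_triples)
    if not negatives:
        return 0.0
    return sum(neg in known for neg in negatives) / len(negatives)


# Toy graph with integer IDs: (head, relation, tail)
positives = [(0, 0, 1), (1, 0, 2), (2, 1, 0)]
negatives = corrupt_tails(positives, num_entities=3, num_negs=4)
rate = false_negative_rate(negatives, positives)
print(f"{len(negatives)} negatives, false negative rate = {rate:.2f}")
```

A lower rate suggests that a sampler produces fewer misleading negatives, which is one way to compare sampling strategies on the same dataset.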

Cite our paper

@misc{damato2025enhancingpykeenmultiplenegative,
  title={Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models},
  author={Claudia d'Amato and Ivan Diliso and Nicola Fanizzi and Zafar Saeed},
  year={2025},
  eprint={2508.05587},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.05587},
}

Owner

Name: Ivan Diliso
Login: ivandiliso
Kind: user
Location: Italy, Bari
Company: University of Bari Aldo Moro

Website: ivandiliso.github.io
Repositories: 2
Profile: https://github.com/ivandiliso

PhD Student @ University of Bari Aldo Moro

GitHub Events

Total

Release event: 4
Push event: 14
Create event: 5

Last Year

Release event: 4
Push event: 14
Create event: 5

Dependencies

requirements.txt pypi

Jinja2 ==3.1.6
Mako ==1.3.10
MarkupSafe ==3.0.2
PySocks ==1.7.1
PyYAML ==6.0.2
Pygments ==2.19.1
SQLAlchemy ==2.0.40
alembic ==1.15.2
asttokens ==3.0.0
beautifulsoup4 ==4.13.4
certifi ==2025.4.26
charset-normalizer ==3.4.1
class-resolver ==0.6.0
click ==8.1.8
click-default-group ==1.2.4
colorlog ==6.9.0
comm ==0.2.2
dataclasses-json ==0.6.7
debugpy ==1.8.14
decorator ==5.2.1
docdata ==0.0.5
executing ==2.2.0
filelock ==3.18.0
fsspec ==2025.3.2
gdown ==5.2.0
greenlet ==3.2.1
idna ==3.10
ipykernel ==6.29.5
ipython ==9.2.0
ipython_pygments_lexers ==1.1.1
jedi ==0.19.2
joblib ==1.4.2
jupyter_client ==8.6.3
jupyter_core ==5.7.2
marshmallow ==3.26.1
matplotlib-inline ==0.1.7
more-click ==0.1.2
more-itertools ==10.7.0
mpmath ==1.3.0
mypy_extensions ==1.1.0
nest-asyncio ==1.6.0
networkx ==3.4.2
numpy ==2.2.5
nvidia-cublas-cu12 ==12.6.4.1
nvidia-cuda-cupti-cu12 ==12.6.80
nvidia-cuda-nvrtc-cu12 ==12.6.77
nvidia-cuda-runtime-cu12 ==12.6.77
nvidia-cudnn-cu12 ==9.5.1.17
nvidia-cufft-cu12 ==11.3.0.4
nvidia-cufile-cu12 ==1.11.1.6
nvidia-curand-cu12 ==10.3.7.77
nvidia-cusolver-cu12 ==11.7.1.2
nvidia-cusparse-cu12 ==12.5.4.2
nvidia-cusparselt-cu12 ==0.6.3
nvidia-nccl-cu12 ==2.26.2
nvidia-nvjitlink-cu12 ==12.6.85
nvidia-nvtx-cu12 ==12.6.77
optuna ==4.3.0
packaging ==25.0
pandas ==2.2.3
parso ==0.8.4
pexpect ==4.9.0
platformdirs ==4.3.8
prompt_toolkit ==3.0.51
psutil ==7.0.0
ptyprocess ==0.7.0
pure_eval ==0.2.3
pykeen ==1.11.1
pystow ==0.7.0
python-dateutil ==2.9.0.post0
pytz ==2025.2
pyzmq ==26.4.0
requests ==2.32.3
scikit-learn ==1.6.1
scipy ==1.15.2
setuptools ==80.0.1
six ==1.17.0
soupsieve ==2.7
stack-data ==0.6.3
sympy ==1.14.0
tabulate ==0.9.0
threadpoolctl ==3.6.0
torch ==2.7.0
torch-max-mem ==0.1.4
torch-ppr ==0.0.8
tornado ==6.4.2
tqdm ==4.67.1
traitlets ==5.14.3
triton ==3.3.0
typing-inspect ==0.9.0
typing_extensions ==4.13.2
tzdata ==2025.2
urllib3 ==2.4.0
wcwidth ==0.2.13
