Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: prathyushpoduval
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 60.5 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication
This repository contains the source code for the research paper published at the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2023).
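The README does not describe the estimator itself. Below is a minimal NumPy sketch of the random-vector idea we assume underlies DotHash:

- Each element gets a random vector.
- A set is summarised by the sum of its elements' vectors.
- The dot product of two such sums estimates the size of their intersection.

Jaccard then follows as |A ∩ B| / (|A| + |B| - |A ∩ B|). Scaling each element's vector by 1/sqrt(log deg) turns the same dot product into an Adamic-Adar estimate.

Several details are assumptions of this sketch and are not taken from the repository: the function names, the ±1/sqrt(dim) encoding, and the use of NumPy directly (NumPy is not pinned in the dependency list, though pandas, scipy and torch all install it). See the paper and `link_prediction.py` for the actual method.

```python
from typing import Iterable, Optional

import numpy as np


def element_vectors(num_elements: int, dim: int, seed: int = 0) -> np.ndarray:
    """Give each element id a random vector with entries +-1/sqrt(dim).

    Every row has unit norm, and the dot product of two distinct rows
    has mean 0 and variance 1/dim.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(num_elements, dim))
    return signs / np.sqrt(dim)


def sketch(elements: Iterable[int], vectors: np.ndarray,
           weights: Optional[np.ndarray] = None) -> np.ndarray:
    """Summarise a set of element ids as the sum of their (weighted) vectors."""
    idx = np.fromiter(elements, dtype=np.int64)
    rows = vectors[idx]
    if weights is not None:
        rows = rows * weights[idx, None]
    return rows.sum(axis=0)


def estimate_intersection(sa: np.ndarray, sb: np.ndarray) -> float:
    # Shared elements contribute x_e . x_e = 1 (times weight^2); every other
    # pair contributes zero-mean noise, so E[sa . sb] = |A ∩ B| when unweighted.
    return float(sa @ sb)


def estimate_jaccard(sa: np.ndarray, sb: np.ndarray, size_a: int, size_b: int) -> float:
    # Set sizes are cheap to store exactly; only the intersection is estimated.
    inter = estimate_intersection(sa, sb)
    return inter / (size_a + size_b - inter)


def adamic_adar_weights(degrees: np.ndarray) -> np.ndarray:
    # Scaling element e by 1/sqrt(log deg(e)) makes it contribute
    # 1/log deg(e) to the dot product (requires deg(e) > 1).
    return 1.0 / np.sqrt(np.log(degrees))
```

Example: with `element_vectors(500, dim=10_000)`, sketch `A = set(range(300))` and `B = set(range(200, 500))`. The intersection estimate comes out close to the true value of 100, and the estimator's variance shrinks roughly in proportion to 1/dim.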
Requirements
The code is written in Python 3.10. The required packages to run the experiments can be found in requirements.txt. To install the required packages, run the following command:
```bash
pip install -r requirements.txt
```
Experiments
The experiments are divided into two parts: (1) link prediction and (2) document deduplication. The link prediction experiments can be run with the following commands:
```bash
python link_prediction.py --dataset ogbl-ddi ogbl-ppa ogbl-collab soc-epinions1 soc-livejournal1 soc-pokec soc-slashdot0811 soc-slashdot0922 facebook wikipedia --method jaccard adamic-adar common-neighbors resource-allocation hyperhash-jaccard dothash-jaccard hyperhash-adamic-adar dothash-adamic-adar hyperhash-common-neighbors dothash-common-neighbors hyperhash-resource-allocation dothash-resource-allocation minhash simhash --lr 0.1 --dimensions 1000
python learnedlinkprediction.py --dataset ogbl-ddi ogbl-ppa ogbl-collab soc-epinions1 soc-livejournal1 soc-pokec soc-slashdot0811 soc-slashdot0922 facebook wikipedia --method hyperlearn --lr 0.1 --dimensions 1000 --nitr 50 --usenodefeatures 0 1
python learnedlinkprediction.py --dataset facebook wikipedia --method hyperlearn --lr 0.1 --dimensions 1000 --usenodefeatures 0 1
python learnedlinkprediction.py --dataset ogbl-collab --method hyperlearn --dimensions 1000 --lr 0.1 --nitr 10
```
Link Prediction
```bash
python link_prediction.py --help
```
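For reference, four of the `--method` values name exact heuristics: `common-neighbors`, `jaccard`, `adamic-adar` and `resource-allocation`. Each has a standard definition over the neighbour sets N(u) and N(v). The `hyperhash-*` and `dothash-*` methods presumably estimate these same quantities rather than computing them exactly.

The sketch below computes the exact heuristics on a plain adjacency-set dictionary. The graph representation and the function names are illustrative assumptions, not the repository's implementation, which may handle edge cases such as isolated nodes differently.

```python
import math
from typing import Dict, Hashable, Iterable, Set, Tuple

# Assumed representation: node -> set of neighbouring nodes (undirected).
Graph = Dict[Hashable, Set[Hashable]]


def build_graph(edges: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected adjacency-set graph from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def common_neighbors(g: Graph, u: Hashable, v: Hashable) -> int:
    # |N(u) ∩ N(v)|
    return len(g[u] & g[v])


def jaccard(g: Graph, u: Hashable, v: Hashable) -> float:
    # |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, defined as 0 when both are empty.
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g: Graph, u: Hashable, v: Hashable) -> float:
    # Sum of 1/log(deg(w)) over shared neighbours w. For u != v every shared
    # neighbour has degree >= 2, so log(deg(w)) > 0.
    return sum(1.0 / math.log(len(g[w])) for w in g[u] & g[v])


def resource_allocation(g: Graph, u: Hashable, v: Hashable) -> float:
    # Sum of 1/deg(w) over shared neighbours w.
    return sum(1.0 / len(g[w]) for w in g[u] & g[v])
```

Example: take the graph with edges (0,1), (0,2), (1,2), (1,3), (2,3), (3,4). Nodes 0 and 3 share neighbours 1 and 2, each of degree 3, so the scores are:

- `common_neighbors` = 2
- `jaccard` = 2/3
- `adamic_adar` = 2/log 3
- `resource_allocation` = 2/3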
Document Deduplication
Experiments with the core dataset require data to be downloaded from Google Drive and placed in the data directory.
```bash
python document_deduplication.py --help
```
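The README leaves the deduplication setup to `document_deduplication.py --help`. As a generic illustration of the task, and not the script's actual pipeline, the sketch below does two things:

- It represents each document as a set of character shingles.
- It flags pairs whose exact Jaccard similarity reaches a threshold.

The shingle length `k=5` and `threshold=0.8` are arbitrary assumptions. A sketching method such as MinHash or DotHash would replace the exact `jaccard` call with a cheaper estimate.

```python
from itertools import combinations
from typing import List, Set, Tuple


def shingles(text: str, k: int = 5) -> Set[str]:
    """Character k-grams of a document after normalising case and whitespace."""
    text = " ".join(text.lower().split())
    if len(text) < k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    # Two empty shingle sets are treated as identical.
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(docs: List[str], threshold: float = 0.8,
                    k: int = 5) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every document pair at or above threshold."""
    sets = [shingles(d, k) for d in docs]
    pairs = []
    # Exhaustive pairwise comparison: quadratic in the number of documents.
    for i, j in combinations(range(len(docs)), 2):
        sim = jaccard(sets[i], sets[j])
        if sim >= threshold:
            pairs.append((i, j, sim))
    return pairs
```

Example: two sentences that differ only in their final punctuation mark share all but one shingle, so they are flagged as a near-duplicate pair. An unrelated sentence is not flagged.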
Citation
If you use this code for your research, please cite our paper:
```bibtex
@inproceedings{dothash,
  title={DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication},
  author={Nunes, Igor and Heddes, Mike and Vergés, Pere and Abraham, Danny and Veidenbaum, Alex and Nicolau, Alexandru and Givargis, Tony},
  booktitle={Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2023}
}
```
Owner
- Name: PhaserMan
- Login: prathyushpoduval
- Kind: user
- Repositories: 1
- Profile: https://github.com/prathyushpoduval
Dependencies
- ogb ==1.3.6
- pandas ==2.0.1
- scipy >=1.8.0
- torch ==1.12
- torch-geometric ==2.3.1
- torch-hd ==5.1.2
- tqdm ==4.65.0
- typed-argument-parser ==1.8.0