Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: prathyushpoduval
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 60.5 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme License Citation

README.md

DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication

This repository contains the source code for the research paper published at the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) 2023.
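The title refers to estimating set similarity metrics. The sketch below illustrates the general idea under our own assumptions; it is not the code in this repository. Each element is mapped to a pseudo-random vector whose entries are ±1/sqrt(d), so a vector's dot product with itself is exactly 1 and the dot product of two different elements' vectors averages 0. A set is summarised by the sum of its elements' vectors, so the dot product of two set sums estimates |A ∩ B|, and Jaccard follows as |A ∩ B| / (|A| + |B| - |A ∩ B|). The helper names (`random_sign_vector`, `set_vector`, `estimate_jaccard`) and the string-seeded PRNG standing in for a hash function are hypothetical.

```python
import math
import random


def random_sign_vector(key, dim, seed=0):
    # Hypothetical hash: a PRNG seeded with (seed, key) yields the same vector
    # for the same element every time.
    rng = random.Random(f"{seed}:{key}")
    # Entries of +-1/sqrt(dim): x.x == 1 exactly, and the dot product of two
    # independently drawn vectors has mean 0.
    scale = 1.0 / math.sqrt(dim)
    return [scale if rng.random() < 0.5 else -scale for _ in range(dim)]


def set_vector(items, dim, seed=0):
    # A set is summarised by the sum of its elements' vectors.
    total = [0.0] * dim
    for item in items:
        for i, value in enumerate(random_sign_vector(item, dim, seed)):
            total[i] += value
    return total


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def estimate_jaccard(a, b, dim=4096, seed=0):
    # a and b are Python sets. The dot product of the two sums equals |A & B|
    # plus cross terms between different elements, which average to 0.
    inter = dot(set_vector(a, dim, seed), set_vector(b, dim, seed))
    # Jaccard = |A & B| / |A | B|, using the exact set sizes for the union.
    return inter / (len(a) + len(b) - inter)


a, b = set(range(60)), set(range(30, 90))
print(estimate_jaccard(a, b))  # should be close to the exact value 30 / 90
```

Scaling each element's vector by the square root of a per-element weight makes the same dot product sum those weights over the intersection instead of counting its elements, which is one way to extend the idea to weighted metrics such as Adamic-Adar or resource allocation. This README does not say whether that matches how the `dothash-*` methods are implemented.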

Requirements

The code is written in Python 3.10. The required packages to run the experiments can be found in requirements.txt. To install the required packages, run the following command:

pip install -r requirements.txt

Experiments

The experiments are divided into two parts: (1) link prediction and (2) document deduplication. The link prediction experiments can be run with the following commands:

python link_prediction.py \
    --dataset ogbl-ddi ogbl-ppa ogbl-collab soc-epinions1 soc-livejournal1 soc-pokec soc-slashdot0811 soc-slashdot0922 facebook wikipedia \
    --method jaccard adamic-adar common-neighbors resource-allocation hyperhash-jaccard dothash-jaccard hyperhash-adamic-adar dothash-adamic-adar hyperhash-common-neighbors dothash-common-neighbors hyperhash-resource-allocation dothash-resource-allocation minhash simhash \
    --lr 0.1 --dimensions 1000

python learnedlinkprediction.py \
    --dataset ogbl-ddi ogbl-ppa ogbl-collab soc-epinions1 soc-livejournal1 soc-pokec soc-slashdot0811 soc-slashdot0922 facebook wikipedia \
    --method hyperlearn --lr 0.1 --dimensions 1000 --nitr 50 --usenodefeatures 0 1

python learnedlinkprediction.py --dataset facebook wikipedia --method hyperlearn --lr 0.1 --dimensions 1000 --usenodefeatures 0 1

python learnedlinkprediction.py --dataset ogbl-collab --method hyperlearn --dimensions 1000 --lr 0.1 --nitr 10
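Several `--method` values name classic neighbourhood-overlap scores for a candidate edge (u, v). For reference, here is a minimal sketch of their textbook exact definitions, computed from neighbor sets. It is illustrative only and not taken from link_prediction.py, whose handling of details such as self-loops or degree computation may differ; the hashed variants (for example `dothash-jaccard`) presumably approximate these scores. The helpers `build_adjacency` and `link_scores` are hypothetical names.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    # Undirected graph stored as a dict of neighbor sets.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def link_scores(adj, u, v):
    # Exact overlap scores for a candidate edge (u, v) with u != v.
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common = nu & nv
    union = nu | nv
    return {
        "common-neighbors": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        # A common neighbor w of two distinct nodes has degree >= 2,
        # so log(degree) > 0 and the division is safe.
        "adamic-adar": sum(1.0 / math.log(len(adj[w])) for w in common),
        "resource-allocation": sum(1.0 / len(adj[w]) for w in common),
    }


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
print(link_scores(build_adjacency(edges), 0, 3))
```

In this toy graph, nodes 0 and 3 share neighbors 1 and 2 (each of degree 3), so common neighbors is 2, Jaccard is 2/3, Adamic-Adar is 2/ln 3, and resource allocation is 2/3.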

Link Prediction

To list all options for the link prediction experiments, run:

python link_prediction.py --help

Document Deduplication

Experiments with the CORE dataset require the data to be downloaded from Google Drive and placed in the data directory.

To list all options for the document deduplication experiments, run:

python document_deduplication.py --help
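The README does not describe the deduplication pipeline itself. A common formulation, assumed here purely for illustration, treats each document as a set of character n-grams (shingles) and flags pairs whose Jaccard similarity reaches a threshold. The helper names, the 5-character shingles, and the 0.8 threshold are hypothetical choices, not values taken from document_deduplication.py.

```python
from itertools import combinations


def shingles(text, n=5):
    # Character n-grams after lower-casing and collapsing whitespace.
    norm = " ".join(text.lower().split())
    if len(norm) < n:
        return {norm}
    return {norm[i:i + n] for i in range(len(norm) - n + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_duplicates(docs, threshold=0.8, n=5):
    # Exact all-pairs comparison, quadratic in the number of documents.
    # Hashing sketches (e.g. MinHash or DotHash) replace each shingle set
    # with a fixed-size summary so that each comparison becomes cheaper.
    sets = {doc_id: shingles(text, n) for doc_id, text in docs.items()}
    pairs = []
    for (i, a), (j, b) in combinations(sets.items(), 2):
        if jaccard(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


docs = {
    "a": "The quick brown fox jumps over the lazy dog",
    "b": "The quick brown fox jumps over the lazy dog.",
    "c": "Completely unrelated text about hashing",
}
print(near_duplicates(docs))  # [('a', 'b')]
```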

Citation

If you use this code for your research, please cite our paper:

@inproceedings{dothash,
  title={DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication},
  author={Nunes, Igor and Heddes, Mike and Verg{\'e}s, Pere and Abraham, Danny and Veidenbaum, Alex and Nicolau, Alexandru and Givargis, Tony},
  booktitle={Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2023}
}

Owner

  • Name: PhaserMan
  • Login: prathyushpoduval
  • Kind: user


Dependencies

requirements.txt pypi
  • ogb ==1.3.6
  • pandas ==2.0.1
  • scipy >=1.8.0
  • torch ==1.12
  • torch-geometric ==2.3.1
  • torch-hd ==5.1.2
  • tqdm ==4.65.0
  • typed-argument-parser ==1.8.0