dothash
Estimating Set Similarity Metrics for Link Prediction and Document Deduplication
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.6%) to scientific vocabulary
Repository
Estimating Set Similarity Metrics for Link Prediction and Document Deduplication
Basic Info
- Host: GitHub
- Owner: mikeheddes
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2305.17310
- Size: 1.88 MB
Statistics
- Stars: 8
- Watchers: 1
- Forks: 1
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication
This repository contains the source code for the research paper published at the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2023.
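For readers new to the topic, here is a minimal sketch of estimating set intersection size, and from it the Jaccard index, using dot products of random-vector sketches. It is an illustration written for this page under stated assumptions, not the repository's implementation: the function names, the choice of ±1 entries, and the `dim` and `seed` parameters are all hypothetical.

```python
import random


def element_vector(element, dim, seed=0):
    """Map an element to a pseudo-random vector with entries in {-1, +1}.

    Assumption: seeding a generator with a string built from the element
    gives every element a fixed, reproducible vector.
    """
    rng = random.Random(f"{seed}:{element}")
    return [rng.choice((-1, 1)) for _ in range(dim)]


def sketch(items, dim, seed=0):
    """Sketch a set as the element-wise sum of its elements' vectors."""
    total = [0] * dim
    for item in set(items):
        for i, value in enumerate(element_vector(item, dim, seed)):
            total[i] += value
    return total


def estimate_intersection(sketch_a, sketch_b):
    """Estimate |A ∩ B| as the dot product of two sketches divided by dim.

    A shared element contributes exactly dim to the dot product; pairs of
    distinct elements contribute zero-mean noise that shrinks as dim grows.
    """
    dim = len(sketch_a)
    return sum(a * b for a, b in zip(sketch_a, sketch_b)) / dim


def estimate_jaccard(set_a, set_b, dim=1024, seed=0):
    """Estimate the Jaccard index from the estimated intersection size."""
    inter = estimate_intersection(sketch(set_a, dim, seed), sketch(set_b, dim, seed))
    union = len(set_a) + len(set_b) - inter
    return inter / union if union > 0 else 0.0
```

A larger `dim` gives a more accurate estimate at the cost of more memory and computation per sketch.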
Requirements
The code is written in Python 3.10. The packages required to run the experiments are listed in requirements.txt; install them with the following command:
```bash
pip install -r requirements.txt
```
Experiments
The experiments are divided into two parts: (1) link prediction and (2) document deduplication. Each part has its own script; pass --help to list the available options:
Link Prediction
```bash
python link_prediction.py --help
```
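As a rough illustration of how set similarity applies to link prediction, the sketch below scores every unconnected node pair by the exact Jaccard index of their neighbor sets. It does not reflect the logic of link_prediction.py; the toy graph and the `rank_candidate_links` helper are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_links(adjacency):
    """Score each unconnected node pair by neighbor-set similarity.

    `adjacency` maps a node to the set of its neighbors (undirected graph).
    Returns ((u, v), score) tuples sorted from most to least similar.
    """
    scores = []
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already linked, nothing to predict
        scores.append(((u, v), jaccard(adjacency[u], adjacency[v])))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy graph used only for illustration.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}
ranked = rank_candidate_links(graph)
```

Here the pair ("a", "d") ranks first because the two nodes share both of a's neighbors, which makes it the most plausible missing link under this metric.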
Document Deduplication
Experiments with the core dataset require data to be downloaded from Google Drive and placed in the data directory.
```bash
python document_deduplication.py --help
```
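To make the deduplication task concrete, the following sketch compares documents by the exact Jaccard index of their word shingles. It is a generic illustration, not the procedure in document_deduplication.py; the shingle length `k`, the `threshold` value, and all function names are assumptions.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (as tuples) of a text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard index |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_near_duplicate(doc_a, doc_b, threshold=0.5, k=3):
    """Flag two documents as near-duplicates when shingle overlap is high."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k)) >= threshold


# Hypothetical example documents.
doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox jumps over the sleepy dog"
doc3 = "set similarity estimation with random vectors"
```

With these inputs, doc1 and doc2 share five of nine distinct shingles and are flagged as near-duplicates, while doc1 and doc3 share none.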
Citation
If you use this code for your research, please cite our paper:
@inproceedings{dothash,
  title={DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication},
  author={Nunes, Igor and Heddes, Mike and Vergés, Pere and Abraham, Danny and Veidenbaum, Alex and Nicolau, Alexandru and Givargis, Tony},
  booktitle={Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2023}
}
Owner
- Name: Mike Heddes
- Login: mikeheddes
- Kind: user
- Location: Irvine, California
- Website: https://www.mikeheddes.nl
- Repositories: 18
- Profile: https://github.com/mikeheddes
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this code for your research, please cite our paper."
authors:
  - family-names: "Nunes"
    given-names: "Igor"
  - family-names: "Heddes"
    given-names: "Mike"
    orcid: "https://orcid.org/0000-0002-9276-458X"
  - family-names: "Vergés"
    given-names: "Pere"
  - family-names: "Abraham"
    given-names: "Danny"
  - family-names: "Veidenbaum"
    given-names: "Alex"
  - family-names: "Nicolau"
    given-names: "Alexandru"
  - family-names: "Givargis"
    given-names: "Tony"
title: "DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication"
url: "https://github.com/mikeheddes/dothash"
preferred-citation:
  type: conference-paper
  authors:
    - family-names: "Nunes"
      given-names: "Igor"
    - family-names: "Heddes"
      given-names: "Mike"
      orcid: "https://orcid.org/0000-0002-9276-458X"
    - family-names: "Vergés"
      given-names: "Pere"
    - family-names: "Abraham"
      given-names: "Danny"
    - family-names: "Veidenbaum"
      given-names: "Alex"
    - family-names: "Nicolau"
      given-names: "Alexandru"
    - family-names: "Givargis"
      given-names: "Tony"
  title: "DotHash: Estimating Set Similarity Metrics for Link Prediction and Document Deduplication"
  collection-title: "Proceedings of the 29th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining"
  collection-type: proceedings
  year: 2023
GitHub Events
Total
- Watch event: 2
- Fork event: 1
Last Year
- Watch event: 2
- Fork event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 2
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 2
- Total pull request authors: 0
- Average comments per issue: 3.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jianshu93 (1)
- ksrinivs64 (1)