gnnrank
Official code for the ICML2022 paper -- GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
✓Academic publication links
Links to: arxiv.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (16.8%) to scientific vocabulary
Keywords
Repository
Official code for the ICML2022 paper -- GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks
Basic Info
Statistics
- Stars: 53
- Watchers: 1
- Forks: 10
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
GNNRank
This is the official code repo for the ICML 2022 paper -- GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks. A recorded video for the live talk at ICML 2022 is provided via this link. You are also welcome to read our poster.
Citing
If you find our repo or paper useful in your research, please consider adding the following citation:
bibtex
@inproceedings{he2022gnnrank,
title={GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks},
author={He, Yixuan and Gan, Quan and Wipf, David and Reinert, Gesine D and Yan, Junchi and Cucuringu, Mihai},
booktitle={International Conference on Machine Learning},
pages={8581--8612},
year={2022},
organization={PMLR}
}
Environment Setup
Overview
The project has been tested on the following environment specification: 1. Ubuntu 18.04.6 LTS (Other x86_64 based Linux distributions should also be fine, such as Fedora 32) 2. Nvidia Graphic Card (NVIDIA Tesla T4 with driver version 450.142.00) and CPU (Intel Core i7-10700 CPU @ 2.90GHz) 3. Python 3.7 (and Python 3.6.12) 4. CUDA 11.0 (and CUDA 9.2) 5. Pytorch 1.10.1 (built against CUDA 11.0) and Pytorch 1.8.0 (build against CUDA 10.2) 6. Other libraries and python packages (See below)
Installation method 1 (.yml files)
You should handle (1),(2) yourself. For (3), (4), (5) and (6), we provide a list of steps to install them.
We provide two examples of envionmental setup, one with CUDA 11.0 and GPU, the other with CPU.
Following steps assume you've done with (1) and (2). 1. Install conda. Both Miniconda and Anaconda are OK.
Create an environment and install python packages (GPU):
conda env create -f environment_GPU.ymlCreate an environment and install python packages (CPU):
conda env create -f environment_CPU.yml
Installation method 2 (manual installation)
The codebase is implemented in Python 3.6.12. package versions used for development are below.
networkx 2.6.3
tqdm 4.62.3
numpy 1.20.3
pandas 1.3.4
texttable 1.6.4
latextable 0.2.1
scipy 1.7.1
argparse 1.1.0
scikit-learn 1.0.1
stellargraph 1.2.1 (for link direction prediction: conda install -c stellargraph stellargraph)
torch 1.10.1
torch-scatter 2.0.9
pyg 2.0.3 (follow https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html)
sparse 0.13.0
Execution checks
When installation is done, you could check you enviroment via:
cd execution
bash setup_test.sh
Folder structure
./execution/ stores files that can be executed to generate outputs. For vast number of experiments, we use GNU parallel, which can be downloaded in command line and make it executable via:
wget http://git.savannah.gnu.org/cgit/parallel.git/plain/src/parallel chmod 755 ./parallel./joblog/ stores job logs from parallel. You might need to create it by
mkdir joblog./Output/ stores raw outputs (ignored by Git) from parallel. You might need to create it by
mkdir Output./data/ stores processed data sets.
./src/ stores files to train various models, utils and metrics.
./result_arrays/ stores results for different data sets. Each data set has a separate subfolder.
./result_anlysis/ stores notebooks for generating result plots or tables.
./logs/ stores trained models and logs, as well as predicted clusters (optional). When you are in debug mode (see below), your logs will be stored in ./debug_logs/ folder.
Options
GNNRank provides various command line arguments, which can be viewed in the ./src/param_parser.py. Some examples are:
--epochs INT Number of GNNRank (maximum) training epochs. Default is 1000.
--early_stopping INT Number of GNNRank early stopping epochs. Default is 200.
--num_trials INT Number of trials to generate results. Default is 10.
--lr FLOAT Initial learning rate. Default is 0.01.
--weight_decay FLOAT Weight decay (L2 loss on parameters). Default is 5^-4.
--dropout FLOAT Dropout rate (1 - keep probability). Default is 0.5.
--hidden INT Number of embedding dimension divided by 2. Default is 32.
--seed INT Random seed. Default is 31.
--no-cuda BOOL Disables CUDA training. Default is False.
--debug, -D BOOL Debug with minimal training setting, not to get results. Default is False.
-AllTrain, -All BOOL Whether to use all data to do gradient descent. Default is False.
--SavePred, -SP BOOL Whether to save predicted results. Default is False.
--dataset STR Data set to consider. Default is 'ERO/'.
--all_methods LST Methods to use to generate results. Default is ['btl','DIGRAC'].
Reproduce results
First, get into the ./execution/ folder:
cd execution
To reproduce basketball results executed on CUDA 1.
bash basketball1.sh
To reproduce results on synthetic data.
bash 0ERO.sh
Other execution files are similar to run.
Note that if you are operating on CPU, you may delete the commands ``CUDAVISIBLEDEVICES=xx". You can also set you own number of parallel jobs, not necessarily following the j numbers in the .sh files.
You can also use CPU for training if you add ``--no-duca", or GPU if you delete this.
Direct execution with training files
First, get into the ./src/ folder:
cd src
Then, below are various options to try:
Creating a GNNRank model for animal data using DIGRAC as GNN, also produce results on syncRank.
python ./train.py --all_methods DIGRAC syncRank --dataset animal
Creating a GNNRank model for ERO data using both DIGRAC and ib as GNN with 350 nodes, using 0.05 as learning rate.
python ./train.py --N 350 --all_methods DIGRAC ib --lr 0.05
Creating a GNNRank model for basketball data in season 2010 using all baselines excluding mvr, also save predicted results.
python ./train.py --dataset basketball --season 2010 -SP --all_methods baselines_shorter
Creating a model for HeadToHead data set with specific number of trials, hidden units and use CPU.
```
python ./train.py --dataset HeadToHead --no-cuda --num_trials 5 --hidden 8
```
Notes
- For certain applications such as financial data sets, the original adjacency matrices might be skew-symmetric with negative edge weights. For our models here, however, we need to preprocess the data so that we only keep the positive edge weights, as our current pipeline, including the loss functions, are restricted to directed unsigned networks as inputs.
Owner
- Name: Yixuan He
- Login: SherylHYX
- Kind: user
- Location: Oxford
- Company: University of Oxford
- Website: https://sherylhyx.github.io
- Twitter: sherylhyx
- Repositories: 4
- Profile: https://github.com/SherylHYX
DPhil in Statistics @ University of Oxford
GitHub Events
Total
- Watch event: 5
- Fork event: 2
Last Year
- Watch event: 5
- Fork event: 2
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0