lifelong-learning
Lifelong Learning of Graph Neural Networks for Open-World Node Classification
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 5 DOI reference(s) in README
- ✓ Academic publication links: links to sciencedirect.com, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.6%) to scientific vocabulary
Keywords
Repository
Lifelong Learning of Graph Neural Networks for Open-World Node Classification
Statistics
- Stars: 29
- Watchers: 2
- Forks: 4
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Lifelong Learning of Graph Neural Networks for Open-World Node Classification
Papers
Lukas Galke, Iacopo Vagliano, Benedikt Franke, Tobias Zielke, Marcel Hoffmann, Ansgar Scherp (2023). Lifelong Learning on Evolving Graphs Under the Constraints of Imbalanced Classes and New Classes. Neural Networks 164, 156-176.
Lukas Galke, Benedikt Franke, Tobias Zielke, Ansgar Scherp (2021). Lifelong Learning of Graph Neural Networks for Open-World Node Classification, In 2021 International Joint Conference on Neural Networks (IJCNN), IEEE.
```
@article{galke2023lifelong,
  title = {Lifelong learning on evolving graphs under the constraints of imbalanced classes and new classes},
  journal = {Neural Networks},
  volume = {164},
  pages = {156-176},
  year = {2023},
  issn = {0893-6080},
  doi = {10.1016/j.neunet.2023.04.022},
  url = {https://www.sciencedirect.com/science/article/pii/S0893608023002083},
  author = {Lukas Galke and Iacopo Vagliano and Benedikt Franke and Tobias Zielke and Marcel Hoffmann and Ansgar Scherp}
}

@inproceedings{galke2021lifelong,
  author = {Galke, Lukas and Franke, Benedikt and Zielke, Tobias and Scherp, Ansgar},
  booktitle = {2021 International Joint Conference on Neural Networks (IJCNN)},
  title = {Lifelong Learning of Graph Neural Networks for Open-World Node Classification},
  year = {2021},
  pages = {1-8},
  doi = {10.1109/IJCNN52387.2021.9533412}
}
```
Installation
- Setup a python virtual environment (recommended)
- Install pytorch as suited to your OS / python package manager
- Install dgl as suited to your OS / python package manager / CUDA version
- Install torch-geometric
- Install other requirements via
`pip install -r requirements.txt` within your copy of this repository. This will mainly install `numpy`, `scipy`, `pandas`, `matplotlib`, `seaborn`, and `scikit-learn`.
In requirements.txt, we list the versions we used for our experiments. Newer versions of these standard packages will likely work fine.
Get the datasets
The three datasets of our paper are available on zenodo.
Download the zip files and extract them into the data subdirectory, such that the structure looks exactly like this:
- data/dblp-easy
- data/dblp-hard
- data/pharmabio
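Before running experiments, it can help to verify that the datasets were extracted to the expected locations. A small sketch (this helper is illustrative, not part of the repository):

```python
import pathlib

def check_datasets(root="data", expected=("dblp-easy", "dblp-hard", "pharmabio")):
    """Return the names of dataset directories missing under `root`."""
    root = pathlib.Path(root)
    return [name for name in expected if not (root / name).is_dir()]

missing = check_datasets()
if missing:
    print("Missing dataset directories:", missing)
```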
Example call to run an experiment
The following example command runs an experiment with a GraphSAGE model with two graph convolutional layers and a hidden dimension of 32 on the dblp-easy dataset, starting evaluation at the task corresponding to year 2004 and using 200 annual epochs.
python3 run_experiment.py --seed 42 --backend dgl --model gs-mean --n_hidden 32 --start warm --lr "0.005" --history 3 --n_layers 1 --weight_decay 0 --dropout 0.5 --initial_epochs 0 --annual_epochs 200 --dataset "dblp-easy" --t_start 2004 --save "results.csv"
The results.csv file can be reused across multiple runs (e.g., with different seeds, models, or datasets); the script appends new results to the file.
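This append-and-accumulate pattern can be sketched with pandas (the column names here are illustrative, not the script's actual schema):

```python
import os
import pandas as pd

def append_results(path, rows):
    """Append experiment result rows to a CSV file, creating it with a header first."""
    df = pd.DataFrame(rows)
    # Write the header only when the file does not exist yet,
    # so repeated runs accumulate rows in the same file.
    df.to_csv(path, mode="a", header=not os.path.exists(path), index=False)

append_results("results.csv", [{"seed": 42, "model": "gs-mean", "accuracy": 0.61}])
append_results("results.csv", [{"seed": 43, "model": "gs-mean", "accuracy": 0.63}])
```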
Consult python3 run_experiment.py -h for more information. The other variant of our run script run_experiment_new.py is similar, but operates on preprocessed tasks (see below).
Tabularize results
To bring the results into a human readable form, we provide a tabularize.py script.
This script takes one or more results files as input and groups them by the attributes passed via the -g parameter, which accepts multiple attributes.
Example: Aggregate results by dataset, history size, model, restart configuration, and learning rate.
python3 tabularize.py results.csv -g dataset history model start initial_lr
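Such an aggregation boils down to a pandas group-by; a minimal sketch with illustrative columns and numbers (not the script's actual implementation):

```python
import pandas as pd

# Toy results, as they might accumulate in results.csv over several runs
results = pd.DataFrame({
    "dataset": ["dblp-easy"] * 4,
    "model": ["gs-mean", "gs-mean", "gcn", "gcn"],
    "accuracy": [0.61, 0.63, 0.58, 0.60],
})

# Group by the requested attributes and summarize the metric,
# mirroring what a `-g dataset model` call would produce.
table = results.groupby(["dataset", "model"])["accuracy"].agg(["mean", "std"])
print(table)
```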
The script tabularize_forward_transfer.py uses the exact grouping that we use for the results of the paper and also computes Forward Transfer (the averaged difference between warm and cold restarts).
Note that each configuration must have a value for both start=warm and start=cold, so that Forward Transfer can be computed.
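The Forward Transfer computation, as described above, amounts to pairing up warm and cold runs per configuration and averaging their difference. A sketch with made-up numbers (not the script's actual code):

```python
import pandas as pd

# Toy results: each model has one warm-restart and one cold-restart run
results = pd.DataFrame({
    "model": ["gcn", "gcn", "gs-mean", "gs-mean"],
    "start": ["warm", "cold", "warm", "cold"],
    "accuracy": [0.64, 0.60, 0.66, 0.61],
})

# Pivot so each configuration has a warm and a cold column,
# then take the averaged warm-minus-cold difference.
pivot = results.pivot_table(index="model", columns="start", values="accuracy")
forward_transfer = (pivot["warm"] - pivot["cold"]).mean()
print(forward_transfer)
```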
Visualize results
You can visualize with the visualize.py script:
python3 visualize.py --style "window size %RF" --hue model --col dataset --row start --save plot.png results.csv
where results.csv is the file where you've aggregated the results. You can also provide multiple results files, then they will be concatenated before plotting.
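The kind of faceted line plot that visualize.py produces can be sketched with plain matplotlib (illustrative numbers, not real results):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Illustrative data: per-task accuracy over time for two models
years = [2004, 2005, 2006, 2007]
curves = {"gs-mean": [0.60, 0.62, 0.63, 0.65], "gcn": [0.57, 0.59, 0.60, 0.61]}

fig, ax = plt.subplots()
for model, accuracies in curves.items():
    ax.plot(years, accuracies, marker="o", label=model)
ax.set_xlabel("task (year)")
ax.set_ylabel("accuracy")
ax.legend()
fig.savefig("plot.png")
```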
Preprocessing tasks for multiple experiments
Constructing tasks dynamically may take some time during experiments.
The tasks can be preprocessed with the script preprocess_dataset.py.
Important: the argument t_zero must be one task before the first evaluation task, i.e., if you want to start evaluation at year 2004, you must preprocess tasks starting at year 2003.
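The off-by-one relation between t_zero and the first evaluation task can be illustrated as follows (hypothetical helper, not part of the repository):

```python
def task_years(t_zero, t_end):
    """Tasks are consecutive years; the first task (t_zero) is only used
    for initial training, so evaluation starts one task later."""
    years = list(range(t_zero, t_end + 1))
    return years[0], years[1:]

warmup, evaluation = task_years(2003, 2006)
print(warmup)      # 2003: preprocessing must start here...
print(evaluation)  # [2004, 2005, 2006]: ...so evaluation can start at 2004
```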
Example to preprocess tasks for dblp-easy starting at year 2003 with a history size of 3 for models implemented via dgl:
python3 preprocess_dataset.py --t_zero 2003 --history 3 --backend dgl data/dblp-easy
In scripts/preprocess_all_datasets.bash, you find a shorthand to preprocess all datasets in all history size configurations from the paper for both backends.
This shorthand should be started with the repository's root as working directory and expects the directory structure of data/ as described above.
Then, you need to use the version of the run script that uses preprocessed tasks, namely run_experiment_new.py.
The interface of the script is nearly the same as that of run_experiment.py.
One difference is that reproducing our ablation study for comparison with once-trained static models is only possible with run_experiment.py.
Full reproduction of the paper's experiments
In the experiments/ directory, you find bash scripts to re-run all our experiments.
Issues?
Note: For the Python/bash scripts within this repository to work properly, you must call them from the root directory of this repository.
If there are any problems with using this repository, feel free to file an issue.
File Descriptions
| File | Description |
| --- | --- |
| analysis | scripts to perform analyses |
| datasets.py | dataset loading |
| lifelong_learning.py | lifelong learning module |
| drift_magnitude.py | drift magnitude module |
| preprocess_dataset.py | create a lifelong learning graph dataset by preprocessing tasks |
| experiments | bash scripts to reproduce experiments |
| scripts | other bash scripts to preprocess/compute t_diff distribution |
| models | GNN implementations |
| README.md | this file |
| requirements.txt | dependencies |
| run_experiment.py | main entry point for running a single experiment |
| run_experiment_new.py | main entry point for running a single experiment with preprocessed data |
| tabularize.py | aggregate results into a table |
| tabularize_forward_transfer.py | aggregate results into a table, including Forward Transfer computation |
| visualize.py | visualize results |
Notes
- The experiments for inductive vs transductive learning can be found in a different repository.
Owner
- Name: Lukas Galke
- Login: lgalke
- Kind: user
- Location: Nijmegen, NL
- Company: Max Planck Institute for Psycholinguistics
- Website: http://lpag.de
- Twitter: LukasGalke
- Repositories: 7
- Profile: https://github.com/lgalke
Early Postdoc at MPI Psycholinguistics exploring emergent communication with deep nets
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite the accompanying journal article."
preferred-citation:
type: article
authors:
- family-names: "Galke"
given-names: "Lukas"
orcid: "https://orcid.org/0000-0001-6124-1092"
- family-names: "Vagliano"
given-names: "Iacopo"
- family-names: "Franke"
given-names: "Benedikt"
- family-names: "Zielke"
given-names: "Tobias"
- family-names: "Hoffmann"
given-names: "Marcel"
- family-names: "Scherp"
given-names: "Ansgar"
title: "Lifelong learning on evolving graphs under the constraints of imbalanced classes and new classes"
doi: 10.1016/j.neunet.2023.04.022
year: 2023
journal: "Neural Networks"
volume: 164
start: 156
end: 176
GitHub Events
Total
- Watch event: 3
Last Year
- Watch event: 3
Dependencies
- dgl-cu110 ==0.5.3
- joblib *
- matplotlib ==3.3.4
- networkx *
- numpy *
- pandas ==1.2.1
- scikit-learn *
- scipy ==1.6.0
- seaborn ==0.11.0
- torch ==1.7.1
- torch-cluster ==1.5.8
- torch-geometric ==1.6.3
- torch-scatter ==2.0.5
- torch-sparse ==0.6.8
- torch-spline-conv ==1.2.0
- tqdm *