GraphSL

GraphSL: An Open-Source Library for Graph Source Localization Approaches and Benchmark Datasets - Published in JOSS (2024)

https://github.com/xianggebenben/graphsl

Science Score: 95.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

○ CITATION.cff file
✓ codemeta.json file: found
✓ .zenodo.json file: found
✓ DOI references: 8 DOI reference(s) found in README and JOSS metadata
✓ Academic publication links: joss.theoj.org, zenodo.org
✓ Committers with academic emails: 1 of 5 committers (20.0%) from academic institutions
○ Institutional organization owner
✓ JOSS paper metadata: published in Journal of Open Source Software

Keywords

inverse-problems source-localization

Keywords from Contributors

mesh

Scientific Fields

Artificial Intelligence and Machine Learning (Computer Science): 76% confidence

Mathematics (Computer Science): 41% confidence

Last synced: 6 months ago

Repository

Graph Source Localization Library

Basic Info

Host: GitHub
Owner: xianggebenben
License: mit
Language: Python
Default Branch: main
Homepage:
Size: 31.6 MB

Statistics

Stars: 14
Watchers: 2
Forks: 3
Open Issues: 1
Releases: 9

Topics

inverse-problems source-localization

Created almost 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License

GraphSL: Graph Source Localization Library

This repository contains the source code of GraphSL, a library that supports research on the graph source localization problem.

Introduction

Problem Definition

Graph diffusion is a fundamental task in graph learning that aims to predict how information will spread from given information sources. Its inverse problem, graph source localization, is an important but rarely explored topic: it aims to detect the information sources given the information diffusion they produced. As illustrated in the figure above, graph diffusion seeks to predict the information diffusion $\{b,c,d,e\}$ from a source node $b$, whereas graph source localization aims to identify the source node $b$ from the information diffusion $\{b,c,d,e\}$. Graph source localization spans a broad spectrum of promising research and real-world applications, such as rumor detection, tracking the sources of computer viruses, and failure detection in smart grids. Hence, the graph source localization problem deserves attention and extensive investigation from machine learning researchers.
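The forward direction can be made concrete with a small simulation. The sketch below implements the textbook Independent Cascade (IC) model with networkx: each newly infected node gets one chance to infect each uninfected neighbor with probability `infect_prob`. It is an illustration only, not GraphSL's own diffusion generator, and the function name `simulate_ic` and the toy graph are hypothetical.

```python
import random

import networkx as nx


def simulate_ic(graph, sources, infect_prob=0.3, seed=None):
    """Forward graph diffusion under the Independent Cascade (IC) model.

    Every newly infected node gets exactly one chance to infect each
    still-uninfected neighbor, succeeding with probability `infect_prob`.
    Returns the set of all nodes infected when the cascade stops.
    """
    rng = random.Random(seed)
    infected = set(sources)
    frontier = list(sources)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.neighbors(u):
                if v not in infected and rng.random() < infect_prob:
                    infected.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return infected


# Toy graph echoing the example above: source b, observed diffusion {b, c, d, e}.
g = nx.Graph([("b", "c"), ("b", "d"), ("d", "e")])
print(sorted(simulate_ic(g, ["b"], infect_prob=1.0)))
cascade = simulate_ic(g, ["b"], infect_prob=0.3, seed=42)  # a stochastic cascade
```

With `infect_prob=1.0` every edge transmits, so the cascade covers the source's whole connected component; source localization asks the reverse question of recovering `b` from such an observed set.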

Due to its importance, some open-source tools have been developed to support research on the graph source localization problem; two recent examples are cosasi and RPaSDT. However, they do not support a variety of information diffusion simulations, and they lack real-world benchmark datasets and state-of-the-art source localization approaches. To fill this gap, we propose GraphSL, which, to our knowledge, is the first library to include real-world benchmark datasets and recent source localization methods, enabling researchers and practitioners to easily evaluate novel techniques against appropriate baselines. These methods require no prior assumptions about the sources (e.g., whether there is a single source or multiple sources) and can localize sources under various diffusion simulation models, such as Independent Cascade (IC) and Linear Threshold (LT). GraphSL is also standardized: for instance, testing any source inference method returns a Metric object, which provides five performance metrics for evaluation: accuracy, precision, recall, F-score, and area under the ROC curve (AUC).
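The five metrics can be illustrated for a single prediction with scikit-learn. This is a minimal sketch, assuming sources are encoded as a 0/1 vector over nodes and predictions as per-node scores that are thresholded; the `Metric` dataclass and `evaluate` function here are hypothetical stand-ins that only reuse the field names (`acc`, `pr`, `re`, `f1`, `auc`) seen in the Quickstart, and GraphSL's own Metric (including how it averages over diffusion samples) may differ.

```python
from dataclasses import dataclass

import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


@dataclass
class Metric:
    """Hypothetical stand-in mirroring the acc/pr/re/f1/auc fields used in the Quickstart."""
    acc: float
    pr: float
    re: float
    f1: float
    auc: float


def evaluate(true_sources, scores, thres):
    """Score one prediction: `true_sources` is a 0/1 vector over nodes and
    `scores` holds each node's predicted likelihood of being a source."""
    true_sources = np.asarray(true_sources)
    scores = np.asarray(scores, dtype=float)
    pred = (scores >= thres).astype(int)  # threshold scores into source / non-source
    return Metric(
        acc=accuracy_score(true_sources, pred),
        pr=precision_score(true_sources, pred, zero_division=0),
        re=recall_score(true_sources, pred, zero_division=0),
        f1=f1_score(true_sources, pred, zero_division=0),
        auc=roc_auc_score(true_sources, scores),  # AUC uses raw scores, not thresholded labels
    )


m = evaluate([1, 0, 0, 1], [0.9, 0.2, 0.6, 0.7], thres=0.5)
print(m)
```

Note the design split: accuracy, precision, recall, and F-score depend on the chosen threshold, while AUC is computed from the raw scores and is threshold-free.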

GraphSL targets both developers and practical users, who are free to add algorithms and datasets for their own needs by following the guidelines in the "Contact" section of README.md.

Approaches

Existing methods can be categorized into two groups: Prescribed methods and Graph Neural Networks (GNN)-based methods.

Prescribed methods rely on hand-crafted rules and heuristics. For instance, LPSI assumes that nodes surrounded by larger proportions of infected nodes are more likely to be source nodes. NetSleuth employs the Minimum Description Length principle to identify the optimal set of source nodes and virus propagation ripple. OJC identifies a set of nodes (Jordan cover) that cover all observed infected nodes with the minimum radius.
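Two of these heuristics are simple enough to sketch directly. The code below is a minimal NumPy/networkx illustration, not GraphSL's implementation: `lpsi_scores` uses the closed-form label propagation G = (1 - alpha)(I - alpha S)^{-1} Y with S = D^{-1/2} A D^{-1/2}, and `local_maxima` reads off sources as nodes scoring higher than all their neighbors (GraphSL's LPSI, per the Quickstart, instead returns a tuned threshold). `jordan_center` is a simplified single-source relative of OJC's Jordan cover. The function names and alpha=0.5 are assumptions.

```python
import networkx as nx
import numpy as np


def lpsi_scores(adj, infected, alpha=0.5):
    """Closed-form label propagation: G = (1 - alpha) (I - alpha S)^{-1} Y."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    S = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]  # symmetric normalization
    y = np.where(infected, 1.0, -1.0)  # +1 for infected, -1 for uninfected
    n = adj.shape[0]
    return (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, y)


def local_maxima(adj, scores):
    """Nodes whose score exceeds the scores of all their neighbors."""
    return [i for i in range(len(scores))
            if all(scores[i] > scores[j] for j in np.nonzero(adj[i])[0])]


def jordan_center(graph, infected_nodes):
    """Node minimizing the maximum hop distance (radius) to any infected node."""
    best, best_radius = None, float("inf")
    for v in graph.nodes:
        dist = nx.single_source_shortest_path_length(graph, v)
        radius = max(dist.get(u, float("inf")) for u in infected_nodes)
        if radius < best_radius:
            best, best_radius = v, radius
    return best, best_radius


# Path graph 0-1-2-3-4 where nodes 1, 2 and 3 are observed as infected.
g = nx.path_graph(5)
adj = nx.to_numpy_array(g, weight=None)
infected = np.isin(np.arange(5), [1, 2, 3])
scores = lpsi_scores(adj, infected, alpha=0.5)
print(local_maxima(adj, scores))      # [2]
print(jordan_center(g, [1, 2, 3]))    # (2, 1)
```

Both heuristics point to node 2, the middle of the infected segment: label propagation because it is surrounded by the largest share of infected nodes, and the Jordan center because it reaches every infected node within one hop.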

GNN-based methods learn rules from graph data in an end-to-end manner by capturing graph topology and neighborhood information. For example, GCNSI uses LPSI to augment its input and then applies Graph Convolutional Networks (GCN) for source identification. IVGD introduces a graph residual scenario to make existing graph diffusion models invertible and devises a new set of validity-aware layers to project inferred sources onto feasible regions. SLVAE uses forward diffusion estimation and deep generative models to approximate the source distribution, leveraging prior knowledge to generalize under arbitrary diffusion patterns.
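For GCNSI, the data flow can be sketched as follows: LPSI-style propagation scores for several alpha values are stacked into node features, which then pass through a graph convolution H = ReLU(Â X W) with Â = D^{-1/2}(A + I)D^{-1/2}. This is a NumPy sketch under stated assumptions (the alpha values, the hidden size of 8, the seed set, and the random untrained weights are all arbitrary); it is not GCNSI's actual architecture or training code.

```python
import networkx as nx
import numpy as np


def normalized_adjacency(adj):
    """Renormalized adjacency used by GCN: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return d_inv_sqrt[:, None] * a_hat * d_inv_sqrt[None, :]


def propagated_features(adj, infected, alphas=(0.1, 0.5, 0.9)):
    """Stack the raw +1/-1 infection labels with label-propagation scores
    for several alphas, in the spirit of GCNSI's LPSI-enhanced input."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    s = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    y = np.where(infected, 1.0, -1.0)
    n = adj.shape[0]
    cols = [y] + [(1 - a) * np.linalg.solve(np.eye(n) - a * s, y) for a in alphas]
    return np.stack(cols, axis=1)  # shape (n, 1 + len(alphas))


def gcn_layer(a_norm, x, w):
    """One graph convolution: aggregate neighbors, project, apply ReLU."""
    return np.maximum(a_norm @ x @ w, 0.0)


rng = np.random.default_rng(0)
adj = nx.to_numpy_array(nx.karate_club_graph(), weight=None)
infected = np.zeros(adj.shape[0], dtype=bool)
infected[[0, 1, 2, 3]] = True  # hypothetical observed infection
x = propagated_features(adj, infected)
w = rng.normal(size=(x.shape[1], 8))  # untrained weights, for shape illustration only
h = gcn_layer(normalized_adjacency(adj), x, w)
print(x.shape, h.shape)
```

In a trained model, further layers would map `h` to one source probability per node; only the feature construction and a single propagation step are shown here.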

Benchmark Datasets

| Dataset | #Nodes | #Edges |
|:------------------:|:------:|:------:|
| Karate | 34 | 78 |
| Dolphins | 62 | 159 |
| Jazz | 198 | 2,742 |
| Network Science | 1,589 | 2,742 |
| Cora-ML | 2,810 | 7,981 |
| Power Grid | 4,941 | 6,594 |

Aside from the methods, we also provide six benchmark datasets to facilitate research on the graph source localization problem. The datasets are described below:

Karate: Karate depicts the social ties among members of a university karate club.
Dolphins: Dolphins represents a social network of bottlenose dolphins, with edges indicating frequent associations between dolphins.
Jazz: Jazz illustrates a collaboration network among Jazz musicians, where edges signify instances of playing together in a band.
Network Science: Network Science portrays a coauthorship network of scientists engaged in network theory and experimentation, with each edge representing co-authorship of a paper.
Cora-ML: Cora-ML is a portal network of computer science research papers obtained through machine learning techniques.
Power Grid: Power Grid describes the topology of the Western States Power Grid of the United States.
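As a quick sanity check on the table, the Karate entry can be reproduced with the copy of Zachary's karate club graph that ships with networkx. This uses networkx rather than GraphSL, which loads its own copy with load_dataset after download_dataset; the `summarize` helper is hypothetical.

```python
import networkx as nx


def summarize(graph):
    """Return (#nodes, #edges, average degree) for an undirected graph."""
    n, m = graph.number_of_nodes(), graph.number_of_edges()
    return n, m, 2 * m / n


print(summarize(nx.karate_club_graph()))
```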

Installation

Install GraphSL using pip:

pip install GraphSL

Or, clone the repository and install the requirements:

git clone https://github.com/xianggebenben/GraphSL.git
cd GraphSL
pip install -r requirements.txt

Quickstart

Now, you can import and use GraphSL in your Python code.

```python
import os

from GraphSL.GNN.SLVAE.main import SLVAE
from GraphSL.GNN.IVGD.main import IVGD
from GraphSL.GNN.GCNSI.main import GCNSI
from GraphSL.Prescribed import LPSI, NetSleuth, OJC
from GraphSL.utils import (load_dataset, diffusion_generation, split_dataset,
                           download_dataset, visualize_source_prediction)

curr_dir = os.getcwd()

# download datasets
download_dataset(curr_dir)

# load datasets ('karate', 'dolphins', 'jazz', 'netscience', 'coraml', 'powergrid')
data_name = 'karate'
graph = load_dataset(data_name, data_dir=curr_dir)

# generate diffusion
dataset = diffusion_generation(graph=graph, infect_prob=0.3, diff_type='IC', sim_num=100, seed_ratio=0.2)

# split into training and test sets
adj, train_dataset, test_dataset = split_dataset(dataset)

# LPSI
print("LPSI:")
lpsi = LPSI()

# train LPSI
alpha, thres, auc, f1, pred = lpsi.train(adj, train_dataset)
print(f"train auc: {auc:.3f}, train f1: {f1:.3f}")

# test LPSI
metric = lpsi.test(adj, test_dataset, alpha, thres)
print(f"test acc: {metric.acc:.3f}, test pr: {metric.pr:.3f}, test re: {metric.re:.3f}, test f1: {metric.f1:.3f}, test auc: {metric.auc:.3f}")

# NetSleuth
print("NetSleuth:")
netSleuth = NetSleuth()

# train NetSleuth
k, auc, f1 = netSleuth.train(adj, train_dataset)
print(f"train auc: {auc:.3f}, train f1: {f1:.3f}")

# test NetSleuth
metric = netSleuth.test(adj, test_dataset, k)
print(f"test acc: {metric.acc:.3f}, test pr: {metric.pr:.3f}, test re: {metric.re:.3f}, test f1: {metric.f1:.3f}, test auc: {metric.auc:.3f}")

# OJC
print("OJC:")
ojc = OJC()

# train OJC
Y, auc, f1 = ojc.train(adj, train_dataset)
print(f"train auc: {auc:.3f}, train f1: {f1:.3f}")

# test OJC
metric = ojc.test(adj, test_dataset, Y)
print(f"test acc: {metric.acc:.3f}, test pr: {metric.pr:.3f}, test re: {metric.re:.3f}, test f1: {metric.f1:.3f}, test auc: {metric.auc:.3f}")

# GCNSI
print("GCNSI:")
gcnsi = GCNSI()

# train GCNSI
gcnsi_model, thres, auc, f1, pred = gcnsi.train(adj, train_dataset)
print(f"train auc: {auc:.3f}, train f1: {f1:.3f}")

# visualize training predictions (first training sample: predicted vs. true sources)
pred = (pred >= thres)
visualize_source_prediction(adj, pred[:, 0], train_dataset[0][:, 0].numpy(), save_dir=curr_dir, save_name="GCNSI_source_prediction")

# test GCNSI
metric = gcnsi.test(adj, test_dataset, gcnsi_model, thres)
print(f"test acc: {metric.acc:.3f}, test pr: {metric.pr:.3f}, test re: {metric.re:.3f}, test f1: {metric.f1:.3f}, test auc: {metric.auc:.3f}")

# IVGD
print("IVGD:")
ivgd = IVGD()

# train IVGD diffusion
diffusion_model = ivgd.train_diffusion(adj, train_dataset)

# train IVGD
ivgd_model, thres, auc, f1, pred = ivgd.train(adj, train_dataset, diffusion_model)
print(f"train auc: {auc:.3f}, train f1: {f1:.3f}")

# visualize training predictions
pred = (pred >= thres)
visualize_source_prediction(adj, pred[:, 0], train_dataset[0][:, 0].numpy(), save_dir=curr_dir, save_name="IVGD_source_prediction")

# test IVGD
metric = ivgd.test(adj, test_dataset, diffusion_model, ivgd_model, thres)
print(f"test acc: {metric.acc:.3f}, test pr: {metric.pr:.3f}, test re: {metric.re:.3f}, test f1: {metric.f1:.3f}, test auc: {metric.auc:.3f}")

# SLVAE
print("SLVAE:")
slvae = SLVAE()

# train SLVAE
slvae_model, seed_vae_train, thres, auc, f1, pred = slvae.train(adj, train_dataset)
print(f"train auc: {auc:.3f}, train f1: {f1:.3f}")

# visualize training predictions
pred = (pred >= thres)
visualize_source_prediction(adj, pred[:, 0], train_dataset[0][:, 0].numpy(), save_dir=curr_dir, save_name="SLVAE_source_prediction")

# test SLVAE (SLVAE uses infer() rather than test())
metric = slvae.infer(test_dataset, slvae_model, seed_vae_train, thres)
print(f"test acc: {metric.acc:.3f}, test pr: {metric.pr:.3f}, test re: {metric.re:.3f}, test f1: {metric.f1:.3f}, test auc: {metric.auc:.3f}")
```

We also provide a tutorial to help you get started and check the expected results.

Documentation

Official documentation, including a detailed API reference, is available on Read the Docs: https://graphsl.readthedocs.io/

Citation

If you use this package in your research, please consider citing our work as follows:

@article{Wang2024,
  doi = {10.21105/joss.06796},
  url = {https://doi.org/10.21105/joss.06796},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {99},
  pages = {6796},
  author = {Junxiang Wang and Liang Zhao},
  title = {GraphSL: An Open-Source Library for Graph Source Localization Approaches and Benchmark Datasets},
  journal = {Journal of Open Source Software}
}

Contact

We welcome your contributions! If you’d like to contribute your datasets or algorithms, please submit a pull request consisting of an atomic commit and a brief message describing your contribution.

For a new dataset, please upload it to the data folder. The file should be a dictionary object saved with pickle, containing the key "adj_mat" whose value is the graph's adjacency matrix stored as a sparse matrix in CSR format (e.g., a SciPy csr_matrix).
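A minimal sketch of producing such a file with pickle and SciPy follows. It assumes only what is stated above (a pickled dict with an "adj_mat" CSR matrix); the helper `save_graph_dataset`, the file name `my_graph.pkl`, the float dtype, and the choice to symmetrize an undirected edge list are hypothetical, and the existing files in the data folder should be checked for any additional keys they carry.

```python
import pickle

import numpy as np
import scipy.sparse as sp


def save_graph_dataset(path, edges, num_nodes):
    """Write an undirected graph as a pickled dict with an "adj_mat" CSR matrix."""
    rows, cols = zip(*edges)
    data = np.ones(len(edges))
    adj = sp.coo_matrix((data, (rows, cols)), shape=(num_nodes, num_nodes))
    adj = ((adj + adj.T) > 0).astype(np.float64).tocsr()  # symmetrize and collapse duplicates
    with open(path, "wb") as f:
        pickle.dump({"adj_mat": adj}, f)
    return adj


adj = save_graph_dataset("my_graph.pkl", [(0, 1), (1, 2), (2, 3)], num_nodes=4)
with open("my_graph.pkl", "rb") as f:
    loaded = pickle.load(f)
print(type(loaded["adj_mat"]).__name__, loaded["adj_mat"].nnz)
```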

For a new algorithm, please first determine whether it is a prescribed method or a GNN-based method. If it is a prescribed method, add it as a new class in GraphSL/Prescribed.py; otherwise, upload it as a folder under the GraphSL/GNN folder. Typically, the algorithm should include a "train" function and a "test" function, and the "test" function should return a Metric object.
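A minimal skeleton of that train/test convention might look like the following. Everything in it is hypothetical: the method `InfectedNeighborRatio`, its scoring rule, the `num_thres` grid, and the local `Metric` stand-in (a real contribution should return the library's own Metric object). The data layout is also an assumption: each sample is taken to be an (n, 2) array with true sources in column 0, as the Quickstart's visualization call suggests, and infection states in column 1.

```python
from dataclasses import dataclass

import numpy as np
import scipy.sparse as sp
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)


@dataclass
class Metric:
    """Hypothetical stand-in using the field names from the Quickstart."""
    acc: float
    pr: float
    re: float
    f1: float
    auc: float


class InfectedNeighborRatio:
    """Hypothetical prescribed method: an infected node's source score is the
    fraction of its neighbors that are also infected."""

    def _scores(self, adj, infected):
        deg = np.asarray(adj.sum(axis=1)).ravel()
        ratio = (adj @ infected) / np.maximum(deg, 1)
        return np.where(infected > 0, ratio, 0.0)  # uninfected nodes never score

    def train(self, adj, train_dataset, num_thres=10):
        """Grid-search the threshold that maximizes mean training F1."""
        best_thres, best_f1 = 0.0, -1.0
        for thres in np.linspace(0.0, 1.0, num_thres, endpoint=False):
            f1 = np.mean([f1_score(s[:, 0].astype(int),
                                   (self._scores(adj, s[:, 1]) >= thres).astype(int),
                                   zero_division=0)
                          for s in train_dataset])
            if f1 > best_f1:
                best_thres, best_f1 = float(thres), float(f1)
        return best_thres, best_f1

    def test(self, adj, test_dataset, thres):
        """Average the five metrics over the test samples and return a Metric."""
        rows = []
        for s in test_dataset:
            y, scores = s[:, 0].astype(int), self._scores(adj, s[:, 1])
            pred = (scores >= thres).astype(int)
            rows.append([accuracy_score(y, pred),
                         precision_score(y, pred, zero_division=0),
                         recall_score(y, pred, zero_division=0),
                         f1_score(y, pred, zero_division=0),
                         roc_auc_score(y, scores)])
        return Metric(*(float(v) for v in np.mean(rows, axis=0)))


# Toy path graph 0-1-2-3-4 with source 2 and infected nodes {1, 2, 3}.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
rows, cols = zip(*(edges + [(v, u) for u, v in edges]))
adj = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(5, 5))
sample = np.array([[0, 0], [0, 1], [1, 1], [0, 1], [0, 0]], dtype=float)

method = InfectedNeighborRatio()
thres, train_f1 = method.train(adj, [sample])
metric = method.test(adj, [sample], thres)
print(thres, train_f1, metric)
```

Returning the learned parameters from `train` and passing them back into `test` mirrors how the Quickstart threads `alpha` and `thres` from `lpsi.train` into `lpsi.test`.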

Feel free to email me (junxiang.wang@alumni.emory.edu) if you have any questions. Bug reports and feedback can be directed to the GitHub issues page.

Version Log

Version 0.11 removes the memetracker and digg datasets, improves the IVGD method, and sets random seeds for reproducibility.

Version 0.12 adds the datasets downloader.

Version 0.13 adds the visualization of source predictions.

Version 0.14 uses num_thres (i.e., the number of thresholds to try) instead of a user-specified thres_list (i.e., threshold list) for LPSI, GCNSI, IVGD and SLVAE. Moreover, GCNSI, IVGD and SLVAE are improved to run on CUDA when available.

Version 0.15 makes all methods run on CUDA if applicable, replaces the diffusion model of IVGD and the encoder of SLVAE, and revises the generation of diffusion.

Owner

Name: Junxiang Wang
Login: xianggebenben
Kind: user
Location: Atlanta, GA
Company: NEC Laboratories America

Website: https://xianggebenben.github.io/Junxiang_Wang/
Repositories: 5
Profile: https://github.com/xianggebenben

JOSS Publication

GraphSL: An Open-Source Library for Graph Source Localization Approaches and Benchmark Datasets

Published

July 30, 2024

DOI

10.21105/joss.06796

Volume 9, Issue 99, Page 6796

Authors

Junxiang Wang

Emory University, United States

Liang Zhao
Emory University, United States

Editor

Mark A. Jensen

GitHub Events

Total

Watch event: 7
Delete event: 1
Push event: 2
Pull request event: 1
Fork event: 1
Create event: 1

Last Year

Watch event: 7
Delete event: 1
Push event: 2
Pull request event: 1
Fork event: 1
Create event: 1

Committers

Last synced: 7 months ago

All Time

Total Commits: 255
Total Committers: 5
Avg Commits per committer: 51.0
Development Distribution Score (DDS): 0.408

Past Year

Commits: 4
Committers: 3
Avg Commits per committer: 1.333
Development Distribution Score (DDS): 0.5

Top Committers

Name	Email	Commits
Junxiang Wang	j**g@a**u	151
Junxiang Wang	x**n@1**m	75
junwang	j**g@n**m	16
dependabot[bot]	4****]	12
Martin Beyß	1****s	1

Committer Domains (Top 20 + Academic)

nec-labs.com: 1 163.com: 1 alumni.emory.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 7
Total pull requests: 14
Average time to close issues: 8 days
Average time to close pull requests: 1 day
Total issue authors: 3
Total pull request authors: 2
Average comments per issue: 2.0
Average comments per pull request: 0.07
Merged pull requests: 13
Bot issues: 0
Bot pull requests: 12

Past Year

Issues: 2
Pull requests: 3
Average time to close issues: 3 days
Average time to close pull requests: 14 minutes
Issue authors: 2
Pull request authors: 2
Average comments per issue: 2.0
Average comments per pull request: 0.0
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 2


Top Authors

Issue Authors

mbeyss (4)
waityousea (2)
xxTangg (1)

Pull Request Authors

dependabot[bot] (22)
mbeyss (4)

Top Labels

Issue Labels

Pull Request Labels

dependencies (22)

Packages

Total packages: 1
Total downloads:
- pypi 18 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 10
Total maintainers: 1

pypi.org: graphsl

Graph Source Localization Approaches and Benchmark Datasets

Homepage: https://github.com/xianggebenben/GraphSL
Documentation: https://graphsl.readthedocs.io/
License: MIT
Latest release: 0.15
published over 1 year ago

Versions: 10
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 18 Last month

Rankings

Dependent packages count: 9.6%

Average: 36.5%

Dependent repos count: 63.3%

Maintainers (1)

xianggebenben

Last synced: 6 months ago

Dependencies

.github/workflows/draft-pdf.yml actions

actions/checkout v4 composite
actions/upload-artifact v1 composite
openjournals/openjournals-draft-action master composite

requirements.txt pypi

Jinja2 ==3.1.3
MarkupSafe ==2.1.5
PyYAML ==6.0.1
Pygments ==2.17.2
aiohttp ==3.9.3
aiosignal ==1.3.1
asttokens ==2.4.1
attrs ==23.2.0
backcall ==0.2.0
beautifulsoup4 ==4.12.3
bleach ==6.1.0
bokeh ==3.4.0
certifi ==2024.2.2
charset-normalizer ==3.3.2
contourpy ==1.2.0
decorator ==5.1.1
defusedxml ==0.7.1
docopt ==0.6.2
dynetx ==0.3.2
executing ==2.0.1
fastjsonschema ==2.19.1
filelock ==3.13.3
frozenlist ==1.4.1
fsspec ==2024.3.1
future ==1.0.0
idna ==3.6
igraph ==0.11.4
ipdb ==0.13.13
ipython ==8.12.3
jedi ==0.19.1
joblib ==1.3.2
jsonschema ==4.21.1
jsonschema-specifications ==2023.12.1
jupyter_client ==8.6.1
jupyter_core ==5.7.2
jupyterlab_pygments ==0.3.0
matplotlib-inline ==0.1.6
mistune ==3.0.2
mpmath ==1.3.0
multidict ==6.0.5
nbclient ==0.10.0
nbconvert ==7.16.3
nbformat ==5.10.3
ndlib ==5.1.1
netdispatch ==0.1.0
networkx ==3.2.1
numpy ==1.26.4
nvidia-cublas-cu12 ==12.1.3.1
nvidia-cuda-cupti-cu12 ==12.1.105
nvidia-cuda-nvrtc-cu12 ==12.1.105
nvidia-cuda-runtime-cu12 ==12.1.105
nvidia-cudnn-cu12 ==8.9.2.26
nvidia-cufft-cu12 ==11.0.2.54
nvidia-curand-cu12 ==10.3.2.106
nvidia-cusolver-cu12 ==11.4.5.107
nvidia-cusparse-cu12 ==12.1.0.106
nvidia-nccl-cu12 ==2.19.3
nvidia-nvjitlink-cu12 ==12.4.99
nvidia-nvtx-cu12 ==12.1.105
packaging ==24.0
pandas ==2.2.1
pandocfilters ==1.5.1
parso ==0.8.3
pexpect ==4.9.0
pickleshare ==0.7.5
pillow ==10.2.0
pipreqs ==0.5.0
platformdirs ==4.2.0
prompt-toolkit ==3.0.43
psutil ==5.9.8
ptyprocess ==0.7.0
pure-eval ==0.2.2
pyparsing ==3.1.2
python-dateutil ==2.9.0.post0
python-igraph ==0.11.4
pytz ==2024.1
pyzmq ==25.1.2
referencing ==0.34.0
requests ==2.31.0
rpds-py ==0.18.0
scikit-learn ==1.4.1.post1
scipy ==1.12.0
six ==1.16.0
soupsieve ==2.5
stack-data ==0.6.3
sympy ==1.12
texttable ==1.7.0
threadpoolctl ==3.4.0
tinycss2 ==1.2.1
torch ==2.2.2
torch_geometric ==2.5.2
torchaudio ==2.2.2
torchvision ==0.17.2
tornado ==6.4
tqdm ==4.66.2
traitlets ==5.14.2
triton ==2.2.0
typing_extensions ==4.10.0
tzdata ==2024.1
urllib3 ==2.2.1
wcwidth ==0.2.13
webencodings ==0.5.1
xyzservices ==2023.10.1
yarg ==0.1.9
yarl ==1.9.4

GraphSL.egg-info/requires.txt pypi

ndlib *
networkx *
numpy *
scikit-learn *
scipy *
torch *
torch_geometric *

setup.py pypi

get *
ndlib *
networkx *
numpy *
scikit-learn *
scipy *
torch *
torch_geometric *
