https://github.com/compnet/gpqualmeascomp

Comparison of Graph Pattern Quality Measures

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: found
○ .zenodo.json file
✓ DOI references: found 35 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (15.4%) to scientific vocabulary

Last synced: 10 months ago

Repository

Comparison of Graph Pattern Quality Measures

Basic Info

Host: GitHub
Owner: CompNet
License: gpl-3.0
Language: Python
Default Branch: main
Size: 14.6 MB

Statistics

Stars: 0
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 2

Created almost 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme License

Comparison of Graph Pattern Quality Measures v1.0.1

Description

This repository implements a framework to compare measures used to assess the quality (i.e. discriminative power) of graph patterns in the context of graph classification tasks. It uses a state-of-the-art graph representation and graph classifier to compute a large collection of such measures over several standard graph classification datasets. The details of this framework are described in an article [P'25].
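To make the notion of "discriminative power" concrete, here is a minimal sketch of two classic quality measures, support difference and growth rate, computed for a single pattern over a labeled graph collection. The function names, the boolean occurrence vector, and the toy data are assumptions made for illustration; the measures actually implemented in `src` and their exact definitions are those of [P'25].

```python
from math import inf


def support(pattern_hits, labels, target):
    """Fraction of the graphs of class `target` that contain the pattern."""
    in_class = [hit for hit, lab in zip(pattern_hits, labels) if lab == target]
    return sum(in_class) / len(in_class) if in_class else 0.0


def support_difference(pattern_hits, labels, pos=1, neg=0):
    """Support in the positive class minus support in the negative class."""
    return support(pattern_hits, labels, pos) - support(pattern_hits, labels, neg)


def growth_rate(pattern_hits, labels, pos=1, neg=0):
    """Ratio of the positive-class support to the negative-class support."""
    s_pos = support(pattern_hits, labels, pos)
    s_neg = support(pattern_hits, labels, neg)
    if s_neg == 0:
        return inf if s_pos > 0 else 0.0
    return s_pos / s_neg


# Toy data (hypothetical): does the pattern occur in each of 8 graphs?
hits = [True, True, True, False, True, False, False, False]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

print(support_difference(hits, labels))
print(growth_rate(hits, labels))
```

A pattern that occurs much more often in one class than in the other gets a high score under both measures, but the two can rank a set of patterns differently, which is the kind of disagreement the framework is designed to study.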

This work was conducted in the framework of the DeCoMaP ANR project (Detection of corruption in public procurement markets -- ANR-19-CE38-0004).

If you use this source code, please cite article [P'25]:

@Article{Potin2025,
  author = {Potin, Lucas and Figueiredo, Rosa and Labatut, Vincent and Largeron, Christine},
  title = {Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing},
  journal = {ACM Transactions on Knowledge Discovery from Data (TKDD)},
  year = {2025},
  volume = {19},
  number = {6},
  pages = {123},
  doi = {10.1145/3743143},
}

Content
* Organization
* Installation
* Usage
* Dependencies
* References

Organization

This repository is composed of the following elements:

requirements.txt: List of required Python packages.
src: folder containing the source code
- ClusteringComparison.py: script that reproduces the experiments of Section 5.2.1. and Section 5.2.3.
- KendallTauHistogram.py: script that reproduces the experiments of Section 5.2.2.
- PairwiseComparisons.py: script that reproduces the experiments of Section 5.3.
- GoldStandardComparison.py: script that reproduces the experiments of Section 5.4.
data: folder containing the input data. Each subfolder corresponds to a distinct dataset, cf. Section Data.
results: files produced by the processing.
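The name of KendallTauHistogram.py suggests that rankings produced by different quality measures are compared with Kendall's tau. As a rough illustration of that idea, the sketch below computes Kendall's tau-a between the scores that two measures assign to the same patterns. It is a pure-Python stand-in written for this README, not the repository's code, and the score lists are made up; in practice `scipy.stats.kendalltau` (scipy is listed in requirements.txt) is the usual choice and handles ties via tau-b.

```python
from itertools import combinations


def kendall_tau_a(scores_x, scores_y):
    """Kendall's tau-a between two score lists over the same patterns."""
    if len(scores_x) != len(scores_y):
        raise ValueError("both score lists must have the same length")
    n = len(scores_x)
    if n < 2:
        raise ValueError("at least two patterns are required")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (scores_x[i] - scores_x[j]) * (scores_y[i] - scores_y[j])
        if product > 0:
            concordant += 1  # both measures order the pair the same way
        elif product < 0:
            discordant += 1  # the measures disagree on this pair
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores given by two measures to the same four patterns.
measure_a = [0.9, 0.7, 0.4, 0.1]
measure_b = [0.8, 0.9, 0.3, 0.2]

tau = kendall_tau_a(measure_a, measure_b)
print(round(tau, 3))
```

A value close to 1 means the two measures rank the patterns almost identically, and a value close to -1 means they rank them in opposite orders.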

Installation

Python and Packages

First, you need to install the Python language and the required packages:

Install the Python language
Download this project from GitHub and unzip.
Execute pip install -r requirements.txt to install the required packages (see also Section Dependencies).

Non-Python Dependencies

Second, one of the dependencies, SPMF, is not a Python package, but rather a Java program, and therefore requires a specific installation process:

Download its source code from Philippe Fournier-Viger's website.
Follow the installation instructions provided on the same website.

Note that we use the JAR implementation of SPMF.
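Since SPMF is used through its JAR, it can be called from Python as an external process. The sketch below only assembles the generic SPMF command line (`java -jar spmf.jar run <algorithm> <input> <output> <parameters...>`). The JAR location, the file paths, the algorithm identifier `GSPAN`, and the single minimum-support parameter are assumptions for illustration; the exact parameter list expected by gSpan should be checked against the SPMF documentation, and the repository's scripts may call SPMF differently.

```python
import subprocess


def build_spmf_command(jar_path, algorithm, input_file, output_file, *params):
    """Assemble the argument list for one SPMF run (nothing is executed)."""
    return [
        "java", "-jar", str(jar_path), "run", algorithm,
        str(input_file), str(output_file), *map(str, params),
    ]


def run_spmf(jar_path, algorithm, input_file, output_file, *params):
    """Execute SPMF and raise CalledProcessError if it fails."""
    cmd = build_spmf_command(jar_path, algorithm, input_file, output_file, *params)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)


# Hypothetical paths and parameter values, shown for illustration only.
cmd = build_spmf_command(
    "spmf.jar", "GSPAN", "data/MUTAG/graphs.txt", "results/patterns.txt", 0.5
)
print(" ".join(cmd))
```

Keeping the command construction separate from its execution makes it easy to print and inspect the command before launching a long mining run.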

Data

We retrieved the following datasets from the SPMF website:
* MUTAG: chemical compounds and their mutagenic properties [D'91]
* NCI1: molecules classified according to carcinogenicity [W'06]
* PTC: molecules classified according to carcinogenicity [T'03]
* DD: amino acids and their interactions [D'03]
* IMDB-Binary: movie collaboration graphs [Y'15]

We retrieved two datasets from the TU Dataset website:
* AIDS: chemical compounds tested for AIDS inhibition [R'08]
* FRANKENSTEIN: chemical compounds and their mutagenic properties [O'15]

The public procurement dataset contains graphs extracted from the FOPPA database, available on Zenodo:
* FOPPA: dataset extracted from FOPPA, a database of French public procurement notices [P'23]
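Graph collections mined with gSpan are commonly stored in a simple text format: a `t # <id>` line opens each graph, followed by `v <vertex> <label>` and `e <source> <target> <label>` lines. Assuming the files under `data` follow this convention (an assumption worth checking against the actual files), a minimal reader could look like the sketch below. The function name and the returned dictionary layout are illustrative choices, and class labels, which this format does not carry, would have to be read separately.

```python
def parse_gspan(text):
    """Parse gSpan-style text into a list of graphs (dicts)."""
    graphs = []
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "t":
            if parts[-1] == "-1":  # some files end with a 't # -1' marker
                break
            current = {"id": int(parts[2]), "vertices": {}, "edges": []}
            graphs.append(current)
        elif parts[0] == "v" and current is not None:
            current["vertices"][int(parts[1])] = parts[2]
        elif parts[0] == "e" and current is not None:
            current["edges"].append((int(parts[1]), int(parts[2]), parts[3]))
    return graphs


# Two tiny hand-written graphs, not taken from any of the datasets.
sample = """\
t # 0
v 0 C
v 1 N
e 0 1 1
t # 1
v 0 C
v 1 C
v 2 O
e 0 1 1
e 1 2 2
"""

graphs = parse_gspan(sample)
for g in graphs:
    print(g["id"], len(g["vertices"]), "vertices,", len(g["edges"]), "edges")
```

The resulting dictionaries can be converted to `networkx` graphs if needed, since networkx is among the listed dependencies.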

Usage

We provide two scripts to reproduce the experiments:

General.sh: reproduces all experiments described in our paper.
OneDataset.sh (dataset): reproduces the experiments for the specified dataset.

Each script extracts the data and then performs the associated experiments.
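To process a few datasets one after another, OneDataset.sh can be driven from Python. The sketch below assumes that the script takes the dataset name as its single positional argument and that this name matches a subfolder of `data`; both points are assumptions to verify against the script itself. By default it only builds the commands (dry run) so that they can be inspected first.

```python
import subprocess


def one_dataset_command(dataset, script="OneDataset.sh"):
    """Command line that runs the per-dataset script for one dataset."""
    return ["bash", script, dataset]


def run_datasets(datasets, dry_run=True):
    """Build the commands and, unless dry_run is set, execute them in order."""
    commands = [one_dataset_command(name) for name in datasets]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # stop at the first failure
    return commands


# Dataset names are assumed to match the subfolders of `data`.
for cmd in run_datasets(["MUTAG", "PTC"]):
    print(" ".join(cmd))
```

Calling `run_datasets([...], dry_run=False)` from the repository root would then launch the experiments sequentially.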

Dependencies

Tested with Python version 3.12.2 and the following packages:
* pandas: version 2.2.1
* numpy: version 1.26.4
* networkx: version 3.2.1
* scikit-learn: version 1.2.2
* matplotlib: version 3.8.0
* tqdm: version 4.66.4
* rbo: version 0.1.3
* shap: version 0.45.0
* xgboost: version 2.1.0
* scipy: version 1.11.4

Tested with SPMF version 2.62, which implements gSpan [Y'02].
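Because the results were obtained with pinned package versions, it can help to compare the local environment against those pins before running the experiments. The following sketch uses the standard-library `importlib.metadata` module; the helper names and the report layout are illustrative, and only a few of the pins listed above are shown (in practice the text would be read from requirements.txt).

```python
from importlib import metadata


def parse_pins(text):
    """Turn 'name==version' lines into a {name: version} dictionary."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def check_versions(pins):
    """Map each package to (wanted, installed or None, versions match?)."""
    report = {}
    for name, wanted in pins.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (wanted, found, found == wanted)
    return report


# A few of the pins listed above, for illustration.
pins = parse_pins("numpy==1.26.4\npandas ==2.2.1\nscikit-learn ==1.2.2\n")
for name, (wanted, found, ok) in check_versions(pins).items():
    print(f"{name}: wanted {wanted}, found {found}, {'OK' if ok else 'MISMATCH'}")
```

A mismatch does not necessarily break the scripts, but it is the first thing to rule out if the results differ from those reported in [P'25].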

References

[D'91] A. S. Debnath, R. L. Lopez, G. Debnath, A. Shusterman, C. Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity, Journal of Medicinal Chemistry 34(2):786–797, 1991. DOI: 10.1021/jm00106a046
[D'03] P. D. Dobson, A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments, Journal of Molecular Biology 330(4):771–783, 2003. DOI: 10.1016/S0022-2836(03)00628-4
[H'14] M. Houbraken, S. Demeyer, T. Michoel, P. Audenaert, D. Colle, M. Pickavet. The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration, PLoS ONE 9(5):e97896, 2014. DOI: 10.1371/journal.pone.0097896
[O'15] F. Orsini, P. Frasconi, L. De Raedt. Graph invariant kernels, 24th International Conference on Artificial Intelligence, pp. 3756–3762, 2015. DOI: 10.5555/2832747.2832773
[P'23] L. Potin, V. Labatut, P. H. Morand & C. Largeron. FOPPA: An Open Database of French Public Procurement Award Notices From 2010–2020, Scientific Data, 2023, 10:303. DOI: 10.1038/s41597-023-02213-z
[P'25] L. Potin, R. Figueiredo, V. Labatut & C. Largeron. Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing, ACM Transactions on Knowledge Discovery from Data, 2025, 19(6):123. DOI: 10.1145/3743143
[T'03] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, C. Helma. Statistical evaluation of the predictive toxicology challenge 2000-2001, Bioinformatics 19(10):1183–1193, 2003. DOI: 10.1093/bioinformatics/btg130
[W'06] N. Wale, G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification, 6th International Conference on Data Mining, pp. 678–689, 2006. DOI: 10.1007/s10115-007-0103-5
[Y'02] X. Yan, J. Han. gSpan: Graph-based substructure pattern mining, IEEE International Conference on Data Mining, pp. 721–724, 2002. DOI: 10.1109/ICDM.2002.1184038
[Y'15] P. Yanardag, S.V.N. Vishwanathan. Deep Graph Kernels, 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374, 2015. DOI: 10.1145/2783258.2783417

Owner

Name: Complex Networks
Login: CompNet
Kind: organization
Location: Avignon, France

Website: http://lia.univ-avignon.fr
Repositories: 44
Profile: https://github.com/CompNet

GitHub Events

Total

Release event: 1
Watch event: 1
Push event: 6
Pull request event: 2
Fork event: 1

Last Year

Release event: 1
Watch event: 1
Push event: 6
Pull request event: 2
Fork event: 1

Dependencies

requirements.txt pypi

matplotlib ==3.8.0
networkx ==3.2.1
numpy ==1.26.4
pandas ==2.2.1
rbo ==0.1.3
scikit-learn ==1.2.2
scipy ==1.11.4
shap ==0.45.0
tqdm ==4.66.4
xgboost ==2.1.0
