https://github.com/compnet/gpqualmeascomp
Comparison of Graph Pattern Quality Measures
Repository
Comparison of Graph Pattern Quality Measures
Basic Info
- Host: GitHub
- Owner: CompNet
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 14.6 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Comparison of Graph Pattern Quality Measures v1.0.1
Description
This repository implements a framework to compare measures used to assess the quality (i.e. discriminative power) of graph patterns in the context of graph classification tasks. It uses a state-of-the-art graph representation and graph classifier to compute a large collection of such measures over several standard graph classification datasets. The details of this framework are described in an article [P'25].
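To make the notion of "quality measure" concrete, here is a minimal sketch of two measures that are common in the discriminative pattern mining literature: support difference and growth rate. The class `PatternCounts`, the function names and the counts are hypothetical and chosen for illustration; they are not taken from this repository, and the measures compared in [P'25] may be defined differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternCounts:
    """Occurrence counts of one pattern in a two-class graph dataset."""
    pos_with: int   # positive graphs containing the pattern
    pos_total: int  # all positive graphs
    neg_with: int   # negative graphs containing the pattern
    neg_total: int  # all negative graphs


def support_difference(c: PatternCounts) -> float:
    """Difference between the relative supports in the two classes."""
    return c.pos_with / c.pos_total - c.neg_with / c.neg_total


def growth_rate(c: PatternCounts) -> float:
    """Ratio of relative supports; infinite if absent from the negative class."""
    pos = c.pos_with / c.pos_total
    neg = c.neg_with / c.neg_total
    if neg == 0:
        return float("inf") if pos > 0 else 0.0
    return pos / neg


# Hypothetical pattern found in 30 of 50 positive and 10 of 50 negative graphs.
counts = PatternCounts(pos_with=30, pos_total=50, neg_with=10, neg_total=50)
print(support_difference(counts))
print(growth_rate(counts))
```

Both measures reward patterns that are frequent in one class and rare in the other, but they can rank the same patterns differently, which is the kind of disagreement a comparison framework sets out to quantify.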
This work was conducted in the framework of the DeCoMaP ANR project (Detection of corruption in public procurement markets -- ANR-19-CE38-0004).
If you use this source code, please cite article [P'25]:
bibtex
@Article{Potin2025,
author = {Potin, Lucas and Figueiredo, Rosa and Labatut, Vincent and Largeron, Christine},
title = {Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing},
journal = {ACM Transactions on Knowledge Discovery from Data (TKDD)},
year = {2025},
volume = {19},
number = {6},
pages = {123},
doi = {10.1145/3743143},
}
Content
* Organization
* Installation
* Usage
* Dependencies
* References
Organization
This repository is composed of the following elements:
* `requirements.txt`: list of required Python packages.
* `src`: folder containing the source code.
  * `ClusteringComparison.py`: script that reproduces the experiments of Sections 5.2.1 and 5.2.3.
  * `KendallTauHistogram.py`: script that reproduces the experiments of Section 5.2.2.
  * `PairwiseComparisons.py`: script that reproduces the experiments of Section 5.3.
  * `GoldStandardComparison.py`: script that reproduces the experiments of Section 5.4.
* `data`: folder containing the input data. Each subfolder corresponds to a distinct dataset, cf. Section Data.
* `results`: files produced by the processing.
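The script name `KendallTauHistogram.py` suggests that rankings produced by different measures are compared with Kendall's tau. As a rough illustration of that statistic only (not of the script itself), the sketch below computes the tau-a variant in pure Python. The function name and the scores are hypothetical. In practice `scipy.stats.kendalltau`, from the `scipy` package listed in the dependencies, would be the usual choice.

```python
from itertools import combinations


def kendall_tau_a(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same patterns."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both score lists must cover the same patterns")
    n = len(scores_a)
    if n < 2:
        raise ValueError("at least two patterns are needed")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # Tied pairs (product == 0) count for neither side in tau-a.
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores given to four patterns by two quality measures.
measure_1 = [0.9, 0.7, 0.4, 0.1]
measure_2 = [0.8, 0.3, 0.5, 0.2]
print(kendall_tau_a(measure_1, measure_2))
```

A value of 1 means the two measures order the patterns identically, and -1 means the orders are exactly reversed.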
Installation
Python and Packages
First, you need to install the Python language and the required packages:
- Install the Python language.
- Download this project from GitHub and unzip it.
- Execute `pip install -r requirements.txt` to install the required packages (see also Section Dependencies).
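After installation, it can be useful to check that the installed versions match the pinned ones. The following is a minimal sketch using only the standard library; the helper names `parse_pins` and `check_pins` are hypothetical, and it assumes that `requirements.txt` uses plain `name==version` lines.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


# Two pins copied from the dependency list, given inline for illustration.
pins = parse_pins(["numpy==1.26.4", "scikit-learn ==1.2.2", "# comment"])
for name, (expected, installed) in check_pins(pins).items():
    print(f"{name}: expected {expected}, found {installed}")
```

To check the whole file, the list passed to `parse_pins` can be replaced with the lines read from `requirements.txt`.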
Non-Python Dependencies
Second, one of the dependencies, SPMF, is not a Python package, but rather a Java program, and therefore requires a specific installation process:
- Download its source code from Philippe Fournier-Viger's website.
- Follow the installation instructions provided on the same website.
Note that we use the JAR implementation of SPMF.
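For orientation, the sketch below shows how the SPMF JAR might be called from Python to run gSpan. It is not the code used in this repository. The file paths are hypothetical, and the parameter order after the output file (minimum support, maximum number of edges, then three boolean flags) is an assumption based on the SPMF command-line documentation, so it should be checked against the SPMF version you install.

```python
import subprocess
from pathlib import Path


def build_gspan_command(jar, input_file, output_file, min_support,
                        max_edges=8, single_vertices=False,
                        dot_file=False, graph_ids=True):
    """Build the argument list for an SPMF gSpan run (assumed CLI layout)."""
    def flag(value):
        return "true" if value else "false"

    return [
        "java", "-jar", str(jar), "run", "GSPAN",
        str(input_file), str(output_file),
        str(min_support), str(max_edges),
        flag(single_vertices), flag(dot_file), flag(graph_ids),
    ]


def run_gspan(**kwargs):
    """Run gSpan through SPMF; raises CalledProcessError if Java fails."""
    subprocess.run(build_gspan_command(**kwargs), check=True)


# Hypothetical paths: adapt them to where the JAR and the data actually are.
command = build_gspan_command(Path("spmf.jar"), "data/MUTAG/graphs.txt",
                              "results/patterns.txt", 0.1)
print(" ".join(command))
```

Building the command separately from running it makes it possible to inspect the exact call before any Java process is started.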
Data
We retrieved the datasets from the SPMF website; they include:
* MUTAG : MUTAG dataset, representing chemical compounds and their mutagenic properties [D'91]
* NCI1 : NCI1 dataset, representing molecules classified according to carcinogenicity [W'06]
* PTC : PTC dataset, representing molecules classified according to carcinogenicity [T'03]
* DD : DD dataset, representing amino acids and their interactions [D'03]
* IMDB-Binary : IMDB-Binary dataset, representing movie collaboration graphs [Y'15]
We retrieved two datasets from the TU Dataset website:
* AIDS dataset, representing chemical compounds tested for AIDS inhibition [R'08]
* FRANKENSTEIN dataset, representing chemical compounds and their mutagenic properties [O'15]
The public procurement dataset contains graphs extracted from the FOPPA database, available on Zenodo:
* FOPPA : dataset extracted from FOPPA, a database of French public procurement notices [P'23]
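Graph datasets distributed for gSpan commonly use a line-based text format: `t # <id>` starts a graph, `v <id> <label>` declares a vertex and `e <source> <target> <label>` declares an edge. Assuming the files in `data` follow this convention (which this README does not state), a minimal reader could look like the sketch below. The function name and the returned dictionary layout are hypothetical.

```python
def parse_graph_file(lines):
    """Parse 't # id' / 'v id label' / 'e src dst label' lines into graphs."""
    graphs = []
    current = None
    for raw in lines:
        parts = raw.split()
        if not parts:
            continue
        if parts[0] == "t":
            graph_id = int(parts[2])
            if graph_id < 0:
                # Some files close with 't # -1' as an end marker.
                current = None
                continue
            current = {"id": graph_id, "vertices": {}, "edges": []}
            graphs.append(current)
        elif parts[0] == "v" and current is not None:
            current["vertices"][int(parts[1])] = parts[2]
        elif parts[0] == "e" and current is not None:
            current["edges"].append((int(parts[1]), int(parts[2]), parts[3]))
    return graphs


# Tiny hand-written sample with two graphs.
sample = """t # 0
v 0 C
v 1 N
e 0 1 1
t # 1
v 0 C
t # -1
""".splitlines()

for graph in parse_graph_file(sample):
    print(graph["id"], len(graph["vertices"]), len(graph["edges"]))
```

The class label of each graph is not part of this format and would need to be read from wherever the dataset stores it.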
Usage
We provide two scripts to reproduce the experiments:
* `General.sh`: reproduces all experiments described in our paper.
* `OneDataset.sh(dataset)`: reproduces the experiments concerning the specified dataset.
Each script extracts the data and then performs the associated experiments.
Dependencies
Tested with Python version 3.12.2 and the following packages:
* pandas: version 2.2.1
* numpy: version 1.26.4
* networkx: version 3.2.1
* scikit-learn: version 1.2.2
* matplotlib: version 3.8.0
* tqdm: version 4.66.4
* rbo: version 0.1.3
* shap: version 0.45.0
* xgboost: version 2.1.0
* scipy: version 1.11.4
Tested with SPMF version 2.62, which implements gSpan [Y'02].
References
- [D'91] A. S. Debnath, R. L. Lopez, G. Debnath, A. Shusterman, C. Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity, Journal of Medicinal Chemistry 34(2):786–797, 1991. DOI: 10.1021/jm00106a046
- [D'03] P. D. Dobson, A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments, Journal of Molecular Biology 330(4):771–783, 2003. DOI: 10.1016/S0022-2836(03)00628-4
- [H'14] M. Houbraken, S. Demeyer, T. Michoel, P. Audenaert, D. Colle, M. Pickavet. The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration, PLoS ONE 9(5):e97896, 2014. DOI: 10.1371/journal.pone.0097896
- [O'15] F. Orsini, P. Frasconi, L. De Raedt. Graph invariant kernels, 24th International Conference on Artificial Intelligence, pp. 3756–3762, 2015. DOI: 10.5555/2832747.2832773
- [P'23] L. Potin, V. Labatut, P. H. Morand & C. Largeron. FOPPA: An Open Database of French Public Procurement Award Notices From 2010–2020, Scientific Data, 2023, 10:303. DOI: 10.1038/s41597-023-02213-z
- [P'25] L. Potin, R. Figueiredo, V. Labatut & C. Largeron. Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing, ACM Transactions on Knowledge Discovery from Data, 2025, 19(6):123. DOI: 10.1145/3743143
- [T'03] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, C. Helma. Statistical evaluation of the predictive toxicology challenge 2000-2001, Bioinformatics 19(10):1183–1193, 2003. DOI: 10.1093/bioinformatics/btg130
- [W'06] N. Wale, G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification, 6th International Conference on Data Mining, pp. 678–689, 2006. DOI: 10.1007/s10115-007-0103-5
- [Y'02] X. Yan, J. Han. gSpan: Graph-based substructure pattern mining, IEEE International Conference on Data Mining, pp. 721–724, 2002. DOI: 10.1109/ICDM.2002.1184038
- [Y'15] P. Yanardag, S.V.N. Vishwanathan. Deep Graph Kernels, 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374, 2015. DOI: 10.1145/2783258.2783417
Owner
- Name: Complex Networks
- Login: CompNet
- Kind: organization
- Location: Avignon, France
- Website: http://lia.univ-avignon.fr
- Repositories: 44
- Profile: https://github.com/CompNet
GitHub Events
Total
- Release event: 1
- Watch event: 1
- Push event: 6
- Pull request event: 2
- Fork event: 1
Last Year
- Release event: 1
- Watch event: 1
- Push event: 6
- Pull request event: 2
- Fork event: 1
Dependencies
- matplotlib ==3.8.0
- networkx ==3.2.1
- numpy ==1.26.4
- pandas ==2.2.1
- rbo ==0.1.3
- scikit-learn ==1.2.2
- scipy ==1.11.4
- shap ==0.45.0
- tqdm ==4.66.4
- xgboost ==2.1.0