https://github.com/compnet/gpqualmeascomp

Comparison of Graph Pattern Quality Measures


Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 35 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: CompNet
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 14.6 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 2
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Comparison of Graph Pattern Quality Measures v1.0.1

Description

This repository implements a framework to compare measures used to assess the quality (i.e. discriminative power) of graph patterns in the context of graph classification tasks. It uses a state-of-the-art graph representation and graph classifier to compute a large collection of such measures over several standard graph classification datasets. The details of this framework are described in an article [P'25].
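To make the notion of a pattern quality measure concrete, here is a minimal sketch (not the repository's code) that scores one pattern from its binary occurrence vector over the graphs and the binary class labels of those graphs. The three measures shown (support difference, weighted relative accuracy, information gain) are standard examples picked for illustration; the exact set of measures and their definitions in [P'25] may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _entropy(pos: int, total: int) -> float:
    """Binary entropy (bits) of a set containing `pos` positives out of `total` items."""
    if total == 0 or pos in (0, total):
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def pattern_quality(occ: Sequence[int], labels: Sequence[int]) -> dict[str, float]:
    """Score one pattern.

    occ[i] is 1 if the pattern occurs in graph i, else 0; labels[i] is the binary
    class (1 = positive) of graph i. Both classes are assumed to be present.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    covered = sum(occ)
    covered_pos = sum(1 for o, y in zip(occ, labels) if o and y)
    covered_neg = covered - covered_pos

    # Support difference: frequency among positive graphs minus frequency among negative ones.
    sup_diff = covered_pos / n_pos - covered_neg / n_neg

    # Weighted relative accuracy: coverage times (precision minus positive base rate).
    wracc = (covered / n) * (covered_pos / covered - n_pos / n) if covered else 0.0

    # Information gain of splitting the graphs on the presence of the pattern.
    uncovered = n - covered
    cond = (covered / n) * _entropy(covered_pos, covered) + (
        uncovered / n
    ) * _entropy(n_pos - covered_pos, uncovered)
    info_gain = _entropy(n_pos, n) - cond

    return {"sup_diff": sup_diff, "wracc": wracc, "info_gain": info_gain}


# A pattern occurring exactly in the positive graphs gets maximal scores.
print(pattern_quality([1, 1, 0, 0], [1, 1, 0, 0]))
# {'sup_diff': 1.0, 'wracc': 0.25, 'info_gain': 1.0}
```

A pattern spread evenly over both classes, such as occ = [1, 0, 1, 0] with the same labels, gets zero on all three measures, which is the behavior one expects from a discriminative-power score.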

This work was conducted in the framework of the DeCoMaP ANR project (Detection of corruption in public procurement markets -- ANR-19-CE38-0004).

If you use this source code, please cite article [P'25]:

    @Article{Potin2025,
      author = {Potin, Lucas and Figueiredo, Rosa and Labatut, Vincent and Largeron, Christine},
      title = {Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing},
      journal = {ACM Transactions on Knowledge Discovery from Data (TKDD)},
      year = {2025},
      volume = {19},
      number = {6},
      pages = {123},
      doi = {10.1145/3743143},
    }

Content
  • Organization
  • Installation
  • Usage
  • Dependencies
  • References

Organization

This repository is composed of the following elements:

  • requirements.txt: List of required Python packages.
  • src: folder containing the source code
    • ClusteringComparison.py: script that reproduces the experiments of Sections 5.2.1 and 5.2.3.
    • KendallTauHistogram.py: script that reproduces the experiments of Section 5.2.2.
    • PairwiseComparisons.py: script that reproduces the experiments of Section 5.3.
    • GoldStandardComparison.py: script that reproduces the experiments of Section 5.4.
  • data: folder containing the input data. Each subfolder corresponds to a distinct dataset, cf. Section Data.
  • results: folder containing the files produced by the processing.

Installation

Python and Packages

First, you need to install the Python language and the required packages:

  1. Install the Python language
  2. Download this project from GitHub and unzip the archive.
  3. Execute pip install -r requirements.txt to install the required packages (see also Section Dependencies).
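Once the packages are installed, a quick way to spot version drift with respect to the pinned versions is a small check like the following. It is a minimal sketch (not part of the repository) that assumes requirements.txt contains only `name==version` lines and is run from the project root; the function names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import Optional


def parse_pins(text: str) -> dict[str, str]:
    """Read `name==version` pins from requirements text, ignoring comments and blank lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "==" in line:
            name, pinned = (part.strip() for part in line.split("==", 1))
            pins[name] = pinned
    return pins


def compare_installed(pins: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Map each pinned package to (pinned version, installed version or None)."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (pinned, installed)
    return report


requirements = Path("requirements.txt")
if __name__ == "__main__" and requirements.exists():
    for name, (pinned, installed) in compare_installed(parse_pins(requirements.read_text())).items():
        status = "ok" if pinned == installed else "differs"
        print(f"{name}: pinned {pinned}, installed {installed} ({status})")
```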

Non-Python Dependencies

Second, one of the dependencies, SPMF, is not a Python package, but rather a Java program, and therefore requires a specific installation process:

Note that we use the JAR implementation of SPMF.
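SPMF's JAR is typically driven from the command line as `java -jar spmf.jar run <Algorithm> <input> <output> <parameters...>`, so calling it from Python can be as simple as the wrapper below. This is a minimal sketch: the algorithm identifier `GSPAN`, the file names and the parameter value `0.1` are placeholders we made up, so check the SPMF documentation for the exact name and parameter list of its gSpan implementation.

```python
import subprocess
from pathlib import Path


def spmf_command(jar: Path, algorithm: str, input_file: Path, output_file: Path, *params: str) -> list[str]:
    """Build the command line `java -jar <jar> run <algorithm> <input> <output> [params...]`."""
    return ["java", "-jar", str(jar), "run", algorithm, str(input_file), str(output_file), *params]


def run_spmf(jar: Path, algorithm: str, input_file: Path, output_file: Path, *params: str) -> None:
    """Run SPMF; raises subprocess.CalledProcessError if the Java process fails."""
    subprocess.run(spmf_command(jar, algorithm, input_file, output_file, *params), check=True)


# Hypothetical file names, algorithm identifier and minimum-support value.
cmd = spmf_command(Path("spmf.jar"), "GSPAN", Path("graphs.txt"), Path("patterns.txt"), "0.1")
print(" ".join(cmd))  # java -jar spmf.jar run GSPAN graphs.txt patterns.txt 0.1
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems with file paths containing spaces.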

Data

We retrieved the datasets from the SPMF website; they include:
  • MUTAG: dataset representing chemical compounds and their mutagenic properties [D'91]
  • NCI1: dataset representing molecules classified according to carcinogenicity [W'06]
  • PTC: dataset representing molecules classified according to carcinogenicity [T'03]
  • DD: dataset representing amino acids and their interactions [D'03]
  • IMDB-Binary: dataset representing movie collaboration graphs [Y'15]

We retrieved two datasets from the TUDataset website:
  • AIDS: dataset representing chemical compounds tested for AIDS inhibition [R'08]
  • FRANKENSTEIN: dataset representing chemical compounds and their mutagenic properties [O'15]
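Datasets downloaded from the TUDataset website come as a set of text files sharing a prefix, e.g. `AIDS_A.txt` (edge list over global 1-based node ids), `AIDS_graph_indicator.txt` (graph id of each node), `AIDS_graph_labels.txt` (class of each graph) and, when available, `AIDS_node_labels.txt`. The sketch below, which is not the repository's loader, rebuilds them as networkx graphs; it ignores edge labels and continuous node attributes, and the function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

import networkx as nx


def _read_ints(path: Path) -> list[int]:
    """Read one integer per non-empty line."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


def read_tu_dataset(folder: Path, name: str) -> tuple[list[nx.Graph], list[int]]:
    """Load a TUDataset-formatted dataset as (graphs, graph class labels)."""
    graph_of_node = _read_ints(folder / f"{name}_graph_indicator.txt")  # line i: graph of node i (1-based)
    graph_labels = _read_ints(folder / f"{name}_graph_labels.txt")
    labels_path = folder / f"{name}_node_labels.txt"
    node_labels: Optional[list[int]] = _read_ints(labels_path) if labels_path.exists() else None

    graphs = [nx.Graph() for _ in graph_labels]
    for node, gid in enumerate(graph_of_node, start=1):
        attrs = {"label": node_labels[node - 1]} if node_labels is not None else {}
        graphs[gid - 1].add_node(node, **attrs)

    # Each line of *_A.txt is "u, v"; undirected edges appear twice and nx.Graph keeps one copy.
    with open(folder / f"{name}_A.txt") as f:
        for line in f:
            if line.strip():
                u, v = (int(x) for x in line.split(","))
                graphs[graph_of_node[u - 1] - 1].add_edge(u, v)
    return graphs, graph_labels
```

For example, `read_tu_dataset(Path("data/AIDS"), "AIDS")` would load the AIDS files if they were stored under that (hypothetical) path.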

The public procurement dataset contains graphs extracted from the FOPPA database, available on Zenodo:
  • FOPPA: dataset extracted from FOPPA, a database of French public procurement notices [P'23]

Usage

We provide two scripts to reproduce the experiments:

  • General.sh: reproduces all experiments described in our paper.
  • OneDataset.sh (dataset): reproduces the experiments concerning the dataset passed as argument.

Each script extracts the data and then performs the associated experiments.
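To run the per-dataset script over several datasets in sequence, one could use a small driver such as the one below. It is only a sketch: the dataset identifiers are hypothetical (we assume they match the subfolder names of data), and by default it only prints the commands instead of running them.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical identifiers; we assume OneDataset.sh takes the name of a subfolder of data/.
DATASETS = ["MUTAG", "NCI1", "PTC", "DD", "IMDB-Binary", "AIDS", "FRANKENSTEIN", "FOPPA"]


def one_dataset_command(dataset: str, script: Path = Path("OneDataset.sh")) -> list[str]:
    """Command line running the per-dataset script through bash."""
    return ["bash", str(script), dataset]


def run_datasets(datasets: list[str], dry_run: bool = True) -> list[list[str]]:
    """Print (dry run) or execute the script for each dataset, stopping at the first failure."""
    commands = [one_dataset_command(d) for d in datasets]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


run_datasets(DATASETS)
```

Setting `dry_run=False` executes the commands; `check=True` makes the driver stop as soon as one dataset fails.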

Dependencies

Tested with Python version 3.12.2 and the following packages:
  • pandas: version 2.2.1
  • numpy: version 1.26.4
  • networkx: version 3.2.1
  • scikit-learn: version 1.2.2
  • matplotlib: version 3.8.0
  • tqdm: version 4.66.4
  • rbo: version 0.1.3
  • shap: version 0.45.0
  • xgboost: version 2.1.0
  • scipy: version 1.11.4
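The list above includes rbo, a package for rank-biased overlap (RBO), a similarity between two rankings that weights agreement at the top more heavily than agreement further down. To show what is being computed, here is a from-scratch sketch of RBO truncated at the depth of the shorter ranking (the lower-bound form, without the extrapolation some implementations add), so its values may differ from those returned by the rbo package. The function name and the default p = 0.9 are our choices, and we do not claim this is how the repository uses the package.

```python
from typing import Hashable, Sequence


def rbo_truncated(s: Sequence[Hashable], t: Sequence[Hashable], p: float = 0.9) -> float:
    """Truncated rank-biased overlap: (1 - p) * sum_d p^(d-1) * |S[:d] & T[:d]| / d.

    Rankings are assumed to contain no duplicates; the sum stops at the shorter length.
    """
    if not 0 < p < 1:
        raise ValueError("p must be in (0, 1)")
    seen_s, seen_t = set(), set()
    overlap = 0  # size of the intersection of the two prefixes of length d
    total = 0.0
    for d in range(1, min(len(s), len(t)) + 1):
        a, b = s[d - 1], t[d - 1]
        if a == b:
            overlap += 1
        else:
            overlap += (a in seen_t) + (b in seen_s)
        seen_s.add(a)
        seen_t.add(b)
        total += p ** (d - 1) * overlap / d
    return (1 - p) * total


print(rbo_truncated(["x", "y", "z"], ["x", "y", "z"]))  # close to 1 - 0.9**3 = 0.271
print(rbo_truncated(["x", "y", "z"], ["z", "y", "x"]))  # lower: the top items disagree
```

Updating the prefix overlap incrementally keeps the computation linear in the ranking length instead of recomputing a set intersection at every depth.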

Tested with SPMF version 2.62, which implements gSpan [Y'02].
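gSpan implementations commonly read graphs in a simple text format where `t # <id>` starts a graph, `v <index> <label>` declares a vertex and `e <i> <j> <label>` declares an edge. The sketch below converts networkx graphs into that layout; we assume, without having checked it against the SPMF documentation, that SPMF's gSpan accepts it (some gSpan tools also expect a final `t # -1` line). The function name and the `label` attribute name are hypothetical.

```python
import networkx as nx


def to_gspan_text(graphs: list[nx.Graph], node_label: str = "label", edge_label: str = "label", default: int = 0) -> str:
    """Serialize graphs into gSpan-style text; missing labels fall back to `default`."""
    lines = []
    for gid, graph in enumerate(graphs):
        lines.append(f"t # {gid}")
        # Vertices are renumbered 0..n-1 within each graph.
        index = {node: i for i, node in enumerate(graph.nodes)}
        for node, i in index.items():
            lines.append(f"v {i} {graph.nodes[node].get(node_label, default)}")
        for u, v, data in graph.edges(data=True):
            lines.append(f"e {index[u]} {index[v]} {data.get(edge_label, default)}")
    return "\n".join(lines) + "\n"


g = nx.Graph()
g.add_node("a", label=1)
g.add_node("b", label=2)
g.add_edge("a", "b", label=5)
print(to_gspan_text([g]), end="")
# t # 0
# v 0 1
# v 1 2
# e 0 1 5
```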

References

  • [D'91] A. S. Debnath, R. L. Lopez, G. Debnath, A. Shusterman, C. Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity, Journal of Medicinal Chemistry 34(2):786–797, 1991. DOI: 10.1021/jm00106a046
  • [D'03] P. D. Dobson, A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments, Journal of Molecular Biology 330(4):771–783, 2003. DOI: 10.1016/S0022-2836(03)00628-4
  • [H'14] M. Houbraken, S. Demeyer, T. Michoel, P. Audenaert, D. Colle, M. Pickavet. The Index-Based Subgraph Matching Algorithm with General Symmetries (ISMAGS): Exploiting Symmetry for Faster Subgraph Enumeration, PLoS ONE 9(5):e97896, 2014. DOI: 10.1371/journal.pone.0097896
  • [O'15] F. Orsini, P. Frasconi, L. De Raedt. Graph invariant kernels, 24th International Conference on Artificial Intelligence, pp. 3756–3762, 2015. DOI: 10.5555/2832747.2832773
  • [P'23] L. Potin, V. Labatut, P. H. Morand, C. Largeron. FOPPA: An Open Database of French Public Procurement Award Notices From 2010–2020, Scientific Data 10:303, 2023. DOI: 10.1038/s41597-023-02213-z
  • [P'25] L. Potin, R. Figueiredo, V. Labatut, C. Largeron. Pattern-Based Graph Classification: Comparison of Quality Measures and Importance of Preprocessing, ACM Transactions on Knowledge Discovery from Data 19(6):123, 2025. DOI: 10.1145/3743143
  • [T'03] H. Toivonen, A. Srinivasan, R. D. King, S. Kramer, C. Helma. Statistical evaluation of the predictive toxicology challenge 2000-2001, Bioinformatics 19(10):1183–1193, 2003. DOI: 10.1093/bioinformatics/btg130
  • [W'06] N. Wale, G. Karypis. Comparison of descriptor spaces for chemical compound retrieval and classification, 6th International Conference on Data Mining, pp. 678–689, 2006. DOI: 10.1007/s10115-007-0103-5
  • [Y'02] X. Yan, J. Han. gSpan: Graph-based substructure pattern mining, IEEE International Conference on Data Mining, pp. 721–724, 2002. DOI: 10.1109/ICDM.2002.1184038
  • [Y'15] P. Yanardag, S.V.N. Vishwanathan. Deep Graph Kernels, 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1365–1374, 2015. DOI: 10.1145/2783258.2783417

Owner

  • Name: Complex Networks
  • Login: CompNet
  • Kind: organization
  • Location: Avignon, France

GitHub Events

Total
  • Release event: 1
  • Watch event: 1
  • Push event: 6
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Release event: 1
  • Watch event: 1
  • Push event: 6
  • Pull request event: 2
  • Fork event: 1

Dependencies

requirements.txt pypi
  • matplotlib ==3.8.0
  • networkx ==3.2.1
  • numpy ==1.26.4
  • pandas ==2.2.1
  • rbo ==0.1.3
  • scikit-learn ==1.2.2
  • scipy ==1.11.4
  • shap ==0.45.0
  • tqdm ==4.66.4
  • xgboost ==2.1.0