https://github.com/barahona-research-group/mcf

Code for the paper "Analysing Multiscale Clusterings with Persistent Homology" by Juni Schindler and Mauricio Barahona.

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary

Repository

Basic Info
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 4
Created almost 4 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

MCF: Multiscale Clustering Filtration

This repository provides a Python implementation of the Multiscale Clustering Filtration (MCF) to analyse (non-hierarchical) sequences of partitions with persistent homology using gudhi. It is based on the paper "Analysing Multiscale Clusterings with Persistent Homology" by Juni Schindler and Mauricio Barahona: https://arxiv.org/abs/2305.04281.

Installation

Clone the repository and open the folder in your terminal.

    git clone https://github.com/barahona-research-group/MCF.git
    cd MCF/

Then, to install the package with pip, execute the following command:

    pip install .

To install the package with support for Multiscale Clustering bi-Filtration (MCbiF), execute instead:

    pip install ."[rivet]"

Using the code

Given a (not necessarily hierarchical) sequence of partitions theta (a list of lists of cluster indices) and a list of scales t, we can construct the MCF using the MCF class.

```Python
from mcf import MCF

# initialise MCF object
mcf = MCF(max_dim=3, method="standard")

# load sequence of partitions
mcf.load_data(theta, t)

# build filtration
mcf.build_filtration()
```
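
To make the input format and the construction concrete, here is a minimal pure-Python sketch that does not use the mcf package. It assumes that each partition in theta is a list of cluster indices, one per point, and that the scales in t are increasing. Each simplex gets the first scale at which all of its vertices share a cluster. The function name mcf_filtration_values, its parameter max_simplex_dim and the toy data are illustrative and not part of the package.

```python
from itertools import combinations


def mcf_filtration_values(theta, t, max_simplex_dim=2):
    """Map each simplex (sorted tuple of point indices) to the first scale
    at which all of its vertices lie in a common cluster."""
    values = {}
    for partition, scale in zip(theta, t):
        # group point indices by cluster label
        clusters = {}
        for point, label in enumerate(partition):
            clusters.setdefault(label, []).append(point)
        for members in clusters.values():
            # every subset of a cluster is a face of the solid simplex on it
            for size in range(1, max_simplex_dim + 2):
                for simplex in combinations(members, size):
                    # scales are increasing, so the first value seen is kept
                    values.setdefault(simplex, scale)
    return values


# toy non-hierarchical sequence of partitions of 3 points
theta = [
    [0, 1, 2],  # singletons
    [0, 0, 1],  # points 0 and 1 share a cluster
    [0, 1, 1],  # points 1 and 2 share a cluster
    [0, 1, 0],  # points 0 and 2 share a cluster
    [0, 0, 0],  # all points in one cluster
]
t = [0.0, 1.0, 2.0, 3.0, 4.0]

values = mcf_filtration_values(theta, t)
print(values[(0, 1)], values[(1, 2)], values[(0, 2)], values[(0, 1, 2)])
```

In this toy sequence the three edges enter at scales 1.0, 2.0 and 3.0, while the triangle only enters at scale 4.0. A one-dimensional hole therefore exists between scales 3.0 and 4.0, which is the kind of conflict between partitions that the persistent homology of the MCF is meant to record.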

Note that the construction of the MCF (based on gudhi.SimplexTree) can require excessive memory when clusters become too large. When the total number of (distinct) clusters is smaller than the number of points, it can be computationally advantageous to construct the MCF using the equivalent nerve-based construction with method='nerve'. We can then compute the persistent homology of the MCF and also plot the persistence diagram.

```Python
# compute persistent homology
mcf.compute_persistence()

# plot persistence diagram
ax = mcf.plot_persistence_diagram()
```
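
The note above on choosing between the standard and the nerve-based construction can be turned into a small check. The sketch below counts the distinct clusters in a sequence of partitions and compares that count with the number of points. It assumes the same input format as before (one list of cluster indices per partition). The helper suggest_method is hypothetical and only returns a string that could be passed as the method argument.

```python
def suggest_method(theta):
    """Return "nerve" when there are fewer distinct clusters than points,
    and "standard" otherwise."""
    n_points = len(theta[0])
    # a cluster is identified by its set of points, so a cluster that
    # reappears at several scales is counted once
    distinct_clusters = set()
    for partition in theta:
        clusters = {}
        for point, label in enumerate(partition):
            clusters.setdefault(label, []).append(point)
        distinct_clusters.update(frozenset(m) for m in clusters.values())
    return "nerve" if len(distinct_clusters) < n_points else "standard"


# 6 points, 3 distinct clusters in total: fewer clusters than points
coarse = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0]]
# 3 points, 7 distinct clusters in total: more clusters than points
fine = [[0, 1, 2], [0, 0, 1], [0, 1, 1], [0, 1, 0], [0, 0, 0]]

print(suggest_method(coarse), suggest_method(fine))
```

This only implements the counting criterion stated in the text. Actual memory use also depends on cluster sizes and on max_dim, so the result is a starting point rather than a guarantee.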

From the persistent homology we can then compute the measure of persistent hierarchy to quantify the level of hierarchy in the sequence of partitions and the measure of total persistent conflict to quantify the presence of multiscale structure.

```Python
# compute persistent hierarchy
h, hbar = mcf.compute_persistent_hierarchy()
print("Average persistent hierarchy:", round(hbar, 4))

# compute persistent conflict
c1, c2, c = mcf.compute_persistent_conflict()
```

Our heuristic for scale selection is that robust partitions resolve many conflicts and are thus located at plateaus after dips in the total persistent conflict.
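
One possible way to operationalise this heuristic is sketched below. It assumes that the total persistent conflict is available as a sequence c of values aligned with the scales t, and it reports stretches where c stays constant immediately after a decrease. The function plateaus_after_dips, the tolerance tol and the minimum plateau length min_length are illustrative choices, not part of the package, and the example values are made up.

```python
def plateaus_after_dips(c, t, tol=1e-8, min_length=2):
    """Return (start_scale, end_scale) pairs where c stays flat right after
    it has decreased."""
    found = []
    i = 1
    while i < len(c):
        if c[i] < c[i - 1] - tol:  # a dip: the conflict decreased at step i
            j = i
            # extend the plateau while consecutive values stay within tol
            while j + 1 < len(c) and abs(c[j + 1] - c[j]) <= tol:
                j += 1
            if j - i + 1 >= min_length:
                found.append((t[i], t[j]))
            i = j + 1
        else:
            i += 1
    return found


# made-up conflict values over ten scales
t = list(range(10))
c = [0.0, 0.5, 0.8, 0.3, 0.3, 0.3, 0.6, 0.2, 0.2, 0.4]

plateaus = plateaus_after_dips(c, t)
print(plateaus)
```

For real data the tolerance and minimum length would need tuning, and the candidate scales should be checked against the persistence diagram rather than accepted automatically.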

To compute all MCF measures and store them in a dictionary one can simply use the compute_all_measures() method.

```Python
# initialise MCF object
mcf = MCF()

# load sequence of partitions
mcf.load_data(theta, t)

# compute all MCF measures
mcf.compute_all_measures(filepath="mcfresults.pkl")
```
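
The stored results can presumably be read back with the standard pickle module, assuming the file is a pickled dictionary as the .pkl extension suggests. The sketch below is self-contained: it first writes a stand-in dictionary to a temporary file and then loads it. The key example_measure is hypothetical. The keys actually written by the package are not documented here and should be inspected after loading.

```python
import os
import pickle
import tempfile

# stand-in for a results dictionary; the key and values are hypothetical
results = {"example_measure": [0.0, 0.5, 1.0]}

# write the stand-in dictionary so that the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "results.pkl")
with open(path, "wb") as f:
    pickle.dump(results, f)

# read the dictionary back and list the stored measures
with open(path, "rb") as f:
    loaded = pickle.load(f)
print(sorted(loaded))
```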

Experiments

We apply the MCF framework to sequences of partitions corresponding to four different stochastic block models with different intrinsic structure using our sbm module:

  • Erdős-Rényi (ER) model: no scale, non-hierarchical
  • single-scale stochastic block model (sSBM): 1 scale, hierarchical
  • multiscale stochastic block model (mSBM): 3 scales, hierarchical
  • non-hierarchical stochastic block model (nh-mSBM): 3 scales, non-hierarchical
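
For readers unfamiliar with these models, the following sketch samples the two simplest ones using only the Python standard library. It is not the sbm module of this repository. The function sample_sbm and all parameter values are illustrative assumptions, and the multiscale and non-hierarchical variants are not covered.

```python
import random


def sample_sbm(block_sizes, p_in, p_out, seed=0):
    """Sample an undirected stochastic block model graph.

    Returns the block label of every node and the set of edges (i, j), i < j.
    """
    rng = random.Random(seed)
    # block label of every node
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            # nodes in the same block connect with p_in, otherwise with p_out
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.add((i, j))
    return labels, edges


# Erdős-Rényi graph: a single block, so every pair has the same probability
_, er_edges = sample_sbm([30], p_in=0.2, p_out=0.2)

# single-scale SBM: three planted blocks with dense within-block connectivity
labels, sbm_edges = sample_sbm([10, 10, 10], p_in=0.8, p_out=0.05)
print(len(er_edges), len(sbm_edges))
```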

To obtain sequences of partitions from the sampled graphs we use the PyGenStability Python package for multiscale clustering with Markov Stability analysis available at: https://github.com/barahona-research-group/PyGenStability

We then use the MCF persistence diagrams, persistent hierarchy and persistent conflict to assess the level of hierarchy and the multiscale structure of each sequence. All scripts and notebooks to reproduce our experiments can be found in the experiments directory.

Contributors

  • Juni Schindler, GitHub: juni-schindler <https://github.com/juni-schindler>

We are always looking for individuals who are interested in contributing to this open-source project. Even if you are just using MCF and have made some minor updates, we would be interested in your input.

Cite

Please cite our paper if you use this code in your own work:

@article{schindlerAnalysingMultiscaleClusterings2024,
  author = {Schindler, Juni and Barahona, Mauricio},
  title = {Analysing Multiscale Clusterings with Persistent Homology},
  publisher = {arXiv},
  year = {2024},
  doi = {10.48550/arXiv.2305.04281},
  url = {http://arxiv.org/abs/2305.04281},
}

Licence

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

Owner

  • Name: Barahona Research - Applied Math - Imperial
  • Login: barahona-research-group
  • Kind: organization
  • Email: m.barahona@imperial.ac.uk

Research codes developed in the Barahona research group - Department of Mathematics - Imperial College London

GitHub Events

Total
  • Release event: 2
  • Watch event: 1
  • Delete event: 1
  • Push event: 21
  • Pull request event: 2
  • Create event: 3
Last Year
  • Release event: 2
  • Watch event: 1
  • Delete event: 1
  • Push event: 21
  • Pull request event: 2
  • Create event: 3