https://github.com/barahona-research-group/mcf
Code for the paper "Analysing Multiscale Clusterings with Persistent Homology" by Juni Schindler and Mauricio Barahona.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: barahona-research-group
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2305.04281
- Size: 84.1 MB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 4
Metadata Files
README.md
MCF: Multiscale Clustering Filtration
This repository provides a Python implementation of the Multiscale Clustering Filtration (MCF) to analyse (non-hierarchical) sequences of partitions with persistent homology using gudhi. It is based on the paper "Analysing Multiscale Clusterings with Persistent Homology" by Juni Schindler and Mauricio Barahona: https://arxiv.org/abs/2305.04281.
Installation
Clone the repository and open the folder in your terminal.
```zsh
git clone https://github.com/barahona-research-group/MCF.git
cd MCF/
```
Then, to install the package with pip, execute the following command:
```zsh
pip install .
```
To install the package with support for the Multiscale Clustering bi-Filtration (MCbiF), execute instead:
```zsh
pip install ."[rivet]"
```
Using the code
Given a (not necessarily hierarchical) sequence of partitions theta (a list of lists of cluster indices) and a list of scales t, we can construct the MCF using the MCF class.
```Python
from mcf import MCF

# initialise MCF object
mcf = MCF(max_dim=3, method="standard")

# load sequence of partitions
mcf.load_data(theta, t)

# build filtration
mcf.build_filtration()
```
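To make the expected input concrete, here is a minimal sketch in plain Python that needs neither `mcf` nor `gudhi`. It assumes that `theta[i][j]` is the cluster index of point `j` at scale `t[i]`, which is one reading of "a list of lists of cluster indices" and may differ from what `load_data` expects. The helper `edge_birth_scales` is hypothetical and not part of the package. Following the definition in the paper, where each cluster contributes the simplex spanned by its points, it records the first scale at which each pair of points shares a cluster.

```python
from itertools import combinations

# Toy sequence of three partitions of four points (assumed layout:
# theta[i][j] is the cluster index of point j at scale t[i]).
theta = [
    [0, 1, 2, 3],  # all singletons
    [0, 0, 1, 1],  # clusters {0, 1} and {2, 3}
    [0, 1, 1, 0],  # clusters {0, 3} and {1, 2}: not a coarsening of the previous
]
t = [0.0, 1.0, 2.0]


def edge_birth_scales(theta, t):
    """Return {(i, j): first scale at which points i and j share a cluster}."""
    births = {}
    for labels, scale in zip(theta, t):
        for i, j in combinations(range(len(labels)), 2):
            if labels[i] == labels[j] and (i, j) not in births:
                births[(i, j)] = scale
    return births


births = edge_birth_scales(theta, t)
print(births)
```

In this toy input the four edges close a cycle through the points 0, 1, 2 and 3 while no triangle is ever filled. According to the paper, this kind of conflicting assignment is what shows up in higher-dimensional persistent homology.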
Note that the construction of the MCF (based on gudhi.SimplexTree) can require excessive memory when clusters become too large. When the total number of (distinct) clusters is smaller than the number of points, it can be computationally advantageous to construct the MCF using the equivalent nerve-based construction with method='nerve'. We can then compute the persistent homology of the MCF and also plot the persistence diagram.
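The rule of thumb for choosing the construction can be checked cheaply before building anything. The sketch below counts the distinct clusters in the sequence (two clusters are treated as equal if they contain the same points) and compares that count with the number of points. `distinct_clusters` and `suggest_method` are hypothetical helpers, not part of the package, and the code assumes that `theta[i][j]` is the cluster index of point `j` at scale `i`. Only the strings `"standard"` and `"nerve"` are taken from the text above.

```python
def distinct_clusters(theta):
    """Collect every cluster in the sequence as a frozenset of point indices."""
    clusters = set()
    for labels in theta:
        members = {}
        for point, label in enumerate(labels):
            members.setdefault(label, []).append(point)
        clusters.update(frozenset(points) for points in members.values())
    return clusters


def suggest_method(theta):
    """Suggest 'nerve' when there are fewer distinct clusters than points."""
    n_points = len(theta[0])
    n_clusters = len(distinct_clusters(theta))
    return "nerve" if n_clusters < n_points else "standard"


# Six points, two coarse partitions: 2 + 1 = 3 distinct clusters < 6 points.
theta = [[0, 0, 0, 1, 1, 1], [0, 0, 0, 0, 0, 0]]
print(suggest_method(theta))
```

This is only a counting heuristic; actual memory use also depends on `max_dim` and on how large the individual clusters are.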
```Python
# compute persistent homology
mcf.compute_persistence()

# plot persistence diagram
ax = mcf.plot_persistence_diagram()
```
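For intuition about what the persistence computation returns in dimension zero, the following union-find sketch tracks when connected components merge along the sequence. It is not how the package works (the package relies on gudhi) and it covers dimension zero only. It assumes the construction described in the paper, in which all points of a cluster become connected at that cluster's scale, and that `theta[i][j]` is the cluster index of point `j` at scale `t[i]`. The function name `zero_dim_persistence` is hypothetical.

```python
import math


def zero_dim_persistence(theta, t):
    """Return sorted (birth, death) pairs of connected components."""
    n_points = len(theta[0])
    parent = list(range(n_points))

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    birth = t[0]  # every point belongs to some cluster of the first partition
    pairs = []
    for labels, scale in zip(theta, t):
        first_seen = {}  # cluster label -> first point seen with that label
        for point, label in enumerate(labels):
            if label not in first_seen:
                first_seen[label] = point
                continue
            root_a, root_b = find(first_seen[label]), find(point)
            if root_a != root_b:
                parent[root_b] = root_a  # two components merge at this scale
                if scale > birth:  # skip zero-length intervals
                    pairs.append((birth, scale))
    n_components = len({find(x) for x in range(n_points)})
    pairs.extend((birth, math.inf) for _ in range(n_components))
    return sorted(pairs)


theta = [[0, 1, 2, 3], [0, 0, 1, 1], [0, 0, 0, 0]]
t = [0.0, 1.0, 2.0]
print(zero_dim_persistence(theta, t))
```

Each finite pair marks a scale at which two previously separate groups of points are joined by a cluster; components that never merge get an infinite death time.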
From the persistent homology we can then compute the measure of persistent hierarchy to quantify the level of hierarchy in the sequence of partitions and the measure of total persistent conflict to quantify the presence of multiscale structure.
```Python
# compute persistent hierarchy
h, hbar = mcf.compute_persistent_hierarchy()
print("Average persistent hierarchy:", round(hbar, 4))

# compute persistent conflict
c1, c2, c = mcf.compute_persistent_conflict()
```
Our heuristic for scale selection is that robust partitions resolve many conflicts and are thus located at plateaus after dips in the total persistent conflict.
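One way to turn this heuristic into code is to scan the total persistent conflict for a decrease that is followed by a flat run. The sketch below does this for a plain list of values `c` indexed like the scales `t`. The function `plateaus_after_dips`, its tolerance `tol` and the minimum plateau length `min_length` are illustrative assumptions, not part of the package, and the values of `c` are made up for the example.

```python
def plateaus_after_dips(t, c, tol=1e-9, min_length=2):
    """Return (start_scale, end_scale) of flat runs of c that follow a decrease."""
    candidates = []
    i = 1
    while i < len(c):
        if c[i] < c[i - 1] - tol:  # a dip: c decreases when moving to scale i
            j = i
            while j + 1 < len(c) and abs(c[j + 1] - c[i]) <= tol:
                j += 1  # extend the plateau while c stays flat
            if j - i + 1 >= min_length:
                candidates.append((t[i], t[j]))
            i = j + 1
        else:
            i += 1
    return candidates


t = [0, 1, 2, 3, 4, 5, 6, 7]
c = [0.0, 0.5, 0.2, 0.2, 0.2, 0.6, 0.1, 0.3]  # made-up conflict values
print(plateaus_after_dips(t, c))
```

With real data a looser `tol` is likely needed, since a computed conflict curve will rarely be exactly flat.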
To compute all MCF measures and store them in a dictionary one can simply use the compute_all_measures() method.
```Python
# initialise MCF object
mcf = MCF()

# load sequence of partitions
mcf.load_data(theta, t)

# compute all MCF measures
mcf.computeallmeasures(filepath="mcfresults.pkl")
```
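Reading the stored results back could look like the sketch below, assuming the output file is an ordinary pickle of a Python dictionary, as the `.pkl` extension suggests. The helper `load_measures` is hypothetical, and the file and key used here are placeholders so that the example runs without `mcf`; they are not the names the package writes.

```python
import pickle
import tempfile
from pathlib import Path


def load_measures(path):
    """Load a pickled dictionary of measures and return it."""
    with open(path, "rb") as f:
        results = pickle.load(f)
    if not isinstance(results, dict):
        raise TypeError(f"expected a dict, got {type(results).__name__}")
    return results


# Stand-in file so the sketch runs without mcf; the key is a placeholder,
# not a key the package actually writes.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "placeholder_results.pkl"
    with open(path, "wb") as f:
        pickle.dump({"placeholder_measure": [0.1, 0.2]}, f)
    measures = load_measures(path)

print(sorted(measures))
```

Unpickling executes code embedded in the file, so only load result files that you produced yourself or otherwise trust.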
Experiments
We apply the MCF framework to sequences of partitions corresponding to four different stochastic block models with different intrinsic structure using our sbm module:
- Erdős-Rényi (ER) model: no scale, non-hierarchical
- single-scale stochastic block model (sSBM): 1 scale, hierarchical
- multiscale stochastic block model (mSBM): 3 scales, hierarchical
- non-hierarchical stochastic block model (nh-mSBM): 3 scales, non-hierarchical
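For readers unfamiliar with these models, the sketch below samples a basic stochastic block model with the standard library only. It is not the repository's `sbm` module: the function `sample_sbm` and the probabilities `p_in` and `p_out` are illustrative assumptions, and the multiscale and non-hierarchical variants listed above need a richer block structure than this single-level version.

```python
import random


def sample_sbm(block_sizes, p_in, p_out, seed=0):
    """Sample an undirected SBM graph; return (edges, block label per node)."""
    rng = random.Random(seed)
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Nodes in the same block connect with p_in, otherwise with p_out.
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                edges.append((i, j))
    return edges, labels


# One block: every pair uses p_in, i.e. an Erdős-Rényi graph G(n, p).
er_edges, er_labels = sample_sbm([30], p_in=0.2, p_out=0.0)
# Three planted blocks: dense inside, sparse between.
sbm_edges, sbm_labels = sample_sbm([10, 10, 10], p_in=0.8, p_out=0.05)
print(len(er_edges), len(sbm_edges))
```

With a single block the planted structure disappears, which corresponds to the "no scale" case in the list above; with several blocks and `p_in` well above `p_out` there is one planted scale.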
To obtain sequences of partitions from the sampled graphs we use the PyGenStability Python package for multiscale clustering with Markov Stability analysis available at: https://github.com/barahona-research-group/PyGenStability
We then use the MCF persistence diagrams, persistent hierarchy and persistent conflict to assess the level of hierarchy and the multiscale structure in each sequence. All scripts and notebooks to reproduce our experiments can be found in the experiments/ directory.
Contributors
- Juni Schindler, GitHub: juni-schindler <https://github.com/juni-schindler>
We are always looking for people who are interested in contributing to this open-source project. Even if you are just using MCF and have made only minor updates, we would be interested in your input.
Cite
Please cite our paper if you use this code in your own work:
@article{schindlerAnalysingMultiscaleClusterings2024,
author = {Schindler, Juni and Barahona, Mauricio},
title = {Analysing Multiscale Clusterings with Persistent Homology},
publisher = {arXiv},
year = {2024},
doi = {10.48550/arXiv.2305.04281},
url = {http://arxiv.org/abs/2305.04281},
}
Licence
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.
Owner
- Name: Barahona Research - Applied Math - Imperial
- Login: barahona-research-group
- Kind: organization
- Email: m.barahona@imperial.ac.uk
- Website: https://scholar.google.co.uk/citations?user=weulBoAAAAAJ&hl=en
- Repositories: 9
- Profile: https://github.com/barahona-research-group
Research codes developed in the Barahona research group - Department of Mathematics - Imperial College London
GitHub Events
Total
- Release event: 2
- Watch event: 1
- Delete event: 1
- Push event: 21
- Pull request event: 2
- Create event: 3
Last Year
- Release event: 2
- Watch event: 1
- Delete event: 1
- Push event: 21
- Pull request event: 2
- Create event: 3