CM++ - A Meta-method for Well-Connected Community Detection

Published in the Journal of Open Source Software (JOSS), 2024

https://github.com/illinois-or-research-analytics/cm_pipeline

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 18 DOI reference(s) in README
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    5 of 11 committers (45.5%) from academic institutions
  • Institutional organization owner
    Organization illinois-or-research-analytics has institutional domain (grainger.illinois.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.0%) to scientific vocabulary

Scientific Fields

Mathematics, Computer Science - 40% confidence
Last synced: 6 months ago

Repository

Pipeline that uses an improved version of CM for generating well-connected graph clusters

Basic Info
  • Host: GitHub
  • Owner: illinois-or-research-analytics
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 60.7 MB
Statistics
  • Stars: 5
  • Watchers: 3
  • Forks: 5
  • Open Issues: 10
  • Releases: 16
Created about 3 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

CM++ Pipeline


Customizable, modular pipeline for testing an improved version of CM for generating well-connected clusters. The image below is from the arXiv preprint Park et al. (2023). A GUI is now available at https://github.com/illinois-or-research-analytics/cm_pipeline/tree/main.

Figure: CM Pipeline Overview. The Connectivity Modifier pipeline starts with an input network and a clustering algorithm. In the first step, an initial clustering of the entire network is obtained. Next, clusters below size $B$ (default: $B=11$) and tree clusters are removed from this initial clustering. Each cluster in the filtered clustering is then processed as follows: if a cluster has a minimum edge cut below the threshold (default: $\log_{10}{n}$, where $n$ is the number of nodes in the cluster), the edge cut is removed and the two resulting pieces are re-clustered. This process repeats until all clusters are well-connected. A final step removes small clusters. Both $B$ and the connectivity threshold are user-defined, and the user has the option to not apply the recursive clustering. See the information on CC and WCC.

Documentation

For the full documentation see here

Overview

Main Features

By default, CM modifies an input clustering to ensure that each cluster is well-connected. CM does this through rounds of mincut and re-clustering. It is also possible to run CM in ways that do not recursively cluster (CC and WCC).

Default CM:

Under the default settings of $B=11$ and threshold $\log_{10}{n}$, where $n$ is the number of nodes in the cluster, CM removes tree clusters and clusters below size $B$, and ensures that each remaining cluster has a minimum edge cut size greater than the threshold.
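For intuition, the default well-connectedness test can be sketched in a few lines of Python with networkx. This is a minimal illustration only, not the pipeline's actual implementation, which uses faster mincut code:

```python
import math

import networkx as nx


def is_well_connected(cluster: nx.Graph) -> bool:
    """Default CM criterion (sketch): min edge cut size > log10(n)."""
    n = cluster.number_of_nodes()
    if n < 2 or not nx.is_connected(cluster):
        return False
    cut_value, _ = nx.stoer_wagner(cluster)  # global minimum edge cut
    return cut_value > math.log10(n)
```

For example, a clique $K_6$ passes (mincut 5 > $\log_{10}(6) \approx 0.78$), while two copies of $K_6$ joined by a single edge fail (mincut 1 < $\log_{10}(12) \approx 1.08$).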

Click to expand example command

  • command: python -m main pipeline.json
  • pipeline.json:

```json
{
  "title": <custom name for this run>,
  "name": <custom name of your network>,
  "input_file": <path to your network edgelist>,
  "output_dir": <output directory>,
  "algorithm": <clustering algorithm e.g., ikc, leiden, leiden_mod>,
  "params": [
    { <parameter name e.g., res, i>: <parameter value> }
  ],
  "stages": [
    { "name": "cleanup" },
    { "name": "clustering", "parallel_limit": 2 },
    { "name": "stats", "parallel_limit": 2 },
    {
      "name": "filtering",
      "scripts": [
        "./scripts/subset_graph_nonetworkit_treestar.R",
        "./scripts/make_cm_ready.R"
      ]
    },
    {
      "name": "connectivity_modifier",
      "memprof": <boolean for whether to profile memory e.g., true or false>,
      "threshold": <well-connectedness threshold e.g., 1log10>,
      "nprocs": <number of processors for parallelism>,
      "quiet": <boolean whether to print outputs to console e.g., true or false>
    },
    { "name": "filtering", "scripts": ["./scripts/post_cm_filter.R"] },
    { "name": "stats", "parallel_limit": 2 }
  ]
}
```

CM without removing small clusters or tree clusters

Click to expand example command

  • command: python -m main pipeline.json
  • pipeline.json:

```json
{
  "title": <custom name for this run>,
  "name": <custom name of your network>,
  "input_file": <path to your network edgelist>,
  "output_dir": <output directory>,
  "algorithm": <clustering algorithm e.g., ikc, leiden, leiden_mod>,
  "params": [
    { <parameter name e.g., res, i>: <parameter value> }
  ],
  "stages": [
    { "name": "cleanup" },
    { "name": "clustering", "parallel_limit": 2 },
    { "name": "stats", "parallel_limit": 2 },
    {
      "name": "connectivity_modifier",
      "memprof": <boolean for whether to profile memory e.g., true or false>,
      "threshold": <well-connectedness threshold e.g., 1log10>,
      "nprocs": <number of processors for parallelism>,
      "quiet": <boolean whether to print outputs to console e.g., true or false>
    },
    { "name": "stats", "parallel_limit": 2 }
  ]
}
```

WCC (Well Connected Components)

This is equivalent to running the CM pipeline with the pre- and post-filtering steps turned off and without applying clustering recursively: each cluster is repeatedly cut until it is well-connected. The user has the option of adding the filtering steps back in if desired.
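The repeated-cut loop behind WCC can be sketched as follows. This is an illustration using networkx, not the optimized code in hm01.cm; the handling of singletons and disconnected pieces here is an assumption:

```python
import math

import networkx as nx


def wcc_refine(G: nx.Graph, nodes, threshold=lambda n: math.log10(n)):
    """Repeatedly cut each cluster until every piece is well-connected."""
    kept, stack = [], [set(nodes)]
    while stack:
        part = stack.pop()
        sub = G.subgraph(part)
        if sub.number_of_nodes() < 2:
            continue  # singletons cannot be cut further (assumed handling)
        if not nx.is_connected(sub):
            stack.extend(set(c) for c in nx.connected_components(sub))
            continue
        cut_value, (side_a, side_b) = nx.stoer_wagner(sub)
        if cut_value > threshold(sub.number_of_nodes()):
            kept.append(part)                    # well-connected: keep as-is
        else:
            stack += [set(side_a), set(side_b)]  # cut and re-examine both sides
    return kept
```

On two $K_6$ cliques joined by a single edge, one pass removes the bridge and both cliques are then accepted as well-connected.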

Click to expand example command

  • command: python3 -m hm01.cm -i <input network edgelist path> -e <input existing clustering path> -o <output filepath> -c nop --threshold <threshold e.g., 1log10> --nprocs <number of processors>

CC (Connected Components)

Some clustering methods produce disconnected clusters. CC takes such a clustering and returns the connected components of its clusters. The user has the option of adding in the filtering steps if desired.
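Conceptually, the CC step amounts to the following sketch; the output cluster IDs are assigned arbitrarily here, which is an assumption of this illustration:

```python
import networkx as nx


def split_into_components(G: nx.Graph, clustering: dict) -> dict:
    """Replace each (possibly disconnected) cluster with its connected
    components. `clustering` maps a cluster id to a set of node ids."""
    out, next_id = {}, 0
    for nodes in clustering.values():
        for comp in nx.connected_components(G.subgraph(nodes)):
            out[next_id] = set(comp)
            next_id += 1
    return out
```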

Click to expand example command

  • command: python3 -m hm01.cm -i <input network edgelist path> -e <input existing clustering path> -o <output filepath> -c nop --threshold 0.1 --nprocs <number of processors>

Clustering methods that can be used with CM

CM currently enables the use of Leiden optimizing the Constant Potts Model, Leiden optimizing modularity, Iterative K-Core, Markov Clustering, Infomap, and Stochastic Block Models. An example for Stochastic Block Models is shown below.

Click to expand example command

  • command: python3 -m hm01.cm -i <input network edgelist path> -e <input existing clustering path> -o <output filepath> -c external -cargs cargs.json -cfile <clusterer file path e.g., path to hm01/clusterers/external_clusterers/sbm_wrapper.py> --threshold <threshold e.g., 1log10> --nprocs <number of processors>
  • cargs.json:

```json
{
  <param key e.g., "block_state">: <param value e.g., "non_nested_sbm", "planted_partition_model">,
  <param key 2 e.g., "degree_corrected">: <param value e.g., true, false>
}
```

CM Pipeline Overview

The CM Pipeline is a modular pipeline for community detection that contains the following modules:

  • Graph Cleaning: Removal of parallel and duplicate edges as well as self loops

  • Community Detection: Clusters an input network with Leiden, IKC, or Infomap.

  • Cluster Filtration: A pre-processing stage that allows users to filter out clusters that are trees or have size below a given threshold.

  • Community Statistics Reporting: Generates node and edge count, modularity score, Constant Potts Model score, conductance, and edge-connectivity at multiple stages.

  • Extensibility: Developers can design new stages and wire in new algorithms. Please see the documentation for instructions on how to expand the list of supported clustering algorithms as a developer.

  • CM++
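Some of the reported statistics can be reproduced with networkx. This is a sketch only: the pipeline's stats stage also reports Constant Potts Model scores and edge connectivity, which are omitted here, and this version assumes the clusters partition the node set for the modularity call:

```python
import networkx as nx


def cluster_stats(G: nx.Graph, clusters: dict):
    """Per-cluster node/edge counts and conductance, plus overall modularity."""
    rows = []
    for cid, nodes in clusters.items():
        sub = G.subgraph(nodes)
        rows.append({
            "cluster": cid,
            "n": sub.number_of_nodes(),
            "m": sub.number_of_edges(),
            # conductance is undefined if the cluster is empty or covers all of G
            "conductance": nx.conductance(G, nodes)
            if 0 < len(nodes) < G.number_of_nodes() else None,
        })
    modularity = nx.community.modularity(G, list(clusters.values()))
    return rows, modularity
```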

CM++ Overview

CM++ is a module within the CM Pipeline, having the following features:

  • Function: CM++ refines an existing graph clustering by carving its clusters into well-connected clusters with high minimum cut values.

  • Flexibility: Users can augment their own definition of a good community with well-connectedness. CM++ works with any clustering algorithm and presently provides built-in support for Leiden, IKC, and Infomap.

  • Dynamic Thresholding: Connectivity thresholds can be constants, functions of the number of nodes in the cluster, or functions of the minimum node degree of the cluster.

  • Multi-processing: For better performance, users can specify a larger number of cores to process clusters concurrently.
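The threshold specs used on the command line (e.g. `1log10`) can be interpreted roughly as follows. The exact grammar CM++ accepts is not documented here, so this parsing, including the `mcd` token standing in for the minimum-node-degree option, is an assumption for illustration:

```python
import math


def eval_threshold(spec: str, n: int, min_degree=None) -> float:
    """Turn a spec like '1log10' or '0.1' into a numeric threshold for a
    cluster with n nodes. 'mcd' is a hypothetical token for the
    minimum-node-degree option."""
    if spec.endswith("log10"):
        k = spec[: -len("log10")]          # leading multiplier, default 1
        return (float(k) if k else 1.0) * math.log10(n)
    if spec == "mcd":
        return float(min_degree)
    return float(spec)                     # plain constant threshold
```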

Requirements

  • MacOS or Linux operating system
  • python3.9 or higher (UPDATE: We are looking into adding support for python3.12. For the time being, please use 3.9<=python<=3.11)
  • cmake 3.2.0 or higher
  • gcc@10 or higher
  • R with the following packages:
    • data.table
    • feather

Installation and Setup

There are several ways to install the pipeline.

Installation via Cloning

  • Clone the cm_pipeline repository
  • Activate the venv which has the necessary packages
  • Run pip install -r requirements.txt && pip install .
  • Make sure everything installed properly by running cd tests && pytest

Installation via pip install

Simply run pip install git+https://github.com/illinois-or-research-analytics/cm_pipeline. This installs CM++, but to use the full pipeline functionality, please set up via cloning.

Example Commands

CM++

  • python3 -m hm01.cm -i network.tsv -e clustering.tsv -o output.tsv -c leiden -g 0.5 --threshold 1log10 --nprocs 4 --quiet
    • Runs CM++ on a Leiden clustering with resolution 0.5, using connectivity threshold $\log_{10}(n)$ (every cluster whose minimum edge cut exceeds the log of its number of nodes is considered "well-connected")
  • python3 -m hm01.cm -i network.tsv -e clustering.tsv -o output.tsv -c ikc -k 10 --threshold 1log10 --nprocs 4 --quiet
    • Similar idea but with IKC having hyperparameter $k=10$.
  • python3 -m hm01.cm -i network.tsv -e clustering.tsv -o output.tsv -c nop --threshold 0.1 --nprocs 4 --quiet
    • Similar idea but with a nop clusterer, meaning it won't recursively cluster and will only ensure that every cluster is connected. This is sometimes called CM-cc, or just -cc for short.
  • python3 -m hm01.cm -i network.tsv -e clustering.tsv -o output.tsv -c nop --threshold 1log10 --nprocs 4 --quiet
    • Similar idea but with a nop clusterer, meaning it won't recursively cluster and will only ensure that every cluster is well-connected by performing repeated mincuts. This is sometimes called CM-wcc, or just -wcc for short.

CM Pipeline

  • Suppose you have a pipeline like the one here. Call it pipeline.json
  • Then from the root of this repository run:
    • python -m main pipeline.json

For Developers

Loading a Developer Environment

To quickly set up a developer environment for the CM++ Pipeline, simply run the following commands. (NOTE: Make sure you have Conda installed)

```bash
conda env create -f environment.yml
conda activate
```

Customizing the Pipeline

  • The CM++ Pipeline also allows for users to add their own pipeline stages and clustering methods.
  • Please refer to the customization documentation on how to modify the code to allow for your own pipeline stages and clustering methods.

Manuscript Data

The data used to generate the speedup curve in the manuscript can be found in the Illinois Databank. In all runs, we used the following command:

```bash
python3 -m hm01.cm -i (network file) -e (clustering file) -t 1log10 -g 0.001 -c leiden -q -n (1|2|4|8|16|32)
```

  • CEN: cen_pipeline.tar.gz
    • Network: cencmquietpipeline/cen-cm-new-pp-output-20230227-22:50:52/S1cen_cleaned.tsv
    • Clustering: cencmquietpipeline/cen-cm-new-pp-output-20230227-22:50:52/res-0.001/S2cen_leiden.0.001.tsv
  • CITPatents: citpatents_networks.tar.gz
    • Network: citpatentsprocessedcm/citpatents_cleaned.tsv
    • Clustering: citpatentsprocessedcm/citpatents_leiden.001.tsv
  • Orkut:

Output Files

  • The commands executed during the workflow are captured in {output_dir}/{run_name}-{timestamp}/commands.sh. This is the shell script generated by the pipeline that is run to generate outputs.
  • The output files generated during the workflow are stored in the folder {output_dir}/{run_name}-{timestamp}/
  • The descriptive analysis files can be found in the folder {output_dir}/{run_name}-{timestamp}/analysis with the *.csv file for each of the resolution values.


Citations

If you use CM++ in your research, please use the following citation:

```bibtex
@article{Ramavarapu2024,
  doi       = {10.21105/joss.06073},
  url       = {https://doi.org/10.21105/joss.06073},
  year      = {2024},
  publisher = {The Open Journal},
  volume    = {9},
  number    = {93},
  pages     = {6073},
  author    = {Vikram Ramavarapu and Fábio Jose Ayres and Minhyuk Park and Vidya Kamath Pailodi and João Alfredo Cardoso Lamy and Tandy Warnow and George Chacko},
  title     = {CM++ - A Meta-method for Well-Connected Community Detection},
  journal   = {Journal of Open Source Software}
}
```

Other Citations

```bibtex
@article{Park2024,
  title     = {Well-connectedness and community detection},
  volume    = {1},
  ISSN      = {2837-8830},
  url       = {http://dx.doi.org/10.1371/journal.pcsy.0000009},
  DOI       = {10.1371/journal.pcsy.0000009},
  number    = {3},
  journal   = {PLOS Complex Systems},
  publisher = {Public Library of Science (PLoS)},
  author    = {Park, Minhyuk and Tabatabaee, Yasamin and Ramavarapu, Vikram and Liu, Baqiao and Pailodi, Vidya Kamath and Ramachandran, Rajiv and Korobskiy, Dmitriy and Ayres, Fabio and Chacko, George and Warnow, Tandy},
  editor    = {Albert, Réka},
  year      = {2024},
  month     = nov,
  pages     = {e0000009}
}

@software{vikramramavarapu2024_10501118,
  author    = {Vikram Ramavarapu and Fabio Jose Ayres and Minhyuk Park and Vidya Kamath P and João Alfredo Cardoso Lamy and Tandy Warnow and George Chacko},
  title     = {{CM++ - A Meta-method for Well-Connected Community Detection}},
  month     = jan,
  year      = {2024},
  publisher = {Zenodo},
  version   = {v4.0.1},
  doi       = {10.5281/zenodo.10501118},
  url       = {https://doi.org/10.5281/zenodo.10501118}
}

@misc{park2023wellconnected,
  title         = {Well-Connected Communities in Real-World and Synthetic Networks},
  author        = {Minhyuk Park and Yasamin Tabatabaee and Vikram Ramavarapu and Baqiao Liu and Vidya Kamath Pailodi and Rajiv Ramachandran and Dmitriy Korobskiy and Fabio Ayres and George Chacko and Tandy Warnow},
  year          = {2023},
  eprint        = {2303.02813},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SI}
}
```

Owner

  • Name: OR_Research_Analytics
  • Login: illinois-or-research-analytics
  • Kind: organization
  • Location: United States of America

GitHub Events

Total
  • Watch event: 1
  • Push event: 18
  • Pull request event: 4
  • Create event: 1
Last Year
  • Watch event: 1
  • Push event: 18
  • Pull request event: 4
  • Create event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 466
  • Total Committers: 11
  • Avg Commits per committer: 42.364
  • Development Distribution Score (DDS): 0.382
Past Year
  • Commits: 12
  • Committers: 2
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.167
Top Committers
Name Email Commits
Vikram Ramavarapu v****2@i****u 288
Vidya Kamath p****h@g****m 63
vidyak2uiuc 1****c 35
Minhyuk Park m****5@g****m 32
Fabio Ayres f****a@i****r 15
George Chacko c****e 13
Vidya Kamath Pailodi v****2@c****u 10
Vikram Ramavarapu v****2@c****u 6
George Chacko c****e@i****u 2
alessitomas t****i@g****m 1
Joao Alfredo Cardoso Lamy t****y@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 29
  • Total pull requests: 34
  • Average time to close issues: 30 days
  • Average time to close pull requests: 8 days
  • Total issue authors: 8
  • Total pull request authors: 6
  • Average comments per issue: 1.38
  • Average comments per pull request: 0.06
  • Merged pull requests: 32
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 2
  • Average time to close issues: about 1 hour
  • Average time to close pull requests: 1 minute
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • chackoge (11)
  • MinhyukPark (6)
  • vidyak2uiuc (3)
  • alfredjynx (2)
  • LuisScoccola (1)
  • alessitomas (1)
  • RuneBlaze (1)
Pull Request Authors
  • vikramr2 (19)
  • MinhyukPark (9)
  • vidyak2uiuc (5)
  • alessitomas (2)
  • alfredjynx (2)
  • IanChenUIUC (1)

Dependencies

requirements.txt pypi
  • HeapDict ==1.0.1
  • attrs ==22.2.0
  • click ==8.1.3
  • colorama ==0.4.6
  • coloredlogs ==15.0.1
  • connectivity-modifier ==0.1.0b13
  • exceptiongroup ==1.1.1
  • graphviz ==0.20.1
  • humanfriendly ==10.0
  • igraph ==0.10.4
  • iniconfig ==2.0.0
  • jsonpickle ==2.2.0
  • leidenalg ==0.9.1
  • networkit ==10.1
  • networkx ==3.1
  • numpy ==1.24.2
  • packaging ==23.0
  • pandas ==1.5.3
  • pip ==20.2.4
  • pluggy ==1.0.0
  • psutil ==5.9.5
  • pytest ==7.2.2
  • python-dateutil ==2.8.2
  • pytz ==2023.2
  • scipy ==1.10.1
  • setuptools ==50.3.2
  • six ==1.16.0
  • structlog ==22.3.0
  • texttable ==1.6.7
  • tomli ==2.0.1
  • treeswift ==1.1.33
  • typer ==0.6.1
  • typing-extensions ==4.5.0
.github/workflows/documentation.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/draft-pdf.yml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
setup.py pypi
environment.yml pypi
  • HeapDict ==1.0.1
  • attrs ==22.2.0
  • click ==8.1.3
  • colorama ==0.4.6
  • coloredlogs ==15.0.1
  • connectivity-modifier ==0.1.0b13
  • exceptiongroup ==1.1.1
  • graphviz ==0.20.1
  • humanfriendly ==10.0
  • igraph ==0.10.4
  • infomap ==2.7
  • iniconfig ==2.0.0
  • jsonpickle ==2.2.0
  • leidenalg ==0.9.1
  • networkit ==10.1
  • networkx ==3.1
  • numpy ==1.24.2
  • packaging ==23.0
  • pandas ==1.5.3
  • pip ==20.2.4
  • pluggy ==1.0.0
  • psutil ==5.9.5
  • pytest ==7.2.2
  • python-dateutil ==2.8.2
  • pytz ==2023.2
  • scipy ==1.10.1
  • setuptools ==50.3.2
  • six ==1.16.0
  • structlog ==22.3.0
  • texttable ==1.6.7
  • tomli ==2.0.1
  • treeswift ==1.1.33
  • typer ==0.6.1
  • typing-extensions ==4.5.0