leiden-communities-openmp
Design of OpenMP-based Parallel Leiden algorithm for community detection.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 3 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org, nature.com, ieee.org, zenodo.org -
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (6.4%) to scientific vocabulary
Repository
Design of OpenMP-based Parallel Leiden algorithm for community detection.
Basic Info
- Host: GitHub
- Owner: puzzlef
- License: mit
- Language: C++
- Default Branch: main
- Homepage: https://arxiv.org/abs/2312.13936
- Size: 199 KB
Statistics
- Stars: 16
- Watchers: 2
- Forks: 5
- Open Issues: 3
- Releases: 2
Metadata Files
README.md
Design of OpenMP-based Parallel Leiden algorithm for community detection.
Community detection involves identifying subsets of vertices that display higher connectivity within themselves than with the rest of the network. The widely used Louvain method, a heuristic-based approach for community detection, employs a two-phase process consisting of a local-moving phase and an aggregation phase. This iterative optimization targets the modularity metric, a measure of community quality. Despite its popularity, the Louvain method has been observed to generate internally-disconnected and poorly connected communities. In response to these limitations, Traag et al. propose the Leiden algorithm, which introduces a refinement phase between the local-moving and aggregation phases. This refinement phase allows vertices to explore and potentially form sub-communities within the identified communities from the local-moving phase, enabling the Leiden algorithm to identify well-connected communities.
Nevertheless, the original Leiden algorithm encounters computational bottlenecks when applied to massive graphs, primarily due to its inherently sequential nature, akin to the Louvain method. In scenarios where scalability is crucial, the development of an optimized parallel Leiden algorithm becomes essential, especially in the multicore/shared memory setting, given its energy efficiency and the prevalence of hardware with large memory sizes. Despite existing studies proposing various parallelization techniques for the Leiden algorithm, they do not address optimization for the aggregation phase, which emerges as a bottleneck after optimizing the local-moving phase. Additionally, several optimization techniques applicable to the Louvain method are also relevant to the Leiden algorithm. To tackle these challenges, we present GVE-Leiden, an optimized parallel implementation of the Leiden algorithm designed for shared memory multicores.
Below we plot the time taken by the original Leiden, igraph Leiden, NetworKit Leiden, cuGraph Leiden, and GVE-Leiden on 13 different graphs. GVE-Leiden surpasses the original Leiden, igraph Leiden, NetworKit Leiden, and cuGraph Leiden by 436, 104, 8.2, and 3.0 respectively, achieving a processing rate of 403M edges/s on a 3.8 edge graph.
Below we plot the speedup of GVE-Leiden wrt original Leiden, igraph Leiden, NetworKit Leiden, and cuGraph Leiden.
Next, we plot the modularity of communities identified by original Leiden, igraph Leiden, NetworKit Leiden, cuGraph Leiden, and GVE-Leiden. GVE-Leiden on average obtains 0.3% lower modularity than original Leiden and igraph Leiden, 25% higher modularity than NetworKit Leiden (especially on road networks and protein k-mer graphs), and 3.5% higher modularity that cuGraph Leiden (primarily due to cuGraph Leidens inability to run on well-clusterable graphs).
Then, we plot the fraction of disconnected communities obtained with each implementation. Here, the absence of bars signifies no disconnected communities. On average, communities identified by NetworKit Leiden and cuGraph Leiden have fractions of disconnected communities as follows: 1.510^2 and 6.610^5 respectively. None of the communities identified by the original Leiden, igraph Leiden, and GVE-Leiden are internally-disconnected. As the Leiden algorithm guarantees the absence of disconnected communities, those observed with NetworKit Leiden and cuGraph Leiden are likely due to implementation issues.
Refer to our technical report for more details (updated to use constrained merge): \ GVE-Leiden: Fast Leiden Algorithm for Community Detection in Shared Memory Setting.
[!NOTE] You can just copy
main.shto your system and run it. \ For the code, refer tomain.cxx.
Code structure
The code structure of GVE-Leiden is as follows:
bash
- inc/_algorithm.hxx: Algorithm utility functions
- inc/_bitset.hxx: Bitset manipulation functions
- inc/_cmath.hxx: Math functions
- inc/_ctypes.hxx: Data type utility functions
- inc/_cuda.hxx: CUDA utility functions
- inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/_iostream.hxx: Input/output stream functions
- inc/_iterator.hxx: Iterator utility functions
- inc/_main.hxx: Main program header
- inc/_mpi.hxx: MPI (Message Passing Interface) utility functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_queue.hxx: Queue utility functions
- inc/_random.hxx: Random number generation functions
- inc/_string.hxx: String utility functions
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions
- inc/batch.hxx: Batch update generation functions
- inc/bfs.hxx: Breadth-first search algorithms
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/dfs.hxx: Depth-first search algorithms
- inc/duplicate.hxx: Graph duplicating functions
- inc/Graph.hxx: Graph data structure functions
- inc/leiden.hxx: Leiden algorithm functions
- inc/louvian.hxx: Louvian algorithm functions
- inc/main.hxx: Main header
- inc/mtx.hxx: Graph file reading functions
- inc/properties.hxx: Graph Property functions
- inc/selfLoop.hxx: Graph Self-looping functions
- inc/symmetricize.hxx: Graph Symmetricization functions
- inc/transpose.hxx: Graph transpose functions
- inc/update.hxx: Update functions
- main.cxx: Experimentation code
- process.js: Node.js script for processing output logs
Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.
References
- Fast unfolding of communities in large networks; Vincent D. Blondel et al. (2008)
- Community Detection on the GPU; Md. Naim et al. (2017)
- Scalable Static and Dynamic Community Detection Using Grappolo; Mahantesh Halappanavar et al. (2017)
- From Louvain to Leiden: guaranteeing well-connected communities; V.A. Traag et al. (2019)
- CS224W: Machine Learning with Graphs | Louvain Algorithm; Jure Leskovec (2021)
- The University of Florida Sparse Matrix Collection; Timothy A. Davis et al. (2011)
- Fetch-and-add using OpenMP atomic operations
Owner
- Name: puzzlef
- Login: puzzlef
- Kind: organization
- Website: https://puzzlef.github.io/
- Repositories: 10
- Profile: https://github.com/puzzlef
A summary of experiments.
GitHub Events
Total
- Issues event: 6
- Watch event: 6
- Issue comment event: 6
- Push event: 2
- Pull request event: 6
- Fork event: 4
Last Year
- Issues event: 6
- Watch event: 6
- Issue comment event: 6
- Push event: 2
- Pull request event: 6
- Fork event: 4
Committers
Last synced: 9 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| Subhajit Sahu | w****7@g****m | 62 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 4
- Total pull requests: 0
- Average time to close issues: about 14 hours
- Average time to close pull requests: N/A
- Total issue authors: 3
- Total pull request authors: 0
- Average comments per issue: 1.5
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 0
- Average time to close issues: about 20 hours
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- adsharma (2)
- jgbradley1 (2)
- wolfram77 (1)
- atmughrabi (1)
Pull Request Authors
- FomalhautBell (2)
- adsharma (1)




