louvain-split-communities-openmp

Design of OpenMP-based Parallel Louvain algorithm for community detection, with no internally-disconnected communities.

https://github.com/puzzlef/louvain-split-communities-openmp

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, nature.com, ieee.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.7%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

Design of OpenMP-based Parallel Louvain algorithm for community detection, with no internally-disconnected communities.

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Design of OpenMP-based Parallel Louvain algorithm for community detection, \ with no internally-disconnected communities.


Community detection entails the identification of clusters of vertices that exhibit stronger connections within themselves compared to the wider network. The Louvain method, a commonly utilized heuristic for this task, employs a two-step process comprising a local-moving phase and an aggregation phase. This process iteratively optimizes the modularity metric, a measure of community quality. Although widely used, the Louvain method has been noted for generating internally-disconnected and poorly connected communities. In response, Traag et al. introduce the Leiden algorithm, which incorporates an extra refinement phase between the local-moving and aggregation phases. This refinement step enables vertices to explore and potentially create sub-communities within the identified communities from the local-moving phase. In this report, we propose another BFS-based approach, GSP-Louvain, for addressing the same issue. This is different from a number of earlier works, which tackled internally-disconnected communities as a post-processing step.

Below we plot the time taken by the original Leiden, igraph Leiden, NetworKit Leiden, GSP-Louvain on 13 different graphs. GSP-Louvain surpasses the original Leiden, igraph Leiden, and NetworKit Leiden by 341×, 83×, and 6.1× respectively, achieving a processing rate of 328M edges/s on a 3.8𝐵 edge graph.

Below we plot the speedup of GSP-Louvain wrt original Leiden, igraph Leiden, and NetworKit Leiden.

Next, we compare the modularity of communities identified by the original Leiden algorithm, igraph Leiden, NetworKit Leiden, and GSP-Louvain. On average, GSP-Louvain achieves 0.3% lower modularity than the original Leiden and igraph Leiden, respectively, and 25% higher modularity than NetworKit Leiden, particularly evident on road networks and protein k-mer graphs.

Finally, we plot the fraction of disconnected communities identified by each implementation. Absence of bars indicates the absence of disconnected communities. The original Leiden, igraph Leiden, and GSP-Louvain do not identify any communities that are internally-disconnected. However, on average, NetworKit Leiden exhibit fraction of disconnected communities amounting to 1.5×10^−2, particularly on web graphs and social networks.

Refer to our technical reports for more details: \ An Approach for Addressing Internally-Disconnected Communities in Louvain Algorithm.


[!NOTE] You can just copy main.sh to your system and run it. \ For the code, refer to main.cxx.



Code structure

The code structure of GSP-Louvain is as follows:

bash - inc/_algorithm.hxx: Algorithm utility functions - inc/_bitset.hxx: Bitset manipulation functions - inc/_cmath.hxx: Math functions - inc/_ctypes.hxx: Data type utility functions - inc/_cuda.hxx: CUDA utility functions - inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...) - inc/_iostream.hxx: Input/output stream functions - inc/_iterator.hxx: Iterator utility functions - inc/_main.hxx: Main program header - inc/_mpi.hxx: MPI (Message Passing Interface) utility functions - inc/_openmp.hxx: OpenMP utility functions - inc/_queue.hxx: Queue utility functions - inc/_random.hxx: Random number generation functions - inc/_string.hxx: String utility functions - inc/_utility.hxx: Runtime measurement functions - inc/_vector.hxx: Vector utility functions - inc/batch.hxx: Batch update generation functions - inc/bfs.hxx: Breadth-first search algorithms - inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions - inc/dfs.hxx: Depth-first search algorithms - inc/duplicate.hxx: Graph duplicating functions - inc/Graph.hxx: Graph data structure functions - inc/louvian.hxx: Louvian algorithm functions - inc/louvainSplit.hxx: Louvain with no disconnected communities - inc/main.hxx: Main header - inc/mtx.hxx: Graph file reading functions - inc/properties.hxx: Graph Property functions - inc/selfLoop.hxx: Graph Self-looping functions - inc/symmetricize.hxx: Graph Symmetricization functions - inc/transpose.hxx: Graph transpose functions - inc/update.hxx: Update functions - main.cxx: Experimentation code - process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.



References




ORG DOI

Owner

  • Name: puzzlef
  • Login: puzzlef
  • Kind: organization

A summary of experiments.

Citation (CITATION.bib)

@article{sahu2024approach,
  title={An Approach for Addressing Internally-Disconnected Communities in Louvain Algorithm},
  author={Sahu, Subhajit},
  journal={arXiv preprint arXiv:2402.11454},
  year={2024}
}

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels