rak-split-communities-openmp

Design of OpenMP-based Parallel Label Propagation Algorithm (LPA) for community detection, with no internally-disconnected communities.

https://github.com/puzzlef/rak-split-communities-openmp

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.6%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

Design of OpenMP-based Parallel Label Propagation Algorithm (LPA) for community detection, with no internally-disconnected communities.

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

Design of OpenMP-based Parallel Label Propagation Algorithm (LPA) for community detection, \ with no internally-disconnected communities.

Community detection is the problem of identifying tightly connected clusters of nodes within a network. Efficient parallel algorithms for this play a crucial role in various applications, especially as datasets expand to significant sizes. The Label Propagation Algorithm (LPA) is commonly employed for this purpose due to its ease of parallelization, rapid execution, and scalability. However, it may yield internally disconnected communities. This technical report introduces GSL-LPA, derived from our parallelization of LPA, namely GVE-LPA. Our experiments on a system with two 16-core Intel Xeon Gold 6226R processors show that GSL-LPA not only mitigates this issue but also surpasses FLPA, igraph LPA, and NetworKit LPA by 55x, 10,300x, and 5.8x, respectively, achieving a processing rate of 844M edges/s on a 3.8B edge graph. Additionally, GSL-LPA scales at a rate of 1.6x for every doubling of threads.

Below we plot the time taken by FLPA, igraph LPA, NetworKit LPA, and GSL-LPA on 13 different graphs. GSL-LPA surpasses FLPA, igraph LPA, and NetworKit LPA by 55×, 10,300×, and 5.8× respectively, achieving a processing rate of 844M edges/s on a 3.8𝐵 edge graph.

Below we plot the speedup of GSL-LPA wrt FLPA, igraph LPA, and NetworKit LPA.

Next, we plot the modularity of communities identified by FLPA, igraph LPA, NetworKit LPA, and GSL-LPA. GSL-LPA on average obtains 7.1% / 0.7% higher modularity than FLPA and igraph LPA respectively (especially on road networks and protein k-mer graphs), and 3.6% lower modularity than NetworKit LPA (especially on protein k-mer graphs).

Finally, we plot the fraction of disconnected communities identified by each implementation. Absence of bars indicates the absence of disconnected communities. GSL-LPA does not identify any communities that are internally-disconnected. However, on average, FLPA, igraph LPA, and NetworKit LPA exhibit fraction of disconnected communities amounting to 2.3×10^−3, 1.2×10^−3, and 1.8×10^−2, particularly on web graphs and social networks.

Refer to our technical report for more details: GSL-LPA: Fast Label Propagation Algorithm (LPA) for Community Detection with no Internally-Disconnected Communities.


[!NOTE] You can just copy main.sh to your system and run it. \ For the code, refer to main.cxx.



Code structure

The code structure of GSL-LPA is as follows:

bash - inc/_algorithm.hxx: Algorithm utility functions - inc/_bitset.hxx: Bitset manipulation functions - inc/_cmath.hxx: Math functions - inc/_ctypes.hxx: Data type utility functions - inc/_cuda.hxx: CUDA utility functions - inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...) - inc/_iostream.hxx: Input/output stream functions - inc/_iterator.hxx: Iterator utility functions - inc/_main.hxx: Main program header - inc/_mpi.hxx: MPI (Message Passing Interface) utility functions - inc/_openmp.hxx: OpenMP utility functions - inc/_queue.hxx: Queue utility functions - inc/_random.hxx: Random number generation functions - inc/_string.hxx: String utility functions - inc/_utility.hxx: Runtime measurement functions - inc/_vector.hxx: Vector utility functions - inc/batch.hxx: Batch update generation functions - inc/bfs.hxx: Breadth-first search algorithms - inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions - inc/dfs.hxx: Depth-first search algorithms - inc/duplicate.hxx: Graph duplicating functions - inc/Graph.hxx: Graph data structure functions - inc/rak.hxx: LPA/RAK community detection algorithm functions - inc/rakSplit.hxx: GSL-LPA community detection algorithm functions - inc/split.hxx: Algorithms to split internally-disconnected communities - inc/main.hxx: Main header - inc/mtx.hxx: Graph file reading functions - inc/properties.hxx: Graph Property functions - inc/selfLoop.hxx: Graph Self-looping functions - inc/symmetricize.hxx: Graph Symmetricization functions - inc/transpose.hxx: Graph transpose functions - inc/update.hxx: Update functions - main.cxx: Experimentation code - process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.



References




ORG

Owner

  • Name: puzzlef
  • Login: puzzlef
  • Kind: organization

A summary of experiments.

Citation (CITATION.bib)

@article{sahu2023gsllpa,
  title={GSL-LPA: Fast Label Propagation Algorithm (LPA) for Community Detection with no Internally-Disconnected Communities},
  author={Sahu, Subhajit},
  journal={arXiv preprint arXiv:2403.01261},
  year={2024}
}

GitHub Events

Total
  • Push event: 1
  • Public event: 1
Last Year
  • Push event: 1
  • Public event: 1

Committers

Last synced: 10 months ago

All Time
  • Total Commits: 40
  • Total Committers: 1
  • Avg Commits per committer: 40.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 3
  • Committers: 1
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Subhajit Sahu w****7@g****m 40

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels