Recent Releases of leiden-communities-openmp

leiden-communities-openmp - Design of OpenMP-based Parallel Leiden algorithm for community detection, that prevents internally disconnected communities

Design of OpenMP-based Parallel Leiden algorithm for community detection, that prevents internally disconnected communities.

[!NOTE] For the code of GVE-Leiden, refer to the arXiv-2312.13936 branch.

Community detection entails the identification of clusters of vertices that exhibit stronger connections within themselves compared to the wider network. The Louvain method, a commonly utilized heuristic for this task, employs a two-step process comprising a local-moving phase and an aggregation phase. This process iteratively optimizes the modularity metric, a measure of community quality. Despite its popularity, the Louvain method has been noted for producing internally fragmented and weakly connected communities. In response to these limitations, Traag et al. propose the Leiden algorithm, which incorporates a refinement phase between the local-moving and aggregation phases. This refinement step enables vertices to explore and potentially establish sub-communities within the identified communities from the local-moving phase.
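Modularity, the quality metric both methods optimize, can be computed directly from a partition. The sketch below is a minimal sequential C++17 version. It assumes an adjacency-list graph in which each undirected edge is stored in both directions and there are no self-loops. The `Graph` alias and the `modularity` function are hypothetical names and are not taken from this repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical weighted undirected graph: adj[u] holds (neighbor, weight)
// pairs. Assumption: each edge {u, v} is stored in both adj[u] and adj[v],
// and there are no self-loops.
using Graph = std::vector<std::vector<std::pair<std::size_t, double>>>;

// Modularity Q = sum over communities c of in_c / (2m) - (tot_c / (2m))^2,
// where in_c sums adjacency weights with both endpoints in c (each internal
// edge counted twice) and tot_c sums the weighted degrees of vertices in c.
double modularity(const Graph& adj, const std::vector<std::size_t>& comm) {
  assert(adj.size() == comm.size());
  std::size_t numComms = 0;
  for (std::size_t c : comm) numComms = std::max(numComms, c + 1);

  std::vector<double> in(numComms, 0.0), tot(numComms, 0.0);
  double twoM = 0.0;  // sum of all adjacency weights, i.e. 2m
  for (std::size_t u = 0; u < adj.size(); ++u) {
    for (const auto& [v, w] : adj[u]) {
      twoM += w;
      tot[comm[u]] += w;
      if (comm[u] == comm[v]) in[comm[u]] += w;
    }
  }
  if (twoM == 0.0) return 0.0;  // edgeless graph: define Q = 0

  double q = 0.0;
  for (std::size_t c = 0; c < numComms; ++c) {
    const double share = tot[c] / twoM;
    q += in[c] / twoM - share * share;
  }
  return q;
}
```

Consider two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3), with unit weights. Placing each triangle in its own community gives Q = 5/14 ≈ 0.357. Putting all six vertices in a single community gives Q = 0.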

However, the Leiden algorithm is not guaranteed to avoid internally disconnected communities, a flaw that has largely escaped attention. We illustrate this through both a counterexample and empirical findings. In our experimental evaluation, approximately 1.3×10^−4 of the communities identified by the original Leiden implementation exhibit this issue. Although this fraction is small, eliminating disconnected communities is crucial for the accuracy and dependability of community detection algorithms. Several studies have handled internally disconnected communities as a post-processing step, but this may exacerbate the problem of poorly connected communities. Furthermore, data volumes and their graph representations have grown at an unprecedented rate in recent years. Applying the original Leiden algorithm to such massive graphs poses computational hurdles, primarily due to its inherently sequential nature, akin to the Louvain method. To tackle these challenges, we propose two new parallel algorithms, GSP-Leiden and GSP-Louvain, based on the Leiden and Louvain algorithms, respectively.

Below we plot the time taken by the original Leiden, igraph Leiden, NetworKit Leiden, GSP-Leiden, and GSP-Louvain on 13 different graphs. GSP-Leiden surpasses the original Leiden, igraph Leiden, and NetworKit Leiden by 190×, 46×, and 3.4×, respectively, achieving a processing rate of 195M edges/s on a 3.8B-edge graph.
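Suppose the processing rate is read as the number of edges divided by total runtime. That definition is our assumption and is not stated here. Under it, the figures above imply a runtime of roughly 19.5 s for GSP-Leiden on the 3.8B-edge graph:

```latex
t \approx \frac{|E|}{\text{rate}}
  = \frac{3.8 \times 10^{9}\ \text{edges}}{1.95 \times 10^{8}\ \text{edges/s}}
  \approx 19.5\ \text{s}
```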

Below we plot the speedup of GSP-Leiden and GSP-Louvain with respect to the original Leiden, igraph Leiden, and NetworKit Leiden.

Next, we compare the modularity of communities identified by the original Leiden algorithm, igraph Leiden, NetworKit Leiden, GSP-Leiden, and GSP-Louvain. On average, GSP-Leiden achieves 0.07% and 0.02% lower modularity than the original Leiden and igraph Leiden, respectively. It also achieves 26% higher modularity than NetworKit Leiden, with the gap most evident on road networks and protein k-mer graphs.

Finally, we plot the fraction of disconnected communities identified by each implementation. Absence of bars indicates the absence of disconnected communities. As anticipated, both GSP-Leiden and GSP-Louvain detect no disconnected communities. However, on average, the original Leiden, igraph Leiden, and NetworKit Leiden exhibit fractions of disconnected communities amounting to 1.3×10^−4, 7.9×10^−5, and 1.5×10^−2, respectively, particularly on web graphs (and especially on social networks with NetworKit Leiden).
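The fraction reported above can be measured with one pass of union-find over intra-community edges. Each resulting set is then one connected component of some community. The sketch below is our own sequential C++17 illustration and is not necessarily how the authors measured it. `DisjointSet` and `disconnectedFraction` are hypothetical names, and the graph is assumed to be an undirected edge list with each edge listed once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Minimal disjoint-set (union-find) with path halving.
struct DisjointSet {
  std::vector<std::size_t> parent;
  explicit DisjointSet(std::size_t n) : parent(n) {
    std::iota(parent.begin(), parent.end(), std::size_t{0});
  }
  std::size_t find(std::size_t x) {
    while (parent[x] != x) {
      parent[x] = parent[parent[x]];  // path halving
      x = parent[x];
    }
    return x;
  }
  void unite(std::size_t a, std::size_t b) { parent[find(a)] = find(b); }
};

// Fraction of (non-empty) communities whose induced subgraph has more than
// one connected component. Only intra-community edges are united, so every
// union-find set lies inside a single community.
double disconnectedFraction(
    const std::vector<std::pair<std::size_t, std::size_t>>& edges,
    const std::vector<std::size_t>& comm) {
  const std::size_t n = comm.size();
  DisjointSet ds(n);
  for (const auto& [u, v] : edges) {
    assert(u < n && v < n);
    if (comm[u] == comm[v]) ds.unite(u, v);
  }

  std::size_t numComms = 0;
  for (std::size_t c : comm) numComms = std::max(numComms, c + 1);
  // components[c] = number of union-find roots (components) inside community c.
  std::vector<std::size_t> components(numComms, 0);
  for (std::size_t v = 0; v < n; ++v)
    if (ds.find(v) == v) ++components[comm[v]];

  std::size_t total = 0, disconnected = 0;
  for (std::size_t k : components) {
    if (k == 0) continue;  // community ID not used by any vertex
    ++total;
    if (k > 1) ++disconnected;
  }
  return total == 0 ? 0.0 : static_cast<double>(disconnected) / total;
}
```

Union-find was chosen here because it needs no per-community traversal state and handles each edge once. A BFS restricted to each community would give the same counts.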

Refer to our technical reports for more details:
- GVE-Leiden: Fast Leiden Algorithm for Community Detection in Shared Memory Setting.
- Addressing Internally-Disconnected Communities in Leiden and Louvain Community Detection Algorithms.

[!NOTE] You can just copy main.sh to your system and run it. For the code, refer to main.cxx.

Code structure

The code structure of GVE-Leiden is as follows:

- inc/_algorithm.hxx: Algorithm utility functions
- inc/_bitset.hxx: Bitset manipulation functions
- inc/_cmath.hxx: Math functions
- inc/_ctypes.hxx: Data type utility functions
- inc/_cuda.hxx: CUDA utility functions
- inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/_iostream.hxx: Input/output stream functions
- inc/_iterator.hxx: Iterator utility functions
- inc/_main.hxx: Main program header
- inc/_mpi.hxx: MPI (Message Passing Interface) utility functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_queue.hxx: Queue utility functions
- inc/_random.hxx: Random number generation functions
- inc/_string.hxx: String utility functions
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions
- inc/batch.hxx: Batch update generation functions
- inc/bfs.hxx: Breadth-first search algorithms
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/dfs.hxx: Depth-first search algorithms
- inc/duplicate.hxx: Graph duplicating functions
- inc/Graph.hxx: Graph data structure functions
- inc/leiden.hxx: Leiden algorithm functions
- inc/leidenSplit.hxx: Leiden with no disconnected communities
- inc/louvian.hxx: Louvain algorithm functions
- inc/louvainSplit.hxx: Louvain with no disconnected communities
- inc/main.hxx: Main header
- inc/mtx.hxx: Graph file reading functions
- inc/properties.hxx: Graph property functions
- inc/selfLoop.hxx: Graph self-looping functions
- inc/symmetricize.hxx: Graph symmetricization functions
- inc/transpose.hxx: Graph transpose functions
- inc/update.hxx: Update functions
- main.cxx: Experimentation code
- process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch is unclear, or if you have comments on our technical report, feel free to open an issue.

- C++
Published by wolfram77 over 2 years ago

leiden-communities-openmp - Design of OpenMP-based Parallel Leiden algorithm for community detection

Design of OpenMP-based Leiden algorithm for community detection.

Community detection involves identifying subsets of vertices that display higher connectivity within themselves than with the rest of the network. The widely used Louvain method, a heuristic-based approach for community detection, employs a two-phase process consisting of a local-moving phase and an aggregation phase. This iterative optimization targets the modularity metric, a measure of community quality. Despite its popularity, the Louvain method has been observed to generate internally-disconnected and poorly connected communities. In response to these limitations, Traag et al. propose the Leiden algorithm, which introduces a refinement phase between the local-moving and aggregation phases. This refinement phase allows vertices to explore and potentially form sub-communities within the identified communities from the local-moving phase, enabling the Leiden algorithm to identify well-connected communities.
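During the local-moving phase, each vertex evaluates the modularity gain of moving into a neighboring community. A commonly used closed form of that gain applies to undirected graphs without self-loops and is derived from the standard modularity definition. The helper below implements that textbook form. `deltaModularity` is a hypothetical name, and we have not checked this form against the repository's own code.

```cpp
#include <cassert>

// Modularity gain of moving vertex i from its current community d into a
// different community c, for an undirected graph without self-loops:
//   dQ = (k_ic - k_id) / m - k_i * (sigma_c - sigma_d + k_i) / (2 m^2)
// k_ic, k_id : edge weight from i to vertices of c, and to other vertices of d
// k_i        : weighted degree of i
// sigma_c    : total weighted degree of c (i not included)
// sigma_d    : total weighted degree of d (i included)
// m          : total edge weight of the graph
double deltaModularity(double k_ic, double k_id, double k_i,
                       double sigma_c, double sigma_d, double m) {
  assert(m > 0.0);
  return (k_ic - k_id) / m - k_i * (sigma_c - sigma_d + k_i) / (2.0 * m * m);
}
```

Consider two unit-weight triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3), so m = 7. Moving vertex 2 into the other triangle gives ΔQ = −23/98. This matches recomputing modularity before the move (5/14) and after it (6/49).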

Nevertheless, the original Leiden algorithm encounters computational bottlenecks when applied to massive graphs, primarily due to its inherently sequential nature, akin to the Louvain method. Where scalability is crucial, an optimized parallel Leiden algorithm becomes essential. This is especially true in the multicore/shared memory setting, given its energy efficiency and the prevalence of hardware with large memory sizes. Although existing studies propose various parallelization techniques for the Leiden algorithm, they do not optimize the aggregation phase, which emerges as the bottleneck once the local-moving phase has been optimized. Additionally, several optimization techniques applicable to the Louvain method are also relevant to the Leiden algorithm. To tackle these challenges, we present GVE-Leiden, an optimized parallel implementation of the Leiden algorithm designed for shared memory multicores.
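The aggregation phase collapses each community into a super-vertex, producing a smaller graph for the next pass. The sketch below shows only what this phase computes. It is sequential and makes no attempt at the parallel optimization that GVE-Leiden targets for this phase. `CSR` and `aggregate` are hypothetical names. The input is assumed to be a CSR graph with each undirected edge stored in both directions and community IDs that are dense in [0, numComms).

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical CSR graph: the neighbors of u are targets[offsets[u]] up to
// (but excluding) targets[offsets[u + 1]], with matching weights.
struct CSR {
  std::vector<std::size_t> offsets;
  std::vector<std::size_t> targets;
  std::vector<double> weights;
};

// Sequential aggregation sketch: each community becomes one super-vertex.
// Weights of all edges from community c to community d are summed into a
// single edge c -> d; edges inside c become a self-loop c -> c.
// Assumption: comm[u] is a dense community ID in [0, numComms).
CSR aggregate(const CSR& g, const std::vector<std::size_t>& comm,
              std::size_t numComms) {
  assert(g.offsets.size() == comm.size() + 1);
  // Group the original vertices by community.
  std::vector<std::vector<std::size_t>> members(numComms);
  for (std::size_t u = 0; u < comm.size(); ++u) members[comm[u]].push_back(u);

  CSR out;
  out.offsets.push_back(0);
  for (std::size_t c = 0; c < numComms; ++c) {
    // Sum edge weights from community c to each neighboring community.
    std::unordered_map<std::size_t, double> acc;
    for (std::size_t u : members[c])
      for (std::size_t i = g.offsets[u]; i < g.offsets[u + 1]; ++i)
        acc[comm[g.targets[i]]] += g.weights[i];
    // Append the row of super-vertex c (neighbor order is unspecified).
    for (const auto& [d, w] : acc) {
      out.targets.push_back(d);
      out.weights.push_back(w);
    }
    out.offsets.push_back(out.targets.size());
  }
  return out;
}
```

Take two unit-weight triangles joined by one edge, with each triangle as a community. The result has two super-vertices. Each carries a self-loop of weight 6 (three internal edges, each counted in both directions) and an edge of weight 1 to the other. The total weight of 14 is preserved.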

Below we plot the time taken by the original Leiden, igraph Leiden, NetworKit Leiden, and GVE-Leiden on 13 different graphs. GVE-Leiden surpasses the original Leiden, igraph Leiden, and NetworKit Leiden by 373×, 86×, and 7.2×, respectively, achieving a processing rate of 1.4B edges/s on a 3.8B-edge graph.

Below we plot the speedup of GVE-Leiden with respect to the original Leiden, igraph Leiden, and NetworKit Leiden.

Next, we plot the modularity of communities identified by the original Leiden, igraph Leiden, NetworKit Leiden, and GVE-Leiden. On average, GVE-Leiden obtains 0.1% lower modularity than the original Leiden and igraph Leiden, and 26% higher modularity than NetworKit Leiden (especially on road networks and protein k-mer graphs).

Then, we plot the fraction of disconnected communities obtained with each implementation. Here, the absence of bars indicates the absence of disconnected communities. On average, GVE-Leiden yields 88×, 145×, and 0.76× as many disconnected communities as the original Leiden, igraph Leiden, and NetworKit Leiden, respectively. This compares unfavorably with the original Leiden and igraph Leiden, especially on social networks, road networks, and protein k-mer graphs. However, it may be simpler to split the disconnected communities obtained from GVE-Leiden as a post-processing step. We intend to address this issue in the future.
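Splitting as a post-processing step can be as simple as replacing each community with its connected components. The sketch below is a hypothetical sequential illustration of that step and not the authors' implementation. `splitDisconnected` is an assumed name, and the graph is assumed to be an adjacency list with each undirected edge stored in both directions. This kind of split cannot lower modularity, because the resulting pieces share no edges. Each in_c term is therefore unchanged, while the (tot_c/2m)^2 penalty can only shrink, since (a + b)^2 ≥ a^2 + b^2 for non-negative a and b.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Post-processing sketch: replace each community by its connected
// components. The BFS from a seed s only follows edges to vertices in s's
// original community, so every component receives a fresh, dense ID.
// adj[u] lists the neighbors of u (each undirected edge stored both ways).
std::vector<std::size_t> splitDisconnected(
    const std::vector<std::vector<std::size_t>>& adj,
    const std::vector<std::size_t>& comm) {
  assert(adj.size() == comm.size());
  const std::size_t n = adj.size();
  const std::size_t unset = static_cast<std::size_t>(-1);
  std::vector<std::size_t> out(n, unset);
  std::size_t next = 0;
  std::queue<std::size_t> frontier;
  for (std::size_t s = 0; s < n; ++s) {
    if (out[s] != unset) continue;  // already labeled by an earlier BFS
    out[s] = next;
    frontier.push(s);
    while (!frontier.empty()) {
      const std::size_t u = frontier.front();
      frontier.pop();
      for (std::size_t v : adj[u]) {
        if (comm[v] != comm[s] || out[v] != unset) continue;
        out[v] = next;
        frontier.push(v);
      }
    }
    ++next;  // one new ID per component
  }
  return out;
}
```

For example, suppose a community {3, 4, 5} has an edge (3, 4) but vertex 5 has no edge to either. It is split into {3, 4} and {5}, while an already connected community keeps all its vertices under one new ID.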

Finally, we plot the strong scaling behaviour of GVE-Leiden. With each doubling of the number of threads, GVE-Leiden's performance improves by 1.6× on average.
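To see what a 1.6× gain per doubling means cumulatively, assume it holds uniformly at every doubling. That is an idealization, since the figure above is an average. Under this assumption, the speedup over a single thread compounds geometrically. The 64-thread case below is only an illustrative choice, not a configuration taken from the text. The same assumption implies a parallel efficiency of about 1.6/2 = 80% per doubling.

```latex
S(2^{k}) \approx 1.6^{k}, \qquad
S(64) = S(2^{6}) \approx 1.6^{6} \approx 16.8
```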

Refer to our technical report for more details: GVE-Leiden: Fast Leiden Algorithm for Community Detection in Shared Memory Setting.

[!NOTE] You can just copy main.sh to your system and run it. For the code, refer to main.cxx.

Code structure

The code structure of GVE-Leiden is as follows:

- inc/_algorithm.hxx: Algorithm utility functions
- inc/_bitset.hxx: Bitset manipulation functions
- inc/_cmath.hxx: Math functions
- inc/_ctypes.hxx: Data type utility functions
- inc/_cuda.hxx: CUDA utility functions
- inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...)
- inc/_iostream.hxx: Input/output stream functions
- inc/_iterator.hxx: Iterator utility functions
- inc/_main.hxx: Main program header
- inc/_mpi.hxx: MPI (Message Passing Interface) utility functions
- inc/_openmp.hxx: OpenMP utility functions
- inc/_queue.hxx: Queue utility functions
- inc/_random.hxx: Random number generation functions
- inc/_string.hxx: String utility functions
- inc/_utility.hxx: Runtime measurement functions
- inc/_vector.hxx: Vector utility functions
- inc/batch.hxx: Batch update generation functions
- inc/bfs.hxx: Breadth-first search algorithms
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/dfs.hxx: Depth-first search algorithms
- inc/duplicate.hxx: Graph duplicating functions
- inc/Graph.hxx: Graph data structure functions
- inc/leiden.hxx: Leiden algorithm functions
- inc/louvian.hxx: Louvain algorithm functions
- inc/main.hxx: Main header
- inc/mtx.hxx: Graph file reading functions
- inc/properties.hxx: Graph property functions
- inc/selfLoop.hxx: Graph self-looping functions
- inc/symmetricize.hxx: Graph symmetricization functions
- inc/transpose.hxx: Graph transpose functions
- inc/update.hxx: Update functions
- main.cxx: Experimentation code
- process.js: Node.js script for processing output logs

- C++
Published by wolfram77 over 2 years ago
