louvain-communities-openmp - Design of OpenMP-based Parallel Louvain algorithm for community detection, that prevents internally disconnected communities

Design of OpenMP-based Parallel Louvain algorithm for community detection, \ that prevents internally disconnected communities.

[!NOTE] For the code of GVE-Louvain, refer to the arXiv-2312.04876 branch.

Community detection entails the identification of clusters of vertices that exhibit stronger connections within themselves compared to the wider network. The Louvain method, a commonly utilized heuristic for this task, employs a two-step process comprising a local-moving phase and an aggregation phase. This process iteratively optimizes the modularity metric, a measure of community quality. Despite its popularity, the Louvain method has been noted for producing internally fragmented and weakly connected communities. In response to these limitations, Traag et al. propose the Leiden algorithm, which incorporates a refinement phase between the local-moving and aggregation phases. This refinement step enables vertices to explore and potentially establish sub-communities within the identified communities from the local-moving phase.

However, the Leiden algorithm is not guaranteed to avoid internally disconnected communities, a flaw that has largely escaped attention. We illustrate this through both a counterexample and empirical findings. In our experimental evaluation, we note that approximately 1.3×10^−4 fraction of the communities identified using the original Leiden implementation exhibit this issue. Although this fraction is small, addressing the presence of disconnected communities is crucial for ensuring the accuracy and dependability of community detection algorithms. Several studies have addressed internally disconnected communities as a post-processing step. However, this may exacerbate the problem of poorly connected communities. Furthermore, the surge in data volume and their graph representations in recent years has been unprecedented. Nonetheless, applying the original Leiden algorithm to massive graphs has posed computational hurdles, primarily due to its inherently sequential nature, akin to the Louvain method. To tackle these challenged, we propose two new parallel algorithms: GSP-Leiden and GSP-Louvain, based on the Leiden and Louvain algorithms, respectively.

Below we plot the time taken by the original Leiden, igraph Leiden, NetworKit Leiden, GSP-Leiden, and GSP-Louvain on 13 different graphs. GSP-Leiden surpasses the original Leiden, igraph Leiden, and NetworKit Leiden by 341×, 83×, and 6.1× respectively, achieving a processing rate of 328M edges/s on a 3.8𝐵 edge graph.

Below we plot the speedup of GSP-Leiden and GSP-Louvain wrt original Leiden, igraph Leiden, and NetworKit Leiden.

Next, we compare the modularity of communities identified by the original Leiden algorithm, igraph Leiden, NetworKit Leiden, GSP-Leiden, and GSP-Leiden. On average, GSP-Louvain achieves 0.3% lower modularity than the original Leiden and igraph Leiden, respectively, and 25% higher modularity than NetworKit Leiden, particularly evident on road networks and protein k-mer graphs.

Finally, we plot the fraction of disconnected communities identified by each implementation. Absence of bars indicates the absence of disconnected communities. As anticipated, both GSP-Leiden and GSP-Louvain detect no disconnected communities. However, on average, the original Leiden, igraph Leiden, and NetworKit Leiden exhibit fractions of disconnected communities amounting to 1.3×10^−4, 7.9×10^−5, and 1.5×10^−2, respectively, particularly on web graphs (and especially on social networks with NetworKit Leiden).

Refer to our technical reports for more details: \ GVE-Louvain: Fast Louvain Algorithm for Community Detection in Shared Memory Setting. \ Addressing Internally-Disconnected Communities in Leiden and Louvain Community Detection Algorithms.

[!NOTE] You can just copy main.sh to your system and run it. \ For the code, refer to main.cxx.

Code structure

The code structure of GVE-Leiden is as follows:

bash - inc/_algorithm.hxx: Algorithm utility functions - inc/_bitset.hxx: Bitset manipulation functions - inc/_cmath.hxx: Math functions - inc/_ctypes.hxx: Data type utility functions - inc/_cuda.hxx: CUDA utility functions - inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...) - inc/_iostream.hxx: Input/output stream functions - inc/_iterator.hxx: Iterator utility functions - inc/_main.hxx: Main program header - inc/_mpi.hxx: MPI (Message Passing Interface) utility functions - inc/_openmp.hxx: OpenMP utility functions - inc/_queue.hxx: Queue utility functions - inc/_random.hxx: Random number generation functions - inc/_string.hxx: String utility functions - inc/_utility.hxx: Runtime measurement functions - inc/_vector.hxx: Vector utility functions - inc/batch.hxx: Batch update generation functions - inc/bfs.hxx: Breadth-first search algorithms - inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions - inc/dfs.hxx: Depth-first search algorithms - inc/duplicate.hxx: Graph duplicating functions - inc/Graph.hxx: Graph data structure functions - inc/louvian.hxx: Louvian algorithm functions - inc/louvainSplit.hxx: Louvain with no disconnected communities - inc/main.hxx: Main header - inc/mtx.hxx: Graph file reading functions - inc/properties.hxx: Graph Property functions - inc/selfLoop.hxx: Graph Self-looping functions - inc/symmetricize.hxx: Graph Symmetricization functions - inc/transpose.hxx: Graph transpose functions - inc/update.hxx: Update functions - main.cxx: Experimentation code - process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.

References

- C++
Published by wolfram77 about 2 years ago

louvain-communities-openmp - Design of OpenMP-based Louvain algorithm for community detection

Multi-threaded OpenMP-based Louvain algorithm for community detection.

Recent advancements in data collection and graph representations have led to unprecedented levels of complexity, demanding efficient parallel algorithms for community detection on large networks. The use of multicore/shared memory setups is crucial for energy efficiency and compatibility with extensive DRAM sizes. However, existing community detection algorithms face challenges in parallelization due to their irregular and inherently sequential nature. While studies on the Louvain algorithm propose optimizations and parallelization techniques, they often neglect the aggregation phase, creating a bottleneck even after optimizing the local-moving phase. Additionally, these optimization techniques are scattered across multiple papers, making it challenging for readers to grasp and implement them effectively. To address this, we introduce GVE-Louvain, an optimized parallel implementation of Louvain for shared memory multicores.

Below we plot the time taken by Vite, Grappolo, NetworKit Louvain, and GVE-Louvain on 13 different graphs. GVE-Louvain surpasses Vite, Grappolo, and NetworKit by 50×, 22×, and 20× respectively, achieving a processing rate of 560𝑀 edges/s on a 3.8𝐵 edge graph.

Below we plot the speedup of GVE-Louvain wrt Vite, Grappolo, and NetworKit.

Next, we plot the modularity of communities identified by Vite, Grappolo, NetworKit, and GVE-Louvain. GVE-Louvain on average obtains 3.1% higher modularity than Vite (especially on web graphs), and 0.6% lower modularity than Grappolo and NetworKit (especially on social networks with poor clustering).

Finally, we plot the strong scaling behaviour of GVE-Louvain. With doubling of threads, GVE-Louvain exhibits an average performance scaling of 1.6×.

Refer to our technical report for more details: \ GVE-Louvain: Fast Louvain Algorithm for Community Detection in Shared Memory Setting.

[!NOTE] You can just copy main.sh to your system and run it. \ For the code, refer to main.cxx.

Code structure

The code structure of GVE-Louvain is as follows:

bash - inc/_algorithm.hxx: Algorithm utility functions - inc/_bitset.hxx: Bitset manipulation functions - inc/_cmath.hxx: Math functions - inc/_ctypes.hxx: Data type utility functions - inc/_cuda.hxx: CUDA utility functions - inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...) - inc/_iostream.hxx: Input/output stream functions - inc/_iterator.hxx: Iterator utility functions - inc/_main.hxx: Main program header - inc/_mpi.hxx: MPI (Message Passing Interface) utility functions - inc/_openmp.hxx: OpenMP utility functions - inc/_queue.hxx: Queue utility functions - inc/_random.hxx: Random number generation functions - inc/_string.hxx: String utility functions - inc/_utility.hxx: Runtime measurement functions - inc/_vector.hxx: Vector utility functions - inc/batch.hxx: Batch update generation functions - inc/bfs.hxx: Breadth-first search algorithms - inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions - inc/dfs.hxx: Depth-first search algorithms - inc/duplicate.hxx: Graph duplicating functions - inc/Graph.hxx: Graph data structure functions - inc/louvain.hxx: Louvain community detection algorithm functions - inc/main.hxx: Main header - inc/mtx.hxx: Graph file reading functions - inc/properties.hxx: Graph Property functions - inc/selfLoop.hxx: Graph Self-looping functions - inc/symmetricize.hxx: Graph Symmetricization functions - inc/transpose.hxx: Graph transpose functions - inc/update.hxx: Update functions - main.cxx: Experimentation code - process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.

References

- C++
Published by wolfram77 about 2 years ago

louvain-communities-openmp - Effect of adjusting schedule in OpenMP-based Louvain algorithm for community detection

Effect of adjusting schedule in OpenMP-based Louvain algorithm for community detection.

Louvain is an algorithm for detecting communities in graphs. Community detection helps us understand the natural divisions in a network in an unsupervised manner. It is used in e-commerce for customer segmentation and advertising, in communication networks for multicast routing and setting up of mobile networks, and in healthcare for epidemic causality, setting up health programmes, and fraud detection is hospitals. Community detection is an NP-hard problem, but heuristics exist to solve it (such as this). Louvain algorithm is an agglomerative-hierarchical community detection method that greedily optimizes for modularity (iteratively).

Modularity is a score that measures relative density of edges inside vs outside communities. Its value lies between −0.5 (non-modular clustering) and 1.0 (fully modular clustering). Optimizing for modularity theoretically results in the best possible grouping of nodes in a graph.

Given an undirected weighted graph, all vertices are first considered to be their own communities. In the first phase, each vertex greedily decides to move to the community of one of its neighbors which gives greatest increase in modularity. If moving to no neighbor's community leads to an increase in modularity, the vertex chooses to stay with its own community. This is done sequentially for all the vertices. If the total change in modularity is more than a certain threshold, this phase is repeated. Once this local-moving phase is complete, all vertices have formed their first hierarchy of communities. The next phase is called the aggregation phase, where all the vertices belonging to a community are collapsed into a single super-vertex, such that edges between communities are represented as edges between respective super-vertices (edge weights are combined), and edges within each community are represented as self-loops in respective super-vertices (again, edge weights are combined). Together, the local-moving and the aggregation phases constitute a pass. This super-vertex graph is then used as input for the next pass. This process continues until the increase in modularity is below a certain threshold. As a result from each pass, we have a hierarchy of community memberships for each vertex as a dendrogram. We generally consider the top-level hierarchy as the final result of community detection process.

There exist two possible approaches of vertex processing with the Louvain algorithm: ordered and unordered. With the ordered approach (original paper's approach), the local-moving phase is performed sequentially upon each vertex such that the moving of a previous vertex in the graph affects the decision of the current vertex being processed. On the other hand, with the unordered approach the moving of a previous vertex in the graph does not affect the decision of movement for the current vertex. This unordered approach (aka relaxed approach) is made possible by maintaining the previous and the current community membership status of each vertex (along with associated community information), and is the approach followed by parallel Louvain implementation on the CPU as well as the GPU. We are interested in looking at performance/modularity penalty (if any) associated with the unordered approach.

In this experiment we attempt performing the OpenMP-based Louvain algorithm using the ordered approach with all different schedule kinds (static, dynamic, guided, and auto), which adjusting the chunk size from 1 to 65536 in multiples of 2. In all cases, we use a total of 12 threads. We choose the Louvain parameters as resolution = 1.0, tolerance = 1e-2 (for local-moving phase) with tolerance decreasing after every pass by a factor of toleranceDeclineFactor = 10, and a passTolerance = 0.0 (when passes stop). In addition we limit the maximum number of iterations in a single local-moving phase with maxIterations = 500, and limit the maximum number of passes with maxPasses = 500. We run the Louvain algorithm until convergence (or until the maximum limits are exceeded), and measure the time taken for the computation (performed 5 times for averaging), the modularity score, the total number of iterations (in the local-moving phase), and the number of passes. This is repeated for seventeen different graphs.

From the results, we observe that little to no benefit is provided with a multi-threaded OpenMP-based ordered Louvain algorithm, compared to the sequential ordered approach. This is somewhat surprising (not surprising to me) and indicates that when multiple reader threads and a writer thread access common memory locations (here it is the community membership of each vertex) perfomance is degraded and would tend to approach the performance of a sequential algorithm if there is just too much overlap. What should you do in that case? A simple approach would be to just go ahead with the sequential (ordered) Louvain algorithm, or partition the graph is such a way that each partition can be run independently, and then combined back together. We could also try this same experiment with the unordered approach, and see if it performs better than the ordered sequential approach (the ordered approach is an order of magnitude slower than the unordered approach, but is easily parallelizable). Partially ordered approaches for vertex processing may also be explored. Vertex ordering via graph coloring has been explored by Halappanavar et al.

All outputs are saved in a gist and a small part of the output is listed here. Some charts are also included below, generated from sheets. The input data used for this experiment is available from the SuiteSparse Matrix Collection. This experiment was done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.

```bash $ g++ -std=c++17 -O3 main.cxx $ ./a.out ~/data/web-Stanford.mtx $ ./a.out ~/data/web-BerkStan.mtx $ ...

Loading graph /home/subhajit/data/web-Stanford.mtx ...

order: 281903 size: 2312497 [directed] {}

order: 281903 size: 3985272 [directed] {} (symmetricize)

OMPNUMTHREADS=12

[-0.000497 modularity] noop

[00636.194 ms; 0025 iterations; 009 passes; 0.923383 modularity] louvainSeq

[00698.879 ms; 0026 iterations; 009 passes; 0.923119 modularity] louvainOmp {schkind: static, chunksize: 1}

[00655.290 ms; 0025 iterations; 009 passes; 0.922996 modularity] louvainOmp {schkind: static, chunksize: 2}

[00661.412 ms; 0026 iterations; 009 passes; 0.923447 modularity] louvainOmp {schkind: static, chunksize: 4}

...

[00617.610 ms; 0025 iterations; 009 passes; 0.927243 modularity] louvainOmp {schkind: auto, chunksize: 16384}

[00620.235 ms; 0025 iterations; 009 passes; 0.923699 modularity] louvainOmp {schkind: auto, chunksize: 32768}

[00614.724 ms; 0023 iterations; 009 passes; 0.923273 modularity] louvainOmp {schkind: auto, chunksize: 65536}

Loading graph /home/subhajit/data/web-BerkStan.mtx ...

order: 685230 size: 7600595 [directed] {}

order: 685230 size: 13298940 [directed] {} (symmetricize)

OMPNUMTHREADS=12

[-0.000316 modularity] noop

[01208.700 ms; 0028 iterations; 009 passes; 0.935839 modularity] louvainSeq

[01747.514 ms; 0027 iterations; 009 passes; 0.938295 modularity] louvainOmp {schkind: static, chunksize: 1}

[01569.095 ms; 0025 iterations; 009 passes; 0.933566 modularity] louvainOmp {schkind: static, chunksize: 2}

[01541.691 ms; 0025 iterations; 009 passes; 0.938013 modularity] louvainOmp {schkind: static, chunksize: 4}

...

```

References

- C++
Published by wolfram77 over 3 years ago

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science

Recent Releases of louvain-communities-openmp

louvain-communities-openmp - Design of OpenMP-based Parallel Louvain algorithm for community detection, that prevents internally disconnected communities

Code structure

References

louvain-communities-openmp - Design of OpenMP-based Louvain algorithm for community detection

Code structure

References

louvain-communities-openmp - Effect of adjusting schedule in OpenMP-based Louvain algorithm for community detection

Loading graph /home/subhajit/data/web-Stanford.mtx ...

order: 281903 size: 2312497 [directed] {}

order: 281903 size: 3985272 [directed] {} (symmetricize)

OMPNUMTHREADS=12

[-0.000497 modularity] noop

[00636.194 ms; 0025 iterations; 009 passes; 0.923383 modularity] louvainSeq

[00698.879 ms; 0026 iterations; 009 passes; 0.923119 modularity] louvainOmp {schkind: static, chunksize: 1}

[00655.290 ms; 0025 iterations; 009 passes; 0.922996 modularity] louvainOmp {schkind: static, chunksize: 2}

[00661.412 ms; 0026 iterations; 009 passes; 0.923447 modularity] louvainOmp {schkind: static, chunksize: 4}

...

[00617.610 ms; 0025 iterations; 009 passes; 0.927243 modularity] louvainOmp {schkind: auto, chunksize: 16384}

[00620.235 ms; 0025 iterations; 009 passes; 0.923699 modularity] louvainOmp {schkind: auto, chunksize: 32768}

[00614.724 ms; 0023 iterations; 009 passes; 0.923273 modularity] louvainOmp {schkind: auto, chunksize: 65536}

Loading graph /home/subhajit/data/web-BerkStan.mtx ...

order: 685230 size: 7600595 [directed] {}

order: 685230 size: 13298940 [directed] {} (symmetricize)

OMPNUMTHREADS=12

[-0.000316 modularity] noop

[01208.700 ms; 0028 iterations; 009 passes; 0.935839 modularity] louvainSeq

[01747.514 ms; 0027 iterations; 009 passes; 0.938295 modularity] louvainOmp {schkind: static, chunksize: 1}

[01569.095 ms; 0025 iterations; 009 passes; 0.933566 modularity] louvainOmp {schkind: static, chunksize: 2}

[01541.691 ms; 0025 iterations; 009 passes; 0.938013 modularity] louvainOmp {schkind: static, chunksize: 4}

...

References