Recent Releases of rak-communities-openmp

rak-communities-openmp - Design of OpenMP-based Label Propagation Algorithm (LPA) algorithm, aka RAK, for community detection

Multi-threaded OpenMP-based Label Propagation Algorithm (LPA), aka RAK, for community detection.

In recent years, there has been an unprecedented increase in data collection, and their representation as graphs. Thus, there is a demand for efficient parallel algorithms designed to identify communities in large networks. The significance of the multicore/shared memory environment in this context is crucial, both due to its energy efficiency, and due to widespread availability of hardware with large memory capacities. Existing research on Label Propagation Algorithm (LPA) focuses on algorithmic optimizations and parallelization but lacks exploration of efficient data structures for selecting the most weighted label. Furthermore, the proposed techniques are dispersed across multiple papers, making it challenging for readers to grasp and implement them effectively. To address this, we introduce GVE-LPA, an optimized parallel implementation of LPA for shared memory multicores.

Below we plot the time taken by FLPA, igraph LPA, NetworKit LPA, and GVE-LPA on 13 different graphs. GVE-LPA surpasses FLPA, igraph LPA, and NetworKit LPA by 118,000×, 97,000×, and 40× respectively, achieving a processing rate of 1.4𝐵 edges/s on a 3.8𝐵 edge graph.

Below we plot the speedup of GVE-LPA wrt FLPA, igraph LPA, and NetworKit LPA.

Next, we plot the modularity of communities identified by FLPA, igraph LPA, NetworKit LPA, and GVE-LPA. GVE-LPA on average obtains 0.6% / 0.2% higher modularity than FLPA and igraph LPA respectively (especially on web graphs), and 4.1% lower modularity than NetworKit LPA (especially on protein k-mer graphs with large number of vertices and a low average degree).

Finally, we plot the strong scaling behaviour of GVE-LPA. With doubling of threads, GVE-LPA exhibits an average performance scaling of 1.7×.

Refer to our technical report for more details: GVE-LPA: Fast Label Propagation Algorithm (LPA) for Community Detection in Shared Memory Setting.

[!NOTE] You can just copy main.sh to your system and run it. \ For the code, refer to main.cxx.

Code structure

The code structure of GVE-LPA is as follows:

bash - inc/_algorithm.hxx: Algorithm utility functions - inc/_bitset.hxx: Bitset manipulation functions - inc/_cmath.hxx: Math functions - inc/_ctypes.hxx: Data type utility functions - inc/_cuda.hxx: CUDA utility functions - inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...) - inc/_iostream.hxx: Input/output stream functions - inc/_iterator.hxx: Iterator utility functions - inc/_main.hxx: Main program header - inc/_mpi.hxx: MPI (Message Passing Interface) utility functions - inc/_openmp.hxx: OpenMP utility functions - inc/_queue.hxx: Queue utility functions - inc/_random.hxx: Random number generation functions - inc/_string.hxx: String utility functions - inc/_utility.hxx: Runtime measurement functions - inc/_vector.hxx: Vector utility functions - inc/batch.hxx: Batch update generation functions - inc/bfs.hxx: Breadth-first search algorithms - inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions - inc/dfs.hxx: Depth-first search algorithms - inc/duplicate.hxx: Graph duplicating functions - inc/Graph.hxx: Graph data structure functions - inc/rak.hxx: LPA/RAK community detection algorithm functions - inc/main.hxx: Main header - inc/mtx.hxx: Graph file reading functions - inc/properties.hxx: Graph Property functions - inc/selfLoop.hxx: Graph Self-looping functions - inc/symmetricize.hxx: Graph Symmetricization functions - inc/transpose.hxx: Graph transpose functions - inc/update.hxx: Update functions - main.cxx: Experimentation code - process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.

References

- C++
Published by wolfram77 over 2 years ago

rak-communities-openmp - Multi-threaded OpenMP-based Raghavan Albert Kumara (RAK) algorithm for community detection

Multi-threaded OpenMP-based Raghavan Albert Kumara (RAK) algorithm for community detection.

This is an implementation of a popular label-propagation based community detection algorithm called Raghavan Albert Kumara (RAK). Here, every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. In this iterative process densely connected groups of nodes form a consensus on a unique label to form communities.

When there exist multiple communities with max weight, we randomly pick one of them (non-strict approach), or pick only the first of them (strict approach). The algorithm converges when n% of vertices dont change their community membership (tolerance).

We continue with OpenMP implementation of RAK algorithm for community detection. Each thread is given a separate hashtable, which it can use for choosing the most popular label among its neighbors (by weight). I initially packed the hashtables (for each thread) contiguously on a vector. However, i observed that allocating them separately given almost 2x performance (by using pointer to vectors). OpenMP schedule is auto now, we can later choose the best if we need.

Similar to previous experiment, we adjust tolerance from 0.1 to 0.0001 and compare the sequential and OpenMP implementations (non-strict, strict approaches).

On average, OpenMP-based strict RAK appears to be better, both in terms of time and modularity (2X with 1->). Also we generally see it do better than sequential approaches (possibly due to better randomization). For a tolerance of 0.05, OpenMP-based strict RAK (using 12 threads) is 6.75x faster than sequential non-strict RAK.

However, OpenMP implementations do not achieve better quality for coAuthors* graphs. For social networks, OpenMP-based non-strict RAK achieves better modularity than the strict version. Please see previous experiment for difference between strict and non-strict versions. Given below is modularity on soc-LiveJournal1 graph.

As brefore, it seems to me a tolerance of 0.05 would be a good choice.

All outputs are saved in a gist and a small part of the output is listed here. Some charts are also included below, generated from sheets. The input data used for this experiment is available from the SuiteSparse Matrix Collection. This experiment was done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.

```bash $ g++ -std=c++17 -O3 main.cxx $ ./a.out ~/data/web-Stanford.mtx $ ./a.out ~/data/web-BerkStan.mtx $ ...

Loading graph /home/subhajit/data/web-Stanford.mtx ...

order: 281903 size: 2312497 [directed] {}

order: 281903 size: 3985272 [directed] {} (symmetricize)

OMPNUMTHREADS=12

[-0.000497 modularity] noop

[00150.956 ms; 0003 iters.; 0.872759998 modularity] rakSeqStatic {tolerance=1e-01}

[00152.820 ms; 0003 iters.; 0.847129285 modularity] rakSeqStaticStrict {tolerance=1e-01}

[00021.234 ms; 0003 iters.; 0.872025967 modularity] rakOmpStatic {tolerance=1e-01}

[00021.151 ms; 0003 iters.; 0.847373843 modularity] rakOmpStaticStrict {tolerance=1e-01}

[00149.791 ms; 0003 iters.; 0.872759998 modularity] rakSeqStatic {tolerance=5e-02}

[00147.402 ms; 0003 iters.; 0.847129285 modularity] rakSeqStaticStrict {tolerance=5e-02}

[00020.572 ms; 0003 iters.; 0.871711493 modularity] rakOmpStatic {tolerance=5e-02}

[00020.510 ms; 0003 iters.; 0.847401798 modularity] rakOmpStaticStrict {tolerance=5e-02}

[00236.676 ms; 0005 iters.; 0.876379907 modularity] rakSeqStatic {tolerance=1e-02}

[00234.186 ms; 0005 iters.; 0.853610516 modularity] rakSeqStaticStrict {tolerance=1e-02}

[00030.638 ms; 0005 iters.; 0.875127017 modularity] rakOmpStatic {tolerance=1e-02}

[00030.697 ms; 0005 iters.; 0.852154553 modularity] rakOmpStaticStrict {tolerance=1e-02}

[00284.453 ms; 0006 iters.; 0.877100408 modularity] rakSeqStatic {tolerance=5e-03}

[00277.874 ms; 0006 iters.; 0.854387999 modularity] rakSeqStaticStrict {tolerance=5e-03}

[00035.626 ms; 0006 iters.; 0.876359046 modularity] rakOmpStatic {tolerance=5e-03}

[00035.454 ms; 0006 iters.; 0.853212833 modularity] rakOmpStaticStrict {tolerance=5e-03}

[00366.719 ms; 0008 iters.; 0.877606571 modularity] rakSeqStatic {tolerance=1e-03}

[00407.965 ms; 0009 iters.; 0.855403066 modularity] rakSeqStaticStrict {tolerance=1e-03}

[00049.822 ms; 0009 iters.; 0.876784205 modularity] rakOmpStatic {tolerance=1e-03}

[00055.832 ms; 0010 iters.; 0.853845000 modularity] rakOmpStaticStrict {tolerance=1e-03}

[00448.478 ms; 0010 iters.; 0.877725899 modularity] rakSeqStatic {tolerance=5e-04}

[00575.903 ms; 0013 iters.; 0.856024683 modularity] rakSeqStaticStrict {tolerance=5e-04}

[00064.744 ms; 0011 iters.; 0.876972854 modularity] rakOmpStatic {tolerance=5e-04}

[00069.341 ms; 0013 iters.; 0.854331434 modularity] rakOmpStaticStrict {tolerance=5e-04}

[00616.509 ms; 0014 iters.; 0.877830148 modularity] rakSeqStatic {tolerance=1e-04}

[00742.345 ms; 0017 iters.; 0.856140256 modularity] rakSeqStaticStrict {tolerance=1e-04}

[00102.798 ms; 0020 iters.; 0.877253771 modularity] rakOmpStatic {tolerance=1e-04}

[00098.093 ms; 0019 iters.; 0.854501545 modularity] rakOmpStaticStrict {tolerance=1e-04}

...

```

References

- C++
Published by wolfram77 over 3 years ago