slpa-communities

Single-threaded CPU-based Speaker-listener Label Propagation Algorithm (SLPA) for community detection.

https://github.com/puzzlef/slpa-communities

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, springer.com, iop.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.6%) to scientific vocabulary
Last synced: 10 months ago · JSON representation ·

Repository

Single-threaded CPU-based Speaker-listener Label Propagation Algorithm (SLPA) for community detection.

Basic Info
  • Host: GitHub
  • Owner: puzzlef
  • License: mit
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 70.3 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

Single-threaded CPU-based Speaker-listener Label Propagation Algorithm (SLPA) for community detection.

I was trying out the Speaker-listener Label Propagation Algorithm (SLPA) which i had implemented yesterday (results out today). This is one of the algorithms given in literature review of LabelRank. Here each node has a fixed memory where it remembers all the popular labels that it listened. The size of this memory is fixed to the total number of iterations to be performed + 1. The algorithm is as follows.

  1. Each vertex is initialized such that it remembers itself as popular.
  2. Each neighbor speaks one of the random labels in its memory.
  3. The vertex (listener) adds the most popular label to its memory.
  4. Repeat from 2 until a fixed number of iterations is performed (labels - 1).
  5. I allow early convergence if at least n% of vertices remember their previous label.
  6. For each vertex, i pick the most popular label in its memory as its community.

Unlike RAK (also called LPA) and COPRA, this is a randomized historical-label based technique. A random number generator (RNG) is used by each neighbor of a vertex to pick a random label to speak from its memory. I originally used C++'s default random engine (which is probably mersenne twister) which seems slow. So i updated it to use a xorshift32 random engine. The listener vertex listens to all its neigbors, and based on edge weight, picks the most popular label. In strict mode, the listener pick the first most popular label as to record in its memory, and in non-strict mode, the listener pick randomly one of the most popular labels to record. It records this in the next index in its memory. The authors of this paper suggest to perform a fixed number of iterations (which is one less than the space we have reserved for storing the labels in the memory of each vertex). However, if the latest popular label (for a vertex) is same as the previous one in its memory, i consider it as a sign of convergence. So, when n% of vertices show this sign of convergence (tolerance), i perform an early exit. By default, a tolerance of 0.05 is used. As we want to find disjoint communities, i use the most popular label in the memory of each vertex as its final community.



The experiment is run with labels (memory space of each vertex) ranging from 4 to 64.



It appears that increasing the number of labels increases the time required for completion (as expected), and also increases the final modularity. However, on graphs soc-Slashdot* modularity seems to decrease with increasing labels.



It is also observed that the strict variant is slightly faster than the non-strict one, but achieves similar modularity. On road networks it seems strict approach obtains lower modularity, and on other types of graphs, strict approach achieves higher modularity.



However, its seems that this method of community detection does not provide a good return on investment (it takes longer but does not achieve great modularity). I do not perform any post-processing of communities as of now (such as splitting disconnected communities).

All outputs are saved in a gist and a small part of the output is listed here. Some charts are also included below, generated from sheets. The input data used for this experiment is available from the SuiteSparse Matrix Collection. This experiment was done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.


```bash $ g++ -std=c++17 -O3 main.cxx $ ./a.out ~/data/web-Stanford.mtx $ ./a.out ~/data/web-BerkStan.mtx $ ...

Loading graph /home/subhajit/data/web-Stanford.mtx ...

order: 281903 size: 2312497 [directed] {}

order: 281903 size: 3985272 [directed] {} (symmetricize)

[-0.000497 modularity] noop

[00337.151 ms; 0004 iters.; 0.474440932 modularity] slpaSeqStatic {labels=04, tolerance=5e-02}

[00304.351 ms; 0004 iters.; 0.479095578 modularity] slpaSeqStaticStrict {labels=04, tolerance=5e-02}

[00901.554 ms; 0008 iters.; 0.752845287 modularity] slpaSeqStatic {labels=08, tolerance=5e-02}

[00866.540 ms; 0008 iters.; 0.679729104 modularity] slpaSeqStaticStrict {labels=08, tolerance=5e-02}

[02229.548 ms; 0016 iters.; 0.826291323 modularity] slpaSeqStatic {labels=16, tolerance=5e-02}

[02193.593 ms; 0016 iters.; 0.789462328 modularity] slpaSeqStaticStrict {labels=16, tolerance=5e-02}

[03706.551 ms; 0032 iters.; 0.847930729 modularity] slpaSeqStatic {labels=32, tolerance=5e-02}

[03655.065 ms; 0032 iters.; 0.817599833 modularity] slpaSeqStaticStrict {labels=32, tolerance=5e-02}

[07928.522 ms; 0064 iters.; 0.857062638 modularity] slpaSeqStatic {labels=64, tolerance=5e-02}

[07843.098 ms; 0064 iters.; 0.826945186 modularity] slpaSeqStaticStrict {labels=64, tolerance=5e-02}

...

```



References




ORG DOI

Owner

  • Name: puzzlef
  • Login: puzzlef
  • Kind: organization

A summary of experiments.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sahu
    given-names: Subhajit
    orcid: https://orcid.org/0000-0001-5140-6578
title: "ionicf/slpa-communities-seq: Single-threaded CPU-based Speaker-listener Label Propagation Algorithm (SLPA) for community detection"
version: 1.0.0
doi: 10.5281/zenodo.7538558
date-released: 2023-01-15

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels