copra-communities
Single-threaded CPU-based Community OVerlap PRopagation Algorithm (COPRA) for community detection.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
○.zenodo.json file
-
✓DOI references
Found 3 DOI reference(s) in README -
✓Academic publication links
Links to: arxiv.org, iop.org, zenodo.org -
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (6.8%) to scientific vocabulary
Repository
Single-threaded CPU-based Community OVerlap PRopagation Algorithm (COPRA) for community detection.
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Single-threaded CPU-based Community OVerlap PRopagation Algorithm (COPRA) for community detection.
This is an implementation of a label-propagation based community detection
algorithm called Community OVerlap PRopagation Algorithm (COPRA). Unlike
RAK, this algorithm uses multiple labels per vertex, with each label having an
associated belonging coefficient (which sums to 1). The algorithm is as follows.
- Each vertex initializes as its own community (belonging=1).
- Each iteration, a vertex collects labels from its neighborhood.
- I excludes a vertex's own labels, although not explicitly mentioned in paper.
- The collected labels are scaled by edge weights for weighted graph.
- Each vertex picks labels above a certain threshold.
- This threshold is inversely proportional to the max. number of labels.
- If all labels are below threshold, pick a random best label.
- I make a vertex join its own community if it has no labels (not mentioned).
- Selected labels are normalized such that belonging coefficient sums to 1.
- Repeat from 2 until convergence.
The authors of this paper mention to use a mimimum vertices per community count to detect convergence. However i do not find it to be helpful. I instead use a similar convergence condition as RAK, that is count the number of vertices that change thier best label. Once this count falls below a certain fraction (tolerance), i consider the algorithm to have converged. The authors also mention using a sychrounous version of the algorithm, where labels of each vertex is dependent only upon labels in previous iteration. However, i find asynchronous approach to be converge faster (labels of each vertex can be dependent upon labels in current iteration). As we focus of finding disjoint communities, i consider the best label of each vertex as the final result.
I vary the tolerance from 0.1 to 0.0001 (as with RAK), and adjust the max.
number of labels from 1 to 32.
On average, it seems that using a single label is best in terms of time as well
as modularity. This is particulary true in case of road networks, but not so in
case of other classes of graphs (web graphs, social networks, collaboration
networks). For example, web graphs such as web-Stanford and web-BerkStan achieve
best modularity with max. labels of 8, web-Google does best with max. labels of
32, and web-NotreDame does best with max. labels of 16. Max. labels of 4-16
would be a good choice for such graphs.
In addition it seems that on average, making the tolerance tighter than 0.01 has
no beneficial effect on modularity. However, tighter tolerance does not help
with graphs such as web-NotreDame, coAuthorsDBLP, and social networks. It seems
a tolerance of 0.01 would be a good choice on average.
Both RAK and COPRA approaches can obtain disconnected communities. This issue can be resolved by splitting such communities into separate communities in a post-processing step. I do not do that here. We may do it when comparing these approaches.
All outputs are saved in a gist and a small part of the output is listed here. Some charts are also included below, generated from sheets. The input data used for this experiment is available from the SuiteSparse Matrix Collection. This experiment was done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.
```bash $ g++ -std=c++17 -O3 main.cxx $ ./a.out ~/data/web-Stanford.mtx $ ./a.out ~/data/web-BerkStan.mtx $ ...
Loading graph /home/subhajit/data/web-Stanford.mtx ...
order: 281903 size: 2312497 [directed] {}
order: 281903 size: 3985272 [directed] {} (symmetricize)
[-0.000497 modularity] noop
[00206.635 ms; 0003 iters.; 0.836576819 modularity] copraSeqStatic {labels=01, tolerance=1e-01}
[00248.339 ms; 0003 iters.; 0.838712633 modularity] copraSeqStatic {labels=02, tolerance=1e-01}
[00369.730 ms; 0003 iters.; 0.875163972 modularity] copraSeqStatic {labels=04, tolerance=1e-01}
[00550.753 ms; 0003 iters.; 0.878541648 modularity] copraSeqStatic {labels=08, tolerance=1e-01}
[00675.437 ms; 0003 iters.; 0.881162941 modularity] copraSeqStatic {labels=16, tolerance=1e-01}
[01130.822 ms; 0003 iters.; 0.866800904 modularity] copraSeqStatic {labels=32, tolerance=1e-01}
[00206.794 ms; 0003 iters.; 0.836576819 modularity] copraSeqStatic {labels=01, tolerance=5e-02}
[00243.418 ms; 0003 iters.; 0.838712633 modularity] copraSeqStatic {labels=02, tolerance=5e-02}
[00475.227 ms; 0004 iters.; 0.887068212 modularity] copraSeqStatic {labels=04, tolerance=5e-02}
[00834.529 ms; 0005 iters.; 0.892077446 modularity] copraSeqStatic {labels=08, tolerance=5e-02}
[01010.437 ms; 0005 iters.; 0.897701085 modularity] copraSeqStatic {labels=16, tolerance=5e-02}
[01706.960 ms; 0005 iters.; 0.885300398 modularity] copraSeqStatic {labels=32, tolerance=5e-02}
...
```
References
- Finding overlapping communities in networks by label propagation; Steve Gregory (2010)
- Near linear time algorithm to detect community structures in large-scale networks; Usha Nandini Raghavan et al. (2007)
- The University of Florida Sparse Matrix Collection; Timothy A. Davis et al. (2011)
- How to import VSCode keybindings into Visual Studio?
- Configure X11 Forwarding with PuTTY and Xming
- Installing snap on CentOS
Owner
- Name: puzzlef
- Login: puzzlef
- Kind: organization
- Website: https://puzzlef.github.io/
- Repositories: 10
- Profile: https://github.com/puzzlef
A summary of experiments.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Sahu
given-names: Subhajit
orcid: https://orcid.org/0000-0001-5140-6578
title: "ionicf/copra-communities-seq: Single-threaded CPU-based Community OVerlap PRopagation Algorithm (COPRA) for community detection"
version: 1.0.0
doi: 10.5281/zenodo.7538202
date-released: 2023-01-15
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0







