Recent Releases of pagerank-openmp-dynamic

pagerank-openmp-dynamic - Design of OpenMP-based Parallel Dynamic PageRank algorithm for measuring importance

Design of OpenMP-based Parallel Dynamic PageRank algorithm for measuring importance.

PageRank serves as an algorithm assessing the significance of nodes within a network through the assignment of numerical scores based on link structure. Its applications span web page ranking, identification of misinformation, traffic flow prediction, and protein target identification. The growing availability of extensive graph-based data has spurred interest in parallel algorithms for PageRank computation.

In real-world scenarios, graphs often evolve over time, with frequent edge insertions and deletions rendering recomputation of PageRank from scratch impractical, especially for small, rapid changes. Existing strategies optimize by iterating from the previous snapshot's ranks, reducing the required iterations for convergence. To further enhance efficiency, it becomes crucial to recalibrate the ranks of only those vertices likely to undergo changes. A common approach entails identifying reachable vertices from the updated graph regions and limiting processing to such vertices. However, if updates are randomly distributed, they may frequently fall within dense graph regions, necessitating the processing of a substantial portion of the graph.

To mitigate computational effort, one can incrementally expand the set of affected vertices from the updated graph region, rather than processing all reachable vertices from the initial iteration. Moreover, it is possible to skip processing a vertex's neighbors if the change in its rank is small and expected to have minimal impact on the ranks of neighboring vertices. Here, we introduce the Dynamic Frontier (DF) anf Dynamic Frontier with Pruning (DF-P) approaches for Updating PageRank, which addresses these considerations.


Below we plot the average time taken by Static, Naive-dynamic (ND), Dynamic Traversal (DT), our improved Dynamic Frontier (DF), and Dynamic Frontier with Pruning (DF-P) PageRank on 5 real-world dynamic graphs, with batch updates of size 10^-5|Eᴛ| to 10^-3|Eᴛ|. The labels indicate the speedup of each approach with respect to Static PageRank. DF PageRank is, on average, 8.0×, 4.5×, and 3.2× faster than Static PageRank, and is, on average, 1.3×, 1.1×, and 1.5× faster than DT PageRank, a widely used approach for updating PageRank on dynamic graphs. In contrast, DF-P PageRank is, on average, 26.2×, 11.9×, and 7.5× faster than Static PageRank and, on average, 4.2×, 2.8×, and 3.6× faster than DT PageRank on identical batch updates.

Next, we plot the Error comparison of Static, ND, DT, DF, and DF-P PageRank with respect to a Reference Static PageRank (with a tolerance 𝜏 of 10^−100 and limited to 500 iterations), using L1-norm.

Finally, we plot the strong scaling behaviour of DF and DF-P PageRank. With doubling of threads, DF PageRank exhibits an average performance scaling of 1.8×, while DF-P PageRank exhibits an average performance scaling of 1.7×.

Refer to our technical reports for more details: \ An Incrementally Expanding Approach for Updating PageRank on Dynamic Graphs. \ DF* PageRank: Improved Incrementally Expanding Approaches for Updating PageRank on Dynamic Graphs.


[!NOTE] You can just copy main.sh to your system and run it. \ For the code, refer to main.cxx.



Code structure

The code structure of Dynamic Frontier (DF), and Dynamic Frontier with Pruning (DF-P) PageRank is as follows:

bash - inc/_algorithm.hxx: Algorithm utility functions - inc/_bitset.hxx: Bitset manipulation functions - inc/_cmath.hxx: Math functions - inc/_ctypes.hxx: Data type utility functions - inc/_cuda.hxx: CUDA utility functions - inc/_debug.hxx: Debugging macros (LOG, ASSERT, ...) - inc/_iostream.hxx: Input/output stream functions - inc/_iterator.hxx: Iterator utility functions - inc/_main.hxx: Main program header - inc/_mpi.hxx: MPI (Message Passing Interface) utility functions - inc/_openmp.hxx: OpenMP utility functions - inc/_queue.hxx: Queue utility functions - inc/_random.hxx: Random number generation functions - inc/_string.hxx: String utility functions - inc/_utility.hxx: Runtime measurement functions - inc/_vector.hxx: Vector utility functions - inc/batch.hxx: Batch update generation functions - inc/bfs.hxx: Breadth-first search algorithms - inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions - inc/dfs.hxx: Depth-first search algorithms - inc/duplicate.hxx: Graph duplicating functions - inc/Graph.hxx: Graph data structure functions - inc/main.hxx: Main header - inc/mtx.hxx: Graph file reading functions - inc/pagerank.hxx: PageRank algorithms - inc/pagerankPrune.hxx: Dynamic Frontier with Pruning PageRank algorithms - inc/properties.hxx: Graph Property functions - inc/selfLoop.hxx: Graph Self-looping functions - inc/symmetricize.hxx: Graph Symmetricization functions - inc/transpose.hxx: Graph transpose functions - inc/update.hxx: Update functions - main.cxx: Experimentation code - process.js: Node.js script for processing output logs

Note that each branch in this repository contains code for a specific experiment. The main branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.



References




ORG

- C++
Published by wolfram77 over 2 years ago

pagerank-openmp-dynamic - Performance of static vs dynamic OpenMP-based PageRank algorithm for link analysis

Performance of static vs dynamic OpenMP-based PageRank algorithm for link analysis.

Dynamic graphs, which change with time, have many applications. Computing ranks of vertices from scratch on every update (static PageRank) may not be good enough for an interactive system. In such cases, we only want to process ranks of vertices which are likely to have changed. To handle any new vertices added/removed, we first adjust the previous ranks (before the graph update/batch) with a scaled 1/N-fill approach (1). Then, with naive dynamic approach we simply run the PageRank algorithm with the initial ranks set to the adjusted ranks. Alternatively, with the (fully) dynamic approach we first obtain a subset of vertices in the graph which are likely to be affected by the update (using BFS/DFS from changed vertices), and then perform PageRank computation on only this subset of vertices.

In this experiment, we compare the performance of static, naive dynamic, and (fully) dynamic OpenMP-based PageRank (along with similar sequential approaches). We take temporal graphs as input, and add edges to our in-memory graph in batches of size 10^2 to 10^6. However we do not perform this on every point on the temporal graph, but only on 5 time samples of the graph (5 samples are good enough to obtain an average). At each time sample we load B edges (where B is the batch size), and perform static, naive dynamic, and dynamic PageRank. At each time sample, each approach is performed 5 times to obtain an average time for that sample. A schedule of dynamic, 2048 is used for OpenMP-based PageRank as obtained in (2). We use the follwing PageRank parameters: damping factor α = 0.85, tolerance τ = 10^-6, and limit the maximum number of iterations to L = 500. The error between the current and the previous iteration is obtained with L1-norm, and is used to detect convergence. Dead ends in the graph are handled by always teleporting any vertex in the graph at random (teleport approach (3)). Error in ranks obtained for each approach is measured relative to the sequential static approach using L1-norm.

From the results, we observe that naive dynamic and dynamic PageRank are significantly faster than static PageRank for small batch sizes. However as the batch size increases, the gap between static and the two dynamic approaches decreases (as one would expect). However, interestingly we note that there seems to be little to no difference between naive dynamic and dynamic PageRank. As dynamic PageRank has an added cost of finding the subset of vertices which might be affected (for which time taken is not considered here), it seems that using naive dynamic PageRank is a better option (which is also easier to implement).

All outputs are saved in a gist and a small part of the output is listed here. Some charts are also included below, generated from sheets. The input data used for this experiment is available from the SuiteSparse Matrix Collection. This experiment was done with guidance from Prof. Kishore Kothapalli, Prof. Dip Sankar Banerjee, and Prof. Sathya Peri.


```bash $ g++ -std=c++17 -O3 -fopenmp main.cxx $ ./a.out ~/data/email-Eu-core-temporal.txt $ ./a.out ~/data/CollegeMsg.txt $ ...

Using graph /home/subhajit/data/email-Eu-core-temporal.txt ...

OMPNUMTHREADS=12

Temporal edges: 332335

# Batch size 1e+02

[751 order; 6952 size; 00000.446 ms; 046 iters.] [0.0000e+00 err.] pagerankSeqStatic

[751 order; 6952 size; 00000.470 ms; 046 iters.] [0.0000e+00 err.] pagerankOmpStatic

[751 order; 6952 size; 00000.156 ms; 015 iters.] [4.9179e-06 err.] pagerankSeqNaiveDynamic

[751 order; 6952 size; 00000.161 ms; 015 iters.] [4.9179e-06 err.] pagerankOmpNaiveDynamic

[751 order; 6952 size; 00000.162 ms; 015 iters.] [4.9179e-06 err.] pagerankSeqDynamic

[751 order; 6952 size; 00000.162 ms; 015 iters.] [4.9179e-06 err.] pagerankOmpDynamic

[802 order; 10532 size; 00000.399 ms; 028 iters.] [0.0000e+00 err.] pagerankSeqStatic

[802 order; 10532 size; 00000.412 ms; 028 iters.] [0.0000e+00 err.] pagerankOmpStatic

[802 order; 10532 size; 00000.245 ms; 016 iters.] [2.9494e-06 err.] pagerankSeqNaiveDynamic

[802 order; 10532 size; 00000.239 ms; 016 iters.] [2.9494e-06 err.] pagerankOmpNaiveDynamic

[802 order; 10532 size; 00000.249 ms; 016 iters.] [2.9494e-06 err.] pagerankSeqDynamic

[802 order; 10532 size; 00000.246 ms; 016 iters.] [2.9494e-06 err.] pagerankOmpDynamic

...

[986 order; 24929 size; 00000.746 ms; 023 iters.] [0.0000e+00 err.] pagerankSeqStatic

[986 order; 24929 size; 00000.719 ms; 023 iters.] [0.0000e+00 err.] pagerankOmpStatic

[986 order; 24929 size; 00000.356 ms; 011 iters.] [3.1348e-06 err.] pagerankSeqNaiveDynamic

[986 order; 24929 size; 00000.360 ms; 011 iters.] [3.1348e-06 err.] pagerankOmpNaiveDynamic

[986 order; 24929 size; 00000.358 ms; 011 iters.] [3.1348e-06 err.] pagerankSeqDynamic

[986 order; 24929 size; 00000.346 ms; 011 iters.] [3.1348e-06 err.] pagerankOmpDynamic

# Batch size 1e+03

[751 order; 6952 size; 00000.456 ms; 046 iters.] [0.0000e+00 err.] pagerankSeqStatic

[751 order; 6952 size; 00000.473 ms; 046 iters.] [0.0000e+00 err.] pagerankOmpStatic

[751 order; 6952 size; 00000.244 ms; 023 iters.] [8.9126e-06 err.] pagerankSeqNaiveDynamic

[751 order; 6952 size; 00000.243 ms; 023 iters.] [8.9126e-06 err.] pagerankOmpNaiveDynamic

[751 order; 6952 size; 00000.253 ms; 023 iters.] [8.9126e-06 err.] pagerankSeqDynamic

[751 order; 6952 size; 00000.236 ms; 023 iters.] [8.9126e-06 err.] pagerankOmpDynamic

[802 order; 10532 size; 00000.396 ms; 028 iters.] [0.0000e+00 err.] pagerankSeqStatic

[802 order; 10532 size; 00000.416 ms; 028 iters.] [0.0000e+00 err.] pagerankOmpStatic

[802 order; 10532 size; 00000.293 ms; 020 iters.] [8.0420e-07 err.] pagerankSeqNaiveDynamic

[802 order; 10532 size; 00000.297 ms; 020 iters.] [8.0420e-07 err.] pagerankOmpNaiveDynamic

[802 order; 10532 size; 00000.299 ms; 020 iters.] [8.0420e-07 err.] pagerankSeqDynamic

[802 order; 10532 size; 00000.295 ms; 020 iters.] [8.0420e-07 err.] pagerankOmpDynamic

...

# Batch size 1e+06

[986 order; 24929 size; 00000.746 ms; 023 iters.] [0.0000e+00 err.] pagerankSeqStatic

[986 order; 24929 size; 00000.748 ms; 023 iters.] [0.0000e+00 err.] pagerankOmpStatic

[986 order; 24929 size; 00000.749 ms; 023 iters.] [0.0000e+00 err.] pagerankSeqNaiveDynamic

[986 order; 24929 size; 00000.743 ms; 023 iters.] [0.0000e+00 err.] pagerankOmpNaiveDynamic

[986 order; 24929 size; 00000.737 ms; 023 iters.] [0.0000e+00 err.] pagerankSeqDynamic

[986 order; 24929 size; 00000.710 ms; 023 iters.] [0.0000e+00 err.] pagerankOmpDynamic

Using graph /home/subhajit/data/CollegeMsg.txt ...

OMPNUMTHREADS=12

Temporal edges: 59836

# Batch size 1e+02

[564 order; 2335 size; 00000.207 ms; 043 iters.] [0.0000e+00 err.] pagerankSeqStatic

[564 order; 2335 size; 00000.194 ms; 043 iters.] [0.0000e+00 err.] pagerankOmpStatic

[564 order; 2335 size; 00000.236 ms; 023 iters.] [3.2353e-06 err.] pagerankSeqNaiveDynamic

[564 order; 2335 size; 00000.282 ms; 023 iters.] [3.2353e-06 err.] pagerankOmpNaiveDynamic

[564 order; 2335 size; 00000.260 ms; 023 iters.] [3.2353e-06 err.] pagerankSeqDynamic

[564 order; 2335 size; 00000.327 ms; 023 iters.] [3.2353e-06 err.] pagerankOmpDynamic

[792 order; 4452 size; 00000.364 ms; 048 iters.] [0.0000e+00 err.] pagerankSeqStatic

[792 order; 4452 size; 00000.374 ms; 048 iters.] [0.0000e+00 err.] pagerankOmpStatic

[792 order; 4452 size; 00000.178 ms; 023 iters.] [1.0701e-05 err.] pagerankSeqNaiveDynamic

[792 order; 4452 size; 00000.175 ms; 023 iters.] [1.0701e-05 err.] pagerankOmpNaiveDynamic

[792 order; 4452 size; 00000.235 ms; 023 iters.] [1.0701e-05 err.] pagerankSeqDynamic

[792 order; 4452 size; 00000.614 ms; 023 iters.] [1.0701e-05 err.] pagerankOmpDynamic

...

```



References




- C++
Published by wolfram77 over 3 years ago