hybrid-csr

Comparing space usage of regular vs hybrid CSR.

https://github.com/puzzlef/hybrid-csr

Science Score: 41.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary

Keywords

csr data graph hybrid regular space structure usage
Last synced: 6 months ago · JSON representation ·

Repository

Comparing space usage of regular vs hybrid CSR.

Basic Info
  • Host: GitHub
  • Owner: puzzlef
  • License: mit
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 67.4 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Topics
csr data graph hybrid regular space structure usage
Created almost 5 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation

README.md

Comparing space usage of regular vs hybrid CSR.

We are comparing the space usage between: 1. 32bit Regular CSR. 2. 64bit Regular CSR. 3. 32bit Hybrid CSR with 4bit block, 28bit index (30 eff.). 4. 32bit Hybrid CSR with 8bit block, 24bit index (27 eff.). 5. 32bit Hybrid CSR with 16bit block, 16bit index (20 eff.). 6. 64bit Hybrid CSR with 4bit block, 60bit index (62 eff.). 7. 64bit Hybrid CSR with 8bit block, 56bit index (59 eff.). 8. 64bit Hybrid CSR with 16bit block, 48bit index (52 eff.). 9. 64bit Hybrid CSR with 32bit block, 32bit index (37 eff.).

We want to assess the size needed for graph representation for various possible hybrid CSR formats, by adjusting the size of dense bitset (block), and hence the index-bits. Both 32-bit and 64-bit hybrid CSR are studied, and compared with 32-bit regular CSR. A 32-bit regular CSR is represented using a uint32_t data type, and uses all 32 bits for vertex-index (index-bits). It can support graphs with a maximum of 2^32 vertices (or simply a 32-bit vertex-id). A 32-bit hybrid CSR is also represented using a uint32_t data type, where lower b bits are used to store the dense bitset (block), and upper i = 32-b bits to store the index-bits. It supports an effective vertex-id of i+log2(b) = 32-b+log2(b) bits. For each experiment, the size of block b is adjusted from 4 to 16 bits. Similarly, a 64-bit hybrid CSR is represented using uint64_t data type, where lower b bits are used to store the dense bitset (block) and upper i = 64-b bits to store the index-bits. Hence, the effective vertex-id supported is of i+log2(b) = 64-b+log2(b) bits. For this experiment, the size of block b is adjusted from 4 to 32 bits. For a given vertex-id v , index-bits are defined as v >> b , block-bits are defined as 1 << (v & ones(b)), and thus the hybrid CSR entry is (index-bits << b) | block-bits. Finding an edge-id in an edge-list involves scanning all entries with matching index-bits, and once matched, checking if the appropriate block-bit is set (for both hybrid CSRs). Since lowering the number of index-bits reduces the maximum possible order of graph representable by the format, the effective bits usable for vertex-id for each hybrid CSR variation is listed for reference. For each experiment, edge-ids for each graph are first loaded into a 32-bit array of arrays structure, and then converted to the desired CSR formats.

All graphs used are stored in the MatrixMarket (.mtx) file format, and obtained from the SuiteSparse Matrix Collection. The experiments are implemented in C++, and compiled using GCC 9 with optimization level 3 (-O3). The system used is a Dell PowerEdge R740 Rack server with two Intel Xeon Silver 4116 CPUs @ 2.10GHz, 128GB DIMM DDR4 Synchronous Registered (Buffered) 2666 MHz (8x16GB) DRAM, and running CentOS Linux release 7.9.2009 (Core). Statistics of each test case is printed to standard output (stdout), and redirected to a log file, which is then processed with a script to generate a CSV file, with each row representing the details of a single test case. This CSV file is imported into Google Sheets, and necessary tables are set up with the help of the FILTER function to create the charts.


With Default order

In this experiment (with-default-order), graphs are converted to each CSR format with vertices labeled in the default order (no reordering). It is observed that for a given n-bit hybrid CSR using the highest possible block size (taking into account effective index-bits) results in the smallest space usage. The 32-bit hybrid CSR with a 16-bit block is able to achieve a maximum space usage (bytes) reduction of ~5x, but is unable to represent all the graphs under test (it has a 20-bit effective vertex-id). With an 8-bit block the space usage is reduced by ~3x - 3.5x for coPapersCiteseer, coPapersDBLP, and indochina-2004. The 64-bit hybrid CSR with a 32-bit block is able to achieve a maximum space usage reduction of ~3.5x, but generally does not perform well. However, for massive graphs which can not be represented with a 32-bit vertex-id, it is likely to provide significant reduction in space usage. This can be gauged by comparing the number of destination-indices needed for each CSR variant, where it achieves a maximum destination-indices reduction of ~7x. This reduction is likely to be higher with graphs partitioned by hosts / heuristics / clustering algorithms which is usually necessary for massive graphs deployed in a distributed setting. This could be assessed in a future study.


Other experiments



References




ORG DOI

Owner

  • Name: puzzlef
  • Login: puzzlef
  • Kind: organization

A summary of experiments.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sahu
    given-names: Subhajit
    orcid: https://orcid.org/0000-0001-5140-6578
title: "puzzlef/csr-regular-vs-hybrid: Comparing space usage of regular vs hybrid CSR (various sizes)"
version: 1.0.0
doi: 10.5281/zenodo.6717408
date-released: 2022-06-24

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels