bit_vector

Fast and highly tuned bit vector implementation including space efficient rank and select support having only 3.51% space overhead.

https://github.com/pasta-toolbox/bit_vector

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.0%) to scientific vocabulary

Keywords

bitvector cpp rank select space-efficiency
Last synced: 6 months ago · JSON representation ·

Repository

Fast and highly tuned bit vector implementation including space efficient rank and select support having only 3.51% space overhead.

Basic Info
Statistics
  • Stars: 29
  • Watchers: 4
  • Forks: 5
  • Open Issues: 1
  • Releases: 2
Topics
bitvector cpp rank select space-efficiency
Created over 4 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

pasta::bit_vector

License: GPL v3 DOI pasta::bit_vector CI codecov

This header-only library contains a highly tuned (uncompressed) bit vector implementation with multiple space efficient rank and select support data structures. Our fastest rank and select support has a space overhead of only ~3.51% and makes use of data level parallelism via SIMD instructions.

If you use this code in a scientific context, please cite our paper. bibtex @inproceedings{Kurpicz2022CompactRankSelect, author = {Florian Kurpicz}, title = {Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors}, booktitle = {{SPIRE}}, series = {Lecture Notes in Computer Science}, volume = {13617}, pages = {257--272}, publisher = {Springer}, year = {2022}, doi = {10.1007/978-3-031-20643-6\_19} }

Contents

This repository contains the following algorithms and data structures. Our documentation contains in-depth on the usage of all these algorithms and data structures including easy to follow examples. You can find the example in the screenshot below as text, too.

Screenshot Documentation

Bit Vectors

Bit vectors play an important role in many compressed text indices, e.g., the FM-index. This repository contains the following bit vector implementations:

Dong Zhou and David G. Andersen and Michael Kaminsky, Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences, SEA 2013.

  • improved rank and select support requiring the same amount of memory but providing faster rank (up to 8% speedup) and select (up to 16.5% speedup) queries, and
  • a very fast rank support that can also answer select queries.

Easy to Use

Since this is a header-only library, you have to simply add it to your projects include path to use it. A small example can be found below. We refer to the documentation for more information.

```cpp #include #include

// Create a bit vector of size 1000 containing only zeros and flip every other bit. pasta::BitVector bv(1000, 0); for (size_t i = 0; i < bv.size(); ++i) { if (i % 2 == 0) { bv[i] = 1; } } // Print the bit vector to see that everything worked ;-) for (auto it = bv.begin(); it != bv.end(); ++it) { std::cout << ((*it == true) ? '1' : '0'); } std::cout << std::endl;

// Create rank and select support and print the result of some queries. pasta::FlatRankSelect rs(bv); std::cout << rs.rank0(250) << ", " << rs.rank1(250) << ", " << rs.select0(250) << ", " << rs.rank1(250) << std::endl; ```

Benchmarks and Tests

There exist an easy to use benchmark, which helps to compare the implementations in this repository. To build the benchmark, run the CMake command with -DPASTA_BIT_VECTOR_BUILD_BENCHMARKS=On. Our tests are contained in the folder tests. To build the tests, run the CMake command with -DPASTA_BIT_VECTOR_BUILD_TESTS=On.

We also conducted an extensive experimental evaluation. To this end, we use our rank and select benchmark where we compare our implementations with many other compact rank and select data structures.

We refer to our paper for a full description of the results, i.e., hardware, inputs, and competitors. Below, you can find some of the figures we present in the paper.

Screenshot Documentation

Screenshot Documentation

Screenshot Documentation

Screenshot Documentation

How to Get This

Below, we list all commands that are required to build the code in this repository. To this end, we provide three CMake presets (debug, release, and release with debug information).

  • The debug preset creates a debug folder and uses the compiler flags -DDEBUG -O0 -g -ggdb -fsanitize=address.
  • The release preset creates a build folder and uses the compiler flags -DNDEBUG -march=native -O3.
  • The release with debug information preset creates a build_with_debug_info folder and uses the compiler flags -DDEBUG -g -march=native -O3.

Per default, we use the following compiler flags: -Wall -Wextra -Wpedantic -fdiagnostics-color=always.

Requirements

pasta::bit_vector is written in C++20 and requires a compiler that supports it. We use Ninja as build system. For more information on how to use this library, please refer to our documentation.

tl;dr

To just clone the source code, use the following. bash git clone git@github.com:pasta-toolbox/bit_vector cd bit_vector git submodule update --init --recursive If you also want to build the test, please continue with the following commands. bash cmake --preset=[debug|build|relwithdeb]-DPASTA_BIT_VECTOR_BUILD_TESTS=On cmake --build --preset=[debug|release|relwithdeb] ctest --test-dir [debug|build|relwithdeb]

Owner

  • Name: PASTA-Toolbox
  • Login: pasta-toolbox
  • Kind: organization
  • Location: Germany

PArallel STring Algorithms - A Toolbox Full of Scalable Algorithms and Data Structures for Strings

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Kurpicz
    given-names: Florian
    orcid: https://orcid.org/0000-0002-2379-9455
title: "pasta::bit_vector"
version: 1.0.1
identifiers:
  - type: doi
    value: https://doi.org/10.5281/zenodo.7446380
date-released: 2022-12-22
url: "https://github.com/pasta-toolbox/bit_vector"
preferred-citation:
  type: proceedings
  title: "Engineering Compact Data Structures for Rank and Select Queries on Bit Vectors"
  authors:
    - family-names: Kurpicz
      given-names: Florian
  publisher:
    name: "Springer"
  doi: "https://doi.org/10.1007/978-3-031-20643-6_19"
  year: 2022

GitHub Events

Total
  • Watch event: 4
  • Push event: 8
  • Pull request review event: 2
  • Pull request event: 8
  • Fork event: 2
Last Year
  • Watch event: 4
  • Push event: 8
  • Pull request review event: 2
  • Pull request event: 8
  • Fork event: 2

Dependencies

.github/workflows/clang-format-check.yml actions
  • actions/checkout v2 composite
  • jidicula/clang-format-action v4.4.1 composite
.github/workflows/ctest.yml actions
  • actions/checkout v2 composite
  • codecov/codecov-action v2 composite
.github/workflows/doxygen.yml actions
  • actions/checkout v3 composite
  • mattnotmitt/doxygen-action v1.9.4 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/Dockerfile docker
  • ubuntu latest build