pachash

Packed and Compressed Hash Tables

https://github.com/bytehamster/pachash

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

Packed and Compressed Hash Tables

Basic Info
  • Host: GitHub
  • Owner: ByteHamster
  • License: gpl-3.0
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 954 KB
Statistics
  • Stars: 15
  • Watchers: 5
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 4 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

PaCHash

An object store for variable-sized objects, which has small internal-memory space usage and still guarantees a limited number of external IO operations. For a given parameter a, the internal memory space usage is 2 + log(a) bits per block. Queries for objects of size |x| take constant time and fetch 1 + |x|/B + 1/a blocks from external memory.

Plots preview

Library usage

Add the following to your CMakeLists.txt.

add_subdirectory(path/to/PaCHash) target_link_libraries(YourTarget PRIVATE PaCHash)

Building the examples

git clone --recursive git@github.com:ByteHamster/PaCHash.git mkdir PaCHash/build cd PaCHash/build cmake -DCMAKE_BUILD_TYPE=Release .. make -j

Benchmarks

Alternative methods are implemented for benchmarking. You can find the full benchmark scripts in an independent repository: https://github.com/ByteHamster/PaCHash-Experiments

| Method | External memory utilization | Internal memory usage | IOs per query | |-------------------|-----------------------------|-----------------------|-------------------| | PaCHash | 100% | ~6 bits/page | 1 (variable size) | | Separator Hashing | Up to 98%¹ | ~6 bits/page | 1 | | Cuckoo Hashing | Up to 98%¹ | Constant | 2 parallel |

¹ Depending on the input distribution. The methods are originally designed for fixed size objects. Adversarial input can therefore bring the utilization down to 51% or even make construction impossible.

License

This code is licensed under the GPLv3. If you use the project in an academic context or publication, please cite our paper:

@inproceedings{kurpicz2023pachash, author = {Florian Kurpicz and Hans{-}Peter Lehmann and Peter Sanders}, title = {PaCHash: Packed and Compressed Hash Tables}, booktitle = {{ALENEX}}, pages = {162--175}, publisher = {{SIAM}}, year = {2023}, doi = {10.1137/1.9781611977561.CH14} }

Owner

  • Login: ByteHamster
  • Kind: user
  • Location: Germany
  • Company: Karlsruhe Institute of Technology

I'm a PhD student at Karlsruhe Institute of Technology. In my freetime, I maintain AntennaPod and contribute to other projects like K-9 Mail and Baikal Server.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software in an academic context or publication, please cite it as below."
authors:
- family-names: "Kurpicz"
  given-names: "Florian"
  orcid: "https://orcid.org/0000-0002-2379-9455"
- family-names: "Lehmann"
  given-names: "Hans-Peter"
  orcid: "https://orcid.org/0000-0002-0474-1805"
- family-names: "Sanders"
  given-names: "Peter"
  orcid: "https://orcid.org/0000-0003-3330-9349"
title: "PaCHash: Packed and Compressed Hash Tables"
preferred-citation:
  type: conference-paper
  title: "PaCHash: Packed and Compressed Hash Tables"
  authors:
    - family-names: "Kurpicz"
      given-names: "Florian"
      orcid: "https://orcid.org/0000-0002-2379-9455"
    - family-names: "Lehmann"
      given-names: "Hans-Peter"
      orcid: "https://orcid.org/0000-0002-0474-1805"
    - family-names: "Sanders"
      given-names: "Peter"
      orcid: "https://orcid.org/0000-0003-3330-9349"
  doi: "10.1137/1.9781611977561.CH14"
  journal: "ALENEX"
  year: 2023

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
  • Create event: 1
Last Year
  • Watch event: 1
  • Push event: 2
  • Create event: 1

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 316
  • Total Committers: 1
  • Avg Commits per committer: 316.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
ByteHamster i****o@b****m 316
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v2 composite