gxhash

Unsafely fast hashing algorithm 📈

https://github.com/ogxd/gxhash

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • â—‹
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • â—‹
    DOI references
  • ✓
    Academic publication links
    Links to: zenodo.org
  • â—‹
    Academic email domains
  • â—‹
    Institutional organization owner
  • â—‹
    JOSS paper metadata
  • â—‹
    Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Keywords

cryptography hash hashing hashmap ilp no-std performance simd
Last synced: 6 months ago · JSON representation

Repository

Unsafely fast hashing algorithm 📈

Basic Info
  • Host: GitHub
  • Owner: ogxd
  • License: mit
  • Language: Rust
  • Default Branch: main
  • Homepage: https://docs.rs/gxhash
  • Size: 12.3 MB
Statistics
  • Stars: 902
  • Watchers: 7
  • Forks: 33
  • Open Issues: 23
  • Releases: 1
Topics
cryptography hash hashing hashmap ilp no-std performance simd
Created over 2 years ago · Last pushed 10 months ago
Metadata Files
Readme Funding License Citation

README.md

GxHash

Build & Test Cross Compile Rust Version Compatibility

GxHash is a blazingly fast and robust non-cryptographic hashing algorithm.

Features | Considerations | Usage | Benchmarks | Contributing

Features

Blazingly Fast

Up to this date, GxHash is the fastest non-cryptographic hashing algorithm of its class, for all input sizes. This performance is possible mostly thanks to heavy usage of SIMD intrinsics, high ILP construction, a small bytecode (easily inlined and cached) and some (outrageously unsafe) tricks.

See the benchmarks.

Highly Robust

GxHash uses several rounds of hardware-accelerated AES block cipher for efficient bit mixing.
Thanks to this, GxHash passes all SMHasher tests, which is the de facto quality benchmark for non-cryptographic hash functions, gathering most of the existing algorithms. GxHash has low collisions, uniform distribution and high avalanche properties.

Check out the paper for more technical details.

0 Dependencies

GxHash has 0 cargo dependency. The Hasher and Hashset/Hashmap convenience types require the standard library, enabled by default with the std feature.

Important Considerations

Hardware Acceleration

GxHash requires a few specific hardware acceleration features, which are supported on most modern processors, but not all of them. - X86 processors with AES-NI & SSE2 intrinsics - ARM processors with AES & NEON intrinsics

Warning Other platforms are currently not supported (there is no fallback). GxHash will not build on these platforms.

In case you are building gxhash without the required features, the crate will fail to build with an error message like this (even if you know your target supports the required features): Gxhash requires aes and sse2 intrinsics. Make sure the processor supports it and build with RUSTFLAGS="-C target-cpu=native" or RUSTFLAGS="-C target-feature=+aes,+sse2"

To fix this, simply follow the instructions in the error message. Setting RUSTFLAGS to -C target-cpu=native should work if your CPU is properly recognized by rustc, which is the case most of the time.

Hashes Stability

All generated hashes for a given major version of GxHash are stable, meaning that for a given input the output hash will be the same across all supported platforms. This also means that the hash may change between majors versions (eg gxhash 2.x and 3.x).

Consistency of Hashes When Using the Hasher Trait

The Hasher trait defines methods to hash specific types. This allows the implementation to circumvent some tricks used when the size is unknown. For this reason, hashing 4 u32 using a Hasher will return a different hash compared to using the gxhash128 method directly with these same 4 u32 but represented as 16 u8. The rationale being that Hasher (mostly used for things like HashMap or HashSet) and gxhash128 are used in two different scenarios. Both way are independently stable still.

Unsafety

In order to achieve this magnitude of performance, this crate contains unsafe code as well as a rather aggressive optimization technique. For this reason, this crate is not intended for use in safety-critical applications, but rather for applications that require extreme hashing performance and that are less concerned about this aspect.

Security

GxHash is seeded (with seed randomization) to improve DOS resistance and uses a wide (128-bit) internal state to improve multicollision resistance. Yet, such resistances are just basic safeguards and do not make GxHash secure against all attacks.

For use cases that require deterministic repeatability, you can disable random seeding with the feature "deterministic," but this of course disables DOS mitigation.

Also, it is important to note that GxHash is not a cryptographic hash function and should not be used for cryptographic purposes.

Usage

bash cargo add gxhash Used directly as a hash function: rust let bytes: &[u8] = "hello world".as_bytes(); let seed = 1234; println!(" 32-bit hash: {:x}", gxhash::gxhash32(&bytes, seed)); println!(" 64-bit hash: {:x}", gxhash::gxhash64(&bytes, seed)); println!("128-bit hash: {:x}", gxhash::gxhash128(&bytes, seed));

GxHash provides an implementation of the Hasher trait. For convenience, this crate also provides the type aliases gxhash::HashMap and gxhash::HashSet.

```rust use gxhash::{HashMap, HashMapExt};

let mut map: HashMap<&str, i32> = HashMap::new(); map.insert("answer", 42); ```

Flags

no_std

The std feature flag enables the HashMap/HashSet container convenience type aliases. This is on by default. Disable to make the crate no_std:

toml [dependencies.gxhash] ... default-features = false

hybrid (experimental)

The hybrid feature flag enables a hybrid implementation of GxHash. This is disabled by default. When hybrid feature is enabled and for CPUs that supports it, GxHash will use wider registers and instructions (VAES + AVX2), which can lead to a throughput increase for large inputs. This preserves hashes stability, meaning that hashes generated with or without the hybrid feature are the same for a given input and seed.

Note: Even without this feature enabled GxHash is already the fastest option out there. We recommend enabling this feature only when inputs can be larger than a few hundred bytes. Make sure to run benchmarks in your own context.

Benchmarks

Benchmark
GxHash is continuously benchmarked on X86 and ARM GitHub runners.

Important: If performance if a critical feature for your application, don't forget to benchmark the cost of hashing in your own context. Numbers shared here may be radically different in your environment and with your hardware.

To run the benchmarks locally use one of the following: ```bash

Benchmark throughput

Add --features bench-md for markdown output or --features bench-plot for .svg plots

cargo bench --bench throughput

Benchmark performance of GxHash's Hasher when used in a HashSet

cargo bench --bench hashset ```

Throughput

Throughput is measured as the number of bytes hashed per second.

Some prefer talking of *latency** (time for generating a hash) or hashrate (the number of hashes generated per second) for measuring hash function performance, but those are all equivalent in the end as they all boil down to measuring the time it takes to hash some input and then apply different scalar transformation. For instance, if latency for a 4 bytes hash is 1 ms, then the throughput is 1 / 0.001 * 4 = 4000 bytes per second. Throughput allows us to conveniently compare the performance of a hash function for any input size on a single graph.*

The throughput benchmark is custom (it does not rely on criterion.rs). In an attempt of reducing biais in this microbenchmark as much as possible, it shuffles seeds, input data, and alignment. It also has the benefit of being less of a "black box" compared to criterion. There is however a criterion-based throughput benchmark named throughput_criterion if you prefer. Results vary slightly between the two benchmarks, don't hesitate to submit an issue if you suspect biais and want to suggest improvements.

Latest Benchmark Results:
aarch64 x86_64

Contributing

  • Feel free to submit PRs
  • Repository is entirely usable via cargo commands
  • Versioning is the following
    • Major for stability breaking changes (output hashes for a same input are different after changes)
    • Minor for API changes/removal
    • Patch for new APIs, bug fixes and performance improvements

Useful profiling tools

  • cargo-show-asm is an easy way to view the actual generated assembly code. You can use the hello_world example to view the isolated, unoptimized byte code for gxhash. A few useful commands:
    • Line by line generated asm: cargo asm --rust --simplify --example hello_world hello_world::gxhash
    • Generated llvm: cargo asm --llvm --example hello_world hello_world::gxhash
    • Count of assembly instructions: cargo asm --simplify --example hello_world hello_world::gxhash | grep -v '^\.' | wc -l
    • Powershell version: cargo asm --simplify --example hello_world hello_world::gxhash | where { !$_.StartsWith(".") } | measure -Line
  • AMD Prof gives some useful insights on time spent per instruction.

Publication

Author note: I'm committed to the open dissemination of scientific knowledge. In an era where access to information is more democratized than ever, I believe that science should be freely available to all both for consumption and contribution. Traditional scientific journals often involve significant financial costs, which can introduce biases and can shift the focus from purely scientific endeavors to what is currently trendy.

To counter this trend and to uphold the true spirit of research, I have chosen to share my work on "gxhash" directly on GitHub, ensuring that it's openly accessible to anyone interested. Additionally, the use of a free Zenodo DOI ensures that this research is citable and can be referenced in other works, just as traditional publications are.

I strongly believe in a world where science is not behind paywalls, and I am in for a more inclusive, unbiased, and open scientific community.

Publication:
PDF

Cite this publication / algorithm:
DOI

Owner

  • Name: Olivier Giniaux
  • Login: ogxd
  • Kind: user
  • Location: Paris
  • Company: @smartadserver

Software Engineer / Creative Coder

GitHub Events

Total
  • Issues event: 22
  • Watch event: 146
  • Delete event: 15
  • Issue comment event: 47
  • Push event: 116
  • Pull request review event: 2
  • Pull request review comment event: 2
  • Pull request event: 24
  • Fork event: 8
  • Create event: 15
Last Year
  • Issues event: 22
  • Watch event: 146
  • Delete event: 15
  • Issue comment event: 47
  • Push event: 116
  • Pull request review event: 2
  • Pull request review comment event: 2
  • Pull request event: 24
  • Fork event: 8
  • Create event: 15

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 4
  • Average time to close issues: 6 months
  • Average time to close pull requests: 4 months
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 1.2
  • Average comments per pull request: 0.5
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 3
  • Average time to close issues: 5 days
  • Average time to close pull requests: 4 days
  • Issue authors: 4
  • Pull request authors: 3
  • Average comments per issue: 0.75
  • Average comments per pull request: 0.33
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • vlovich (3)
  • ogxd (3)
  • imartayan (1)
  • sprappcom (1)
  • Sanmayce (1)
  • hgkamath (1)
  • daedric (1)
  • ambiso (1)
  • balupton (1)
  • i18n-now (1)
  • winstxnhdw (1)
  • pedantic79 (1)
  • yonas (1)
  • cksac (1)
  • matthieu-m (1)
Pull Request Authors
  • ogxd (25)
  • Kleinmarb (2)
  • RobertJacobsonCDC (1)
  • pacak (1)
  • xpurer (1)
  • notsatvrn (1)
  • winstxnhdw (1)
  • i18n-now (1)
  • luketpeterson (1)
  • imartayan (1)
  • mahkoh (1)
Top Labels
Issue Labels
security 🔒 (2) performance 🚀 (2) bench 📈 (2) compatibility (1) bug (1)
Pull Request Labels
documentation (1) question (1)

Packages

  • Total packages: 1
  • Total downloads:
    • cargo 277,817 total
  • Total dependent packages: 6
  • Total dependent repositories: 1
  • Total versions: 23
  • Total maintainers: 1
crates.io: gxhash

GxHash non-cryptographic algorithm

  • Versions: 23
  • Dependent Packages: 6
  • Dependent Repositories: 1
  • Downloads: 277,817 Total
Rankings
Stargazers count: 11.3%
Dependent packages count: 12.2%
Dependent repos count: 16.5%
Forks count: 16.9%
Average: 21.5%
Downloads: 50.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

Cargo.toml cargo
  • criterion 0.5.1 development
  • lazy_static 1.3 development
  • rand 0.8
ffi/Cargo.toml cargo
.github/workflows/bench.yml actions
  • actions/checkout v4 composite
  • actions/download-artifact v3 composite
  • actions/upload-artifact v3 composite
  • stefanzweifel/git-auto-commit-action v5 composite
.github/workflows/build_test.yml actions
  • actions/checkout v3 composite
.github/workflows/cross_compile.yml actions
  • actions/checkout v3 composite
.github/workflows/rust_version.yml actions
  • actions/checkout v3 composite