hdt-rs

hdt-rs: A Rust library for the Header Dictionary Triples binary RDF compression format - Published in JOSS (2023)

https://github.com/konradhoeffner/hdt

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

hdt rdf rust

Keywords from Contributors

mesh
Last synced: 4 months ago

Repository

Library for the Header Dictionary Triples (HDT) compression file format for RDF data.

Basic Info
Statistics
  • Stars: 33
  • Watchers: 3
  • Forks: 5
  • Open Issues: 13
  • Releases: 21
Topics
hdt rdf rust
Created over 3 years ago · Last pushed 4 months ago
Metadata Files
Readme License Citation

README.md

HDT

A Rust library for the Header Dictionary Triples compressed RDF format, including:

  • loading the HDT default format as created by hdt-cpp
  • converting N-Triples to HDT
  • efficient querying by triple patterns
  • serializing into other formats like RDF Turtle and N-Triples using the Sophia adapter
  • running SPARQL queries (with the experimental "sparql" feature, although HDT is not optimized for SPARQL)
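The "Dictionary" in Header Dictionary Triples refers to mapping each distinct RDF term to an integer ID, so that the triples themselves can be stored compactly as ID triples. The following is a minimal std-only sketch of that idea, not hdt-rs code: the Dictionary type and the encode function are hypothetical, IDs are assigned in first-seen order, and the sorting, four-section layout and string compression of real HDT dictionaries are not modeled.

```rust
use std::collections::HashMap;

/// Hypothetical toy term dictionary: assigns each distinct term a 1-based ID
/// in first-seen order (real HDT sorts terms and splits them into sections).
#[derive(Default)]
struct Dictionary {
    ids: HashMap<String, usize>,
    terms: Vec<String>,
}

impl Dictionary {
    /// Returns the ID of `term`, inserting it if it has not been seen yet.
    fn id_of(&mut self, term: &str) -> usize {
        if let Some(&id) = self.ids.get(term) {
            return id;
        }
        self.terms.push(term.to_string());
        let id = self.terms.len(); // IDs start at 1
        self.ids.insert(term.to_string(), id);
        id
    }

    /// Translates an ID back into its term; 0 and unknown IDs yield None.
    fn term_of(&self, id: usize) -> Option<&str> {
        self.terms.get(id.checked_sub(1)?).map(String::as_str)
    }
}

/// Hypothetical encoder: turns string triples into ID triples over one dictionary.
fn encode(triples: &[[&str; 3]]) -> (Dictionary, Vec<[usize; 3]>) {
    let mut dict = Dictionary::default();
    let ids: Vec<[usize; 3]> = triples
        .iter()
        .map(|[s, p, o]| [dict.id_of(s), dict.id_of(p), dict.id_of(o)])
        .collect();
    (dict, ids)
}

fn main() {
    let (dict, ids) = encode(&[
        ["ex:Leipzig", "ex:country", "ex:Germany"],
        ["ex:Berlin", "ex:country", "ex:Germany"],
    ]);
    println!("{:?}", ids); // [[1, 2, 3], [4, 2, 3]]
    println!("{:?}", dict.term_of(4)); // Some("ex:Berlin")
}
```

Repeated terms such as ex:country are stored once and referenced by ID, which is where much of the space saving comes from.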

However, it cannot:

  • load other HDT variants
  • swap data to disk
  • modify the RDF graph in memory

If you need any of those features, consider using a SPARQL endpoint instead. For acknowledgement of all the original authors, please look at the reference implementations in C++ and Java by the https://github.com/rdfhdt organisation.

Examples

```rust,no_run
use hdt::Hdt;

let file = std::fs::File::open("example.hdt").expect("error opening file");
let hdt = Hdt::read(std::io::BufReader::new(file)).expect("error loading HDT");
// query: fixed subject and predicate, None is a wildcard for the object
let majors = hdt.triples_with_pattern(Some("http://dbpedia.org/resource/Leipzig"), Some("http://dbpedia.org/ontology/major"), None);
println!("{:?}", majors.collect::<Vec<_>>());
```
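In the query example, each position of the triple pattern is an Option: Some fixes the term and None acts as a wildcard. A minimal sketch of those semantics over a plain in-memory slice follows; match_pattern, triple_matches and the ex: terms are hypothetical, and the linear scan only illustrates which triples match, not how hdt-rs answers the query using its compressed structures.

```rust
/// Toy triple representation: (subject, predicate, object).
type Triple<'a> = (&'a str, &'a str, &'a str);

/// Hypothetical helper: true if `t` matches the pattern; None matches any term.
fn triple_matches(t: &Triple<'_>, s: Option<&str>, p: Option<&str>, o: Option<&str>) -> bool {
    s.map_or(true, |s| s == t.0) && p.map_or(true, |p| p == t.1) && o.map_or(true, |o| o == t.2)
}

/// Hypothetical linear scan yielding every triple that matches the pattern.
fn match_pattern<'a>(
    triples: &'a [Triple<'a>],
    s: Option<&'a str>,
    p: Option<&'a str>,
    o: Option<&'a str>,
) -> impl Iterator<Item = &'a Triple<'a>> + 'a {
    triples.iter().filter(move |t| triple_matches(t, s, p, o))
}

fn main() {
    let data = [
        ("ex:Leipzig", "ex:country", "ex:Germany"),
        ("ex:Leipzig", "ex:name", "\"Leipzig\""),
        ("ex:Berlin", "ex:country", "ex:Germany"),
    ];
    // Pattern (?, ex:country, ex:Germany): every subject with that country.
    let hits: Vec<_> = match_pattern(&data, None, Some("ex:country"), Some("ex:Germany")).collect();
    println!("{}", hits.len()); // 2
}
```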

You can also use the Sophia graph trait implementation to load HDT files and reduce memory consumption of an existing application based on Sophia, which is re-exported as hdt::sophia:

```rust,no_run
use hdt::Hdt;
use hdt::sophia::api::graph::Graph;
use hdt::sophia::api::term::{IriRef, SimpleTerm, matcher::Any};

let file = std::fs::File::open("dbpedia.hdt").expect("error opening file");
let hdt = Hdt::read(std::io::BufReader::new(file)).expect("error loading HDT");
let s = SimpleTerm::Iri(IriRef::new_unchecked("http://dbpedia.org/resource/Leipzig".into()));
let p = SimpleTerm::Iri(IriRef::new_unchecked("http://dbpedia.org/ontology/major".into()));
// Any matches every object
let majors = hdt.triples_matching(Some(s), Some(p), Any);
```
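The point of implementing a graph trait is that application code written against the trait does not depend on how the triples are stored, which is what lets an existing Sophia-based application switch to an HDT file. The sketch below illustrates that design idea with a hypothetical TripleSource trait and two toy backends; it is not Sophia's actual Graph API and uses plain strings instead of Sophia terms.

```rust
/// Hypothetical stand-in for a graph trait such as Sophia's Graph
/// (NOT Sophia's actual API): pattern access where None is a wildcard.
trait TripleSource {
    fn triples_matching<'s>(
        &'s self,
        s: Option<&'s str>,
        p: Option<&'s str>,
        o: Option<&'s str>,
    ) -> Box<dyn Iterator<Item = [&'s str; 3]> + 's>;
}

/// None matches any term, Some(x) only the term x.
fn term_matches(want: Option<&str>, have: &str) -> bool {
    want.map_or(true, |w| w == have)
}

/// Hypothetical toy backend 1: owns its triples as heap-allocated Strings.
struct OwnedGraph(Vec<[String; 3]>);

impl TripleSource for OwnedGraph {
    fn triples_matching<'s>(
        &'s self,
        s: Option<&'s str>,
        p: Option<&'s str>,
        o: Option<&'s str>,
    ) -> Box<dyn Iterator<Item = [&'s str; 3]> + 's> {
        Box::new(
            self.0
                .iter()
                .filter(move |t| term_matches(s, &t[0]) && term_matches(p, &t[1]) && term_matches(o, &t[2]))
                .map(|t| [t[0].as_str(), t[1].as_str(), t[2].as_str()]),
        )
    }
}

/// Hypothetical toy backend 2: borrows triples that are owned elsewhere.
struct BorrowedGraph<'d>(&'d [[&'d str; 3]]);

impl<'d> TripleSource for BorrowedGraph<'d> {
    fn triples_matching<'s>(
        &'s self,
        s: Option<&'s str>,
        p: Option<&'s str>,
        o: Option<&'s str>,
    ) -> Box<dyn Iterator<Item = [&'s str; 3]> + 's> {
        // Shorten the data lifetime 'd to 's so the item type matches the trait.
        let data: &'s [[&'s str; 3]] = self.0;
        Box::new(
            data.iter()
                .filter(move |t| term_matches(s, t[0]) && term_matches(p, t[1]) && term_matches(o, t[2]))
                .copied(),
        )
    }
}

/// Application code written once against the trait, whatever the backend.
fn objects_of<G: TripleSource>(g: &G, s: &str, p: &str) -> Vec<String> {
    g.triples_matching(Some(s), Some(p), None).map(|t| t[2].to_string()).collect()
}

fn main() {
    let owned = OwnedGraph(vec![["ex:Leipzig", "ex:country", "ex:Germany"].map(String::from)]);
    let borrowed = BorrowedGraph(&[["ex:Leipzig", "ex:country", "ex:Germany"]]);
    // The same application function works unchanged on both backends.
    println!("{:?}", objects_of(&owned, "ex:Leipzig", "ex:country")); // ["ex:Germany"]
    println!("{:?}", objects_of(&borrowed, "ex:Leipzig", "ex:country")); // ["ex:Germany"]
}
```

Returning a boxed iterator keeps this toy trait usable as a trait object at the cost of one allocation per query.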

If you don't want to pull in the Sophia dependency, you can exclude it:

```toml
[dependencies]
hdt = { version = "...", default-features = false }
```

There is also a folder with runnable examples, which you can run with cargo run --example examplename (e.g. --example query).

Experimental Features

All features other than "sophia" are experimental: they are neither guaranteed to work in all combinations nor to adhere to semantic versioning (semver), and they may change or be removed in future versions, including minor or patch releases.

Cache

If the experimental cache feature is enabled, the library speeds up repeated loading of the same file by using a custom cached index file if one exists, or creating one if it does not. These index files are incompatible with those generated by the C++ and Java implementations.

```rust
// uses the cached index file when the experimental "cache" feature is enabled
let hdt = hdt::Hdt::read_from_path(std::path::Path::new("tests/resources/snikmeta.hdt")).expect("snikmeta.hdt not found");
```
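The cache behavior described above follows a load-or-build pattern. A std-only sketch of that pattern follows; the ".index.cache" suffix, the toy index format and the functions cache_path, build_index and load_or_build are all hypothetical and do not reflect the actual file name, format or validation used by the cache feature.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical cache location: the input path plus an extra suffix.
fn cache_path(input: &Path) -> PathBuf {
    let mut name = input.as_os_str().to_owned();
    name.push(".index.cache");
    PathBuf::from(name)
}

/// Hypothetical stand-in for an expensive index build: one little-endian
/// u64 per line of the input, holding that line's byte length.
fn build_index(data: &[u8]) -> Vec<u8> {
    data.split(|&b| b == b'\n')
        .flat_map(|line| (line.len() as u64).to_le_bytes())
        .collect()
}

/// Reuses the cache file if it can be read, otherwise builds and saves the index.
/// Returns the index bytes and whether they came from the cache.
fn load_or_build(input: &Path) -> io::Result<(Vec<u8>, bool)> {
    let cache = cache_path(input);
    if let Ok(bytes) = fs::read(&cache) {
        return Ok((bytes, true));
    }
    let index = build_index(&fs::read(input)?);
    fs::write(&cache, &index)?;
    Ok((index, false))
}

fn main() -> io::Result<()> {
    let input = std::env::temp_dir().join("load_or_build_demo.txt");
    fs::write(&input, "a\nbb\n")?;
    let _ = fs::remove_file(cache_path(&input)); // start without a cache
    let (first, from_cache1) = load_or_build(&input)?;
    let (second, from_cache2) = load_or_build(&input)?;
    assert_eq!(first, second);
    println!("{from_cache1} {from_cache2}"); // false true
    Ok(())
}
```

This sketch trusts any existing cache file; a real implementation would also need to detect stale caches, for example by comparing modification times or a checksum of the input.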

SPARQL

The sparql feature implements spareval.

API Documentation

See docs.rs/hdt/latest/hdt or generate the documentation yourself with cargo doc --no-deps without disabling default features.

Performance

The performance of a query depends on the size of the graph, the type of triple pattern and the size of the result set. When using large HDT files, make sure to use the release profile, for example with cargo build --release, which can be much faster than the dev profile.

Profiling

If you want to optimize the code, you can use a profiler. The provided test data is very small in order to keep the size of the crate down; locally modifying the tests to use a large HDT file gives more meaningful results.

Example with perf and Firefox Profiler

```sh
$ cargo test --release
[...]
     Running unittests src/lib.rs (target/release/deps/hdt-2b2f139dafe69681)
[...]
$ perf record --call-graph=dwarf target/release/deps/hdt-2b2f139dafe69681 hdt::tests::triples
$ perf script > /tmp/test.perf
```

Then go to https://profiler.firefox.com/ and open /tmp/test.perf.

Criterion benchmark

```sh
$ cargo bench --bench criterion
```

iai benchmark

```sh
cargo bench --bench iai
```

  • requires persondata_en_10k.hdt placed in tests/resources
  • requires Valgrind to be installed
  • may require a conservative target CPU like RUSTFLAGS="-C target-cpu=x86-64" cargo bench --bench iai

Comparative benchmark suite

The separate benchmark suite compares the performance of this and some other RDF libraries.

Community Guidelines

Issues and Support

If you have a problem with the software, want to report a bug or have a feature request, please use the issue tracker. If you have a different type of request, feel free to send an email to Konrad.

Citation

If you use this library in your research, please cite our paper in the Journal of Open Source Software. We also provide a CITATION.cff file.

BibTeX entry

```bibtex
@article{hdtrs,
  doi = {10.21105/joss.05114},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {84},
  pages = {5114},
  author = {Konrad Höffner and Tim Baccaert},
  title = {hdt-rs: {A} {R}ust library for the {H}eader {D}ictionary {T}riples binary {RDF} compression format},
  journal = {Journal of Open Source Software}
}
```

Citation string

Höffner et al., (2023). hdt-rs: A Rust library for the Header Dictionary Triples binary RDF compression format. Journal of Open Source Software, 8(84), 5114, https://doi.org/10.21105/joss.05114

Contribute

We are happy to receive pull requests. Please use cargo fmt before committing, make sure that cargo test succeeds and that the code compiles on the stable and nightly toolchains, both with and without the "sophia" feature active. cargo clippy should not report any warnings.

Owner

  • Name: Konrad Höffner
  • Login: KonradHoeffner
  • Kind: user
  • Location: Leipzig, Germany
  • Company: @IMISE

Research Assistant @IMISE.

JOSS Publication

hdt-rs: A Rust library for the Header Dictionary Triples binary RDF compression format
Published
April 29, 2023
Volume 8, Issue 84, Page 5114
Authors
Konrad Höffner
Institute for Medical Informatics, Statistics, and Epidemiology, Medical Faculty, Leipzig University
Tim Baccaert
Independent Researcher, Belgium
Editor
Øystein Sørensen
Tags
Rust HDT RDF linked data semantic web

Citation (CITATION.cff)

---
cff-version: 1.2.0
title: "hdt-rs: A Rust library
  for the Header Dictionary Triples binary RDF compression format"
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
type: software
authors:
  - given-names: Konrad
    family-names: Höffner
    email: konrad.hoeffner@uni-leipzig.de
    affiliation: >-
      Institute for Medical Informatics, Statistics
      and Epidemiology (IMISE), Leipzig, Germany
    orcid: 'https://orcid.org/0000-0001-7358-3217'
  - given-names: Tim
    family-names: Baccaert
    affiliation: >-
      Independent Researcher
      Belgium
repository-code: 'https://github.com/konradhoeffner/hdt'
url: 'https://crates.io/crates/hdt'
keywords:
  - RDF
  - HDT
  - Rust
license: MIT
preferred-citation:
  type: article
  authors:
    - family-names: Höffner
      given-names: Konrad
      orcid: "https://orcid.org/0000-0001-7358-3217"
    - family-names: "Baccaert"
      given-names: "Tim"
  date-published: 2023-04-29
  doi: 10.21105/joss.05114
  issn: 2475-9066
  issue: 84
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5114
  title: "hdt-rs: A Rust library
    for the Header Dictionary Triples binary RDF compression format"
  url: "https://joss.theoj.org/papers/10.21105/joss.05114"
  volume: 8

GitHub Events

Total
  • Create event: 15
  • Issues event: 22
  • Release event: 3
  • Watch event: 11
  • Delete event: 20
  • Member event: 1
  • Issue comment event: 144
  • Push event: 80
  • Pull request review event: 25
  • Pull request review comment event: 22
  • Pull request event: 28
  • Fork event: 1
Last Year
  • Create event: 15
  • Issues event: 22
  • Release event: 3
  • Watch event: 11
  • Delete event: 20
  • Member event: 1
  • Issue comment event: 144
  • Push event: 80
  • Pull request review event: 25
  • Pull request review comment event: 22
  • Pull request event: 28
  • Fork event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 264
  • Total Committers: 6
  • Avg Commits per committer: 44.0
  • Development Distribution Score (DDS): 0.178
Past Year
  • Commits: 40
  • Committers: 3
  • Avg Commits per committer: 13.333
  • Development Distribution Score (DDS): 0.2
Top Committers
Name Email Commits
Konrad Höffner k****r@u****e 217
Pierre-Antoine Champin p****e@w****g 19
Tim Baccaert t****t@p****m 15
Greg Hanson g****n@d****i 6
dependabot[bot] 4****] 5
Remi Rampin r****i@r****g 2
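The all-time figures above are consistent with defining the Development Distribution Score (DDS) as one minus the share of commits made by the most active committer, and the average as total commits divided by committers: 1 - 217/264 ≈ 0.178 and 264/6 = 44.0. Assuming that definition (it is not stated on this page), the arithmetic can be reproduced from the Top Committers counts; dds and avg are illustrative helper names.

```rust
/// All-time commit counts from the Top Committers table.
const COMMITS: [u32; 6] = [217, 19, 15, 6, 5, 2];

/// Assumed DDS definition: 1 - (top committer's commits / total commits).
fn dds(commits: &[u32]) -> f64 {
    let total: u32 = commits.iter().sum();
    let top = commits.iter().copied().max().unwrap_or(0);
    if total == 0 {
        0.0
    } else {
        1.0 - top as f64 / total as f64
    }
}

/// Average commits per committer.
fn avg(commits: &[u32]) -> f64 {
    commits.iter().sum::<u32>() as f64 / commits.len() as f64
}

fn main() {
    println!("{:.3}", dds(&COMMITS)); // 0.178
    println!("{:.1}", avg(&COMMITS)); // 44.0
}
```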

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 54
  • Total pull requests: 44
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 21 days
  • Total issue authors: 7
  • Total pull request authors: 5
  • Average comments per issue: 2.13
  • Average comments per pull request: 4.59
  • Merged pull requests: 20
  • Bot issues: 1
  • Bot pull requests: 16
Past Year
  • Issues: 18
  • Pull requests: 33
  • Average time to close issues: 13 days
  • Average time to close pull requests: 14 days
  • Issue authors: 3
  • Pull request authors: 3
  • Average comments per issue: 1.17
  • Average comments per pull request: 5.67
  • Merged pull requests: 11
  • Bot issues: 1
  • Bot pull requests: 11
Top Authors
Issue Authors
  • KonradHoeffner (45)
  • donpellegrino (3)
  • GregHanson (2)
  • dependabot[bot] (1)
  • remram44 (1)
  • lazear (1)
  • osorensen (1)
Pull Request Authors
  • dependabot[bot] (16)
  • GregHanson (13)
  • KonradHoeffner (11)
  • pchampin (2)
  • remram44 (2)
Top Labels
Issue Labels
enhancement (13) optimize (8) bug (7) refactor (5) documentation (3) simplify (3) ram (3) question (2) test (1) good first issue (1) blocked (1) wontfix (1) docs (1) build (1) dependencies (1) rust (1)
Pull Request Labels
dependencies (16) rust (15) enhancement (5) refactor (4) bug (2)

Packages

  • Total packages: 3
  • Total downloads:
    • cargo 37,947 total
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 35
  • Total maintainers: 1
proxy.golang.org: github.com/KonradHoeffner/hdt
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 4 months ago
proxy.golang.org: github.com/konradhoeffner/hdt
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 4 months ago
crates.io: hdt

Library for the Header Dictionary Triples (HDT) RDF compression format.

  • Versions: 33
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 37,947 Total
Rankings
Dependent repos count: 16.5%
Dependent packages count: 18.2%
Average: 21.6%
Stargazers count: 22.9%
Forks count: 23.2%
Downloads: 27.5%
Maintainers (1)
Last synced: 4 months ago

Dependencies

Cargo.toml cargo
  • env_logger 0.10 development
  • pretty_assertions 1.3 development
  • bytesize 1.1.0
  • crc-any 2.3
  • iref 2.2
  • langtag ^0.3.2
  • log 0.4
  • ntriple ^0.1.1
  • rsdict 0.0.6
  • sophia 0.7
  • sucds 0.6.0
  • thiserror 1.0.37
.github/workflows/test.yml actions
  • actions-rs/toolchain v1 composite
  • actions/checkout v3 composite