hdt-rs
hdt-rs: A Rust library for the Header Dictionary Triples binary RDF compression format - Published in JOSS (2023)
Science Score: 100.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 10 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to: joss.theoj.org
- ✓ Committers with academic emails: 1 of 6 committers (16.7%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Library for the Header Dictionary Triples (HDT) compression file format for RDF data.
Basic Info
- Host: GitHub
- Owner: KonradHoeffner
- License: mit
- Language: Rust
- Default Branch: main
- Homepage: https://crates.io/crates/hdt
- Size: 3.2 MB
Statistics
- Stars: 33
- Watchers: 3
- Forks: 5
- Open Issues: 13
- Releases: 21
Metadata Files
README.md
HDT
A Rust library for the Header Dictionary Triples compressed RDF format, including:
- loading the HDT default format as created by hdt-cpp
- converting N-Triples to HDT
- efficient querying by triple patterns
- serializing into other formats like RDF Turtle and N-Triples using the Sophia adapter
- running SPARQL queries (with the experimental "sparql" feature but HDT is not optimized for that)
However it cannot:
- load other HDT variants
- swap data to disk
- modify the RDF graph in memory
If you need any of those features, consider using a SPARQL endpoint instead. For acknowledgement of all the original authors, please look at the reference implementations in C++ and Java by the https://github.com/rdfhdt organisation.
Examples
```rust,no_run
use hdt::Hdt;

let file = std::fs::File::open("example.hdt").expect("error opening file");
let hdt = Hdt::read(std::io::BufReader::new(file)).expect("error loading HDT");
// query
let majors = hdt.triples_with_pattern(Some("http://dbpedia.org/resource/Leipzig"), Some("http://dbpedia.org/ontology/major"),None);
println!("{:?}", majors.collect::<Vec<_>>());
```
You can also use the Sophia graph trait implementation to load HDT files and reduce memory consumption of an existing application based on Sophia, which is re-exported as hdt::sophia:
```rust,no_run
use hdt::Hdt;
use hdt::sophia::api::graph::Graph;
use hdt::sophia::api::term::{IriRef, SimpleTerm, matcher::Any};

let file = std::fs::File::open("dbpedia.hdt").expect("error opening file");
let hdt = Hdt::read(std::io::BufReader::new(file)).expect("error loading HDT");
let s = SimpleTerm::Iri(IriRef::new_unchecked("http://dbpedia.org/resource/Leipzig".into()));
let p = SimpleTerm::Iri(IriRef::new_unchecked("http://dbpedia.org/ontology/major".into()));
let majors = hdt.triples_matching(Some(s),Some(p),Any);
```
If you don't want to pull in the Sophia dependency, you can exclude it:
```toml
[dependencies]
hdt = { version = "...", default-features = false }
```
There is also a folder with runnable examples, which you can run with cargo run --example examplename (e.g. --example query).
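The pattern queries in this section take an optional subject, predicate and object, where `None` acts as a wildcard. The following is a minimal, std-only sketch of that matching semantics over a toy in-memory list. It is not how hdt stores or queries triples internally; the `Triple` alias, the `matching` and `demo_triples` functions and all `example.org` IRIs are hypothetical names made up for illustration.

```rust
/// Toy triple of (subject, predicate, object); hypothetical, not hdt's representation.
type Triple = (&'static str, &'static str, &'static str);

/// Returns all triples that match the pattern, where `None` matches anything.
fn matching(triples: &[Triple], s: Option<&str>, p: Option<&str>, o: Option<&str>) -> Vec<Triple> {
    triples
        .iter()
        .filter(|t| {
            s.map_or(true, |s| s == t.0)
                && p.map_or(true, |p| p == t.1)
                && o.map_or(true, |o| o == t.2)
        })
        .copied()
        .collect()
}

/// Made-up sample data; the objects are placeholders, not real DBpedia values.
fn demo_triples() -> Vec<Triple> {
    vec![
        ("http://dbpedia.org/resource/Leipzig", "http://dbpedia.org/ontology/major", "http://example.org/o1"),
        ("http://dbpedia.org/resource/Leipzig", "http://example.org/p2", "http://example.org/o2"),
        ("http://example.org/s3", "http://dbpedia.org/ontology/major", "http://example.org/o3"),
    ]
}
```

Calling `matching(&demo_triples(), Some("http://dbpedia.org/resource/Leipzig"), Some("http://dbpedia.org/ontology/major"), None)` selects only the first triple, while passing `None` in all three positions returns every triple.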
Experimental Features
All features other than "sophia" are experimental and are neither guaranteed to work in all combinations nor adhere to semver: they may change or be removed in future versions including minor or patch releases.
Cache
If the experimental cache feature is enabled, the library will speed up repeated loading of the same file by utilizing a custom cached index file if it exists or create one if it does not exist.
These index files are incompatible with those generated by the C++ and Java implementations.
```rust
let hdt = hdt::Hdt::read_from_path(std::path::Path::new("tests/resources/snikmeta.hdt")).expect("snikmeta.hdt not found");
```
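The "use the cached index if it exists, otherwise create it" behaviour described above follows a common load-or-create pattern. The sketch below shows that pattern with the standard library only. It is not the crate's actual cache code, and `load_or_create`, its parameters and the byte payload are hypothetical stand-ins for whatever index data the library really writes.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the bytes stored at `cache_path` if the file exists.
/// Otherwise runs `build`, writes its result to `cache_path` and returns it.
fn load_or_create<F>(cache_path: &Path, build: F) -> io::Result<Vec<u8>>
where
    F: FnOnce() -> Vec<u8>,
{
    match fs::read(cache_path) {
        // Cache hit: reuse the stored data and skip the expensive build step
        Ok(bytes) => Ok(bytes),
        // Cache miss: build once and persist the result for the next run
        Err(e) if e.kind() == io::ErrorKind::NotFound => {
            let bytes = build();
            fs::write(cache_path, &bytes)?;
            Ok(bytes)
        }
        // Any other I/O error (e.g. missing permissions) is passed on
        Err(e) => Err(e),
    }
}
```

On the first call the closure runs and the file is written; later calls with the same path return the stored bytes without invoking the closure.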
SPARQL
The sparql feature implements spareval.
API Documentation
See docs.rs/latest/hdt or generate for yourself with cargo doc --no-deps without disabling default features.
Performance
The performance of a query depends on the size of the graph, the type of triple pattern and the size of the result set.
When using large HDT files, make sure to enable the release profile, such as through cargo build --release, as this can be much faster than using the dev profile.
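To see how pattern type and result size affect query time on your own data, a simple wall-clock measurement can be a starting point. The sketch below uses only the standard library; `timed` and `demo` are hypothetical helper names, and the closure in `demo` is a stand-in workload where an actual HDT query would go. Measurements like this are only meaningful when built with the release profile.

```rust
use std::time::{Duration, Instant};

/// Runs `query` once and returns its result together with the elapsed wall-clock time.
fn timed<T>(query: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = query();
    (result, start.elapsed())
}

/// Stand-in usage: counts multiples of 3 instead of counting query results.
fn demo() -> usize {
    let (count, elapsed) = timed(|| (0..1_000_000u64).filter(|n| n % 3 == 0).count());
    println!("{count} results in {elapsed:?}");
    count
}
```

A single run is noisy, so for anything beyond a rough impression the Criterion benchmark of this repository is the better tool.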
Profiling
If you want to optimize the code, you can use a profiler. The provided test data is very small in order to keep the size of the crate down; locally modifying the tests to use a large HDT file returns more meaningful results.
Example with perf and Firefox Profiler
```sh
$ cargo test --release
[...]
Running unittests src/lib.rs (target/release/deps/hdt-2b2f139dafe69681)
[...]
$ perf record --call-graph=dwarf target/release/deps/hdt-2b2f139dafe69681 hdt::tests::triples
$ perf script > /tmp/test.perf
```
Then go to https://profiler.firefox.com/ and open /tmp/test.perf.
Criterion benchmark
```sh
$ cargo bench --bench criterion
```
- requires persondata_en.hdt placed in tests/resources
iai benchmark
```sh
cargo bench --bench iai
```
- requires persondata_en_10k.hdt placed in tests/resources
- requires Valgrind to be installed
- may require a conservative target CPU like RUSTFLAGS="-C target-cpu=x86-64" cargo bench --bench iai
Comparative benchmark suite
The separate benchmark suite compares the performance of this and some other RDF libraries.
Community Guidelines
Issues and Support
If you have a problem with the software, want to report a bug or have a feature request, please use the issue tracker. If you have a different type of request, feel free to send an email to Konrad.
Citation
If you use this library in your research, please cite our paper in the Journal of Open Source Software. We also provide a CITATION.cff file.
BibTeX entry
```bibtex
@article{hdtrs,
  doi = {10.21105/joss.05114},
  year = {2023},
  publisher = {The Open Journal},
  volume = {8},
  number = {84},
  pages = {5114},
  author = {Konrad Höffner and Tim Baccaert},
  title = {hdt-rs: {A} {R}ust library for the {H}eader {D}ictionary {T}riples binary {RDF} compression format},
  journal = {Journal of Open Source Software}
}
```
Citation string
Höffner et al., (2023). hdt-rs: A Rust library for the Header Dictionary Triples binary RDF compression format. Journal of Open Source Software, 8(84), 5114, https://doi.org/10.21105/joss.05114
Contribute
We are happy to receive pull requests.
Please use cargo fmt before committing, make sure that cargo test succeeds and that the code compiles on the stable and nightly toolchain both with and without the "sophia" feature active.
cargo clippy should not report any warnings.
Owner
- Name: Konrad Höffner
- Login: KonradHoeffner
- Kind: user
- Location: Leipzig, Germany
- Company: @IMISE
- Website: http://aksw.org/KonradHoeffner
- Repositories: 66
- Profile: https://github.com/KonradHoeffner
Research Assistant @IMISE.
JOSS Publication
hdt-rs: A Rust library for the Header Dictionary Triples binary RDF compression format
Authors
Institute for Medical Informatics, Statistics, and Epidemiology, Medical Faculty, Leipzig University
Independent Researcher, Belgium
Tags
Rust HDT RDF linked data semantic web
Citation (CITATION.cff)
---
cff-version: 1.2.0
title: "hdt-rs: A Rust library
for the Header Dictionary Triples binary RDF compression format"
message: If you use this software, please cite our article in the
Journal of Open Source Software.
type: software
authors:
- given-names: Konrad
family-names: Höffner
email: konrad.hoeffner@uni-leipzig.de
affiliation: >-
Institute for Medical Informatics, Statistics
and Epidemiology (IMISE), Leipzig, Germany
orcid: 'https://orcid.org/0000-0001-7358-3217'
- given-names: Tim
family-names: Baccaert
affiliation: >-
Independent Researcher
Belgium
repository-code: 'https://github.com/konradhoeffner/hdt'
url: 'https://crates.io/crates/hdt'
keywords:
- RDF
- HDT
- Rust
license: MIT
preferred-citation:
type: article
authors:
- family-names: Höffner
given-names: Konrad
orcid: "https://orcid.org/0000-0000-0000-0000"
- family-names: "Baccaert"
given-names: "Tim"
date-published: 2023-04-29
doi: 10.21105/joss.05114
issn: 2475-9066
issue: 84
journal: Journal of Open Source Software
publisher:
name: Open Journals
start: 5114
title: "hdt-rs: A Rust library
for the Header Dictionary Triples binary RDF compression format"
url: "https://joss.theoj.org/papers/10.21105/joss.05114"
volume: 8
GitHub Events
Total
- Create event: 15
- Issues event: 22
- Release event: 3
- Watch event: 11
- Delete event: 20
- Member event: 1
- Issue comment event: 144
- Push event: 80
- Pull request review event: 25
- Pull request review comment event: 22
- Pull request event: 28
- Fork event: 1
Last Year
- Create event: 15
- Issues event: 22
- Release event: 3
- Watch event: 11
- Delete event: 20
- Member event: 1
- Issue comment event: 144
- Push event: 80
- Pull request review event: 25
- Pull request review comment event: 22
- Pull request event: 28
- Fork event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Konrad Höffner | k****r@u****e | 217 |
| Pierre-Antoine Champin | p****e@w****g | 19 |
| Tim Baccaert | t****t@p****m | 15 |
| Greg Hanson | g****n@d****i | 6 |
| dependabot[bot] | 4****] | 5 |
| Remi Rampin | r****i@r****g | 2 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 54
- Total pull requests: 44
- Average time to close issues: about 1 month
- Average time to close pull requests: 21 days
- Total issue authors: 7
- Total pull request authors: 5
- Average comments per issue: 2.13
- Average comments per pull request: 4.59
- Merged pull requests: 20
- Bot issues: 1
- Bot pull requests: 16
Past Year
- Issues: 18
- Pull requests: 33
- Average time to close issues: 13 days
- Average time to close pull requests: 14 days
- Issue authors: 3
- Pull request authors: 3
- Average comments per issue: 1.17
- Average comments per pull request: 5.67
- Merged pull requests: 11
- Bot issues: 1
- Bot pull requests: 11
Top Authors
Issue Authors
- KonradHoeffner (45)
- donpellegrino (3)
- GregHanson (2)
- dependabot[bot] (1)
- remram44 (1)
- lazear (1)
- osorensen (1)
Pull Request Authors
- dependabot[bot] (16)
- GregHanson (13)
- KonradHoeffner (11)
- pchampin (2)
- remram44 (2)
Packages
- Total packages: 3
- Total downloads:
  - cargo 37,947 total
- Total dependent packages: 1 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 35
- Total maintainers: 1
proxy.golang.org: github.com/KonradHoeffner/hdt
- Documentation: https://pkg.go.dev/github.com/KonradHoeffner/hdt#section-documentation
- License: mit
- Latest release: v0.0.0 (published about 3 years ago)
proxy.golang.org: github.com/konradhoeffner/hdt
- Documentation: https://pkg.go.dev/github.com/konradhoeffner/hdt#section-documentation
- License: mit
- Latest release: v0.0.0 (published about 3 years ago)
crates.io: hdt
Library for the Header Dictionary Triples (HDT) RDF compression format.
- Documentation: https://docs.rs/hdt/
- License: MIT
- Latest release: 0.4.0 (published 4 months ago)
Maintainers (1)
Dependencies
- env_logger 0.10 development
- pretty_assertions 1.3 development
- bytesize 1.1.0
- crc-any 2.3
- iref 2.2
- langtag ^0.3.2
- log 0.4
- ntriple ^0.1.1
- rsdict 0.0.6
- sophia 0.7
- sucds 0.6.0
- thiserror 1.0.37
- actions-rs/toolchain v1 composite
- actions/checkout v3 composite