ostrich
🐦 Versioned RDF triple store (OffSet-enabled TRIple store for CHangesets)
Statistics
- Stars: 43
- Watchers: 7
- Forks: 2
- Open Issues: 1
- Releases: 1
README.md
OSTRICH
Offset-enabled TRIple store for CHangesets
OSTRICH is an RDF triple store that allows multiple versions of a dataset to be stored and queried at the same time.
The store is a hybrid between snapshot, delta and timestamp-based storage, which provides a good trade-off between storage size and query time. It provides several built-in algorithms to enable efficient iterator-based queries at a certain version (version-materialized queries), between any two versions (delta-materialized queries), and across all versions (version queries). These queries support limits and offsets for any triple pattern.
Insertion is done by first inserting a dataset snapshot, which is encoded in HDT. After that, deltas can be inserted, which contain additions and deletions based on the last delta or snapshot.
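The snapshot-plus-deltas model described above can be illustrated with ordinary sorted text files. This is a conceptual sketch only and says nothing about how OSTRICH stores or queries data internally: it assumes one N-Triples statement per line and bytewise-sorted files, and the helper names `apply_delta` and `diff_versions` are hypothetical.

```bash
#!/usr/bin/env bash
# Conceptual model only: plain sorted files stand in for the store.
# All inputs are assumed to be sorted bytewise, one triple per line.
export LC_ALL=C

# apply_delta PREV ADDITIONS DELETIONS
# Prints the next version: (PREV minus DELETIONS) union ADDITIONS.
apply_delta() {
  comm -23 "$1" "$3" | sort -u - "$2"
}

# diff_versions OLD NEW
# Prints the delta between two materialized versions:
# "+ " marks triples only in NEW, "- " marks triples only in OLD.
diff_versions() {
  comm -13 "$1" "$2" | sed 's/^/+ /'
  comm -23 "$1" "$2" | sed 's/^/- /'
}
```

Applying `apply_delta` repeatedly, starting from the snapshot, materializes any later version in this toy model. `diff_versions` corresponds loosely to asking for the changes between two versions.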
Learn more about the internals of OSTRICH in the following articles:
- Triple Storage for Random-Access Versioned Querying of RDF Archives
- Scaling Large RDF Archives To Very Long Histories
- OSTRICH: Versioned Random-Access Triple Store
- GLENDA: Querying RDF Archives with full SPARQL
Building
OSTRICH requires ZLib, Kyoto Cabinet, Boost, Serd, Raptor2 and CMake (compilation only) to be installed. Inspect our CI workflow file to see how dependencies are installed on Ubuntu.
Compile:
```bash
$ mkdir build
$ cd build
$ cmake ..
$ make
```
Running
The OSTRICH dataset will always be loaded from the current directory.
Tests
```bash
build/ostrich_test
```
Query
```bash
build/ostrich-query-version-materialized patch_id s p o
build/ostrich-query-delta-materialized patch_id patch_id_end s p o
build/ostrich-query-version patch_id s p o
```
Insert
```bash
build/ostrich-insert [-v] patch_id [+|- file_1.nt [file_2.nt [...]]]*
```
Input deltas must be sorted in SPO-order.
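The insertion steps above can be sketched as a small wrapper. This is an illustration under assumptions, not part of OSTRICH: `sort_spo`, `insert_version` and the `OSTRICH_INSERT` override are hypothetical names, and it assumes that a bytewise sort of one-triple-per-line N-Triples yields an SPO order the tool accepts.

```bash
#!/usr/bin/env bash
# Sketch: sort N-Triples files and pass them to ostrich-insert.
# OSTRICH_INSERT is a hypothetical override that allows a dry run
# (for example OSTRICH_INSERT=echo); by default it points at the binary.
OSTRICH_INSERT="${OSTRICH_INSERT:-build/ostrich-insert}"

# sort_spo FILE: print FILE sorted bytewise with duplicate lines removed.
# Assumption: bytewise line order is an acceptable SPO order for OSTRICH.
sort_spo() {
  LC_ALL=C sort -u "$1"
}

# insert_version PATCH_ID ADDITIONS [DELETIONS]
# Sorts the inputs into temporary files, then calls ostrich-insert
# with "+" before the additions file and "-" before the deletions file.
insert_version() {
  local patch_id=$1 additions=$2 deletions=${3:-}
  local tmp
  tmp=$(mktemp -d) || return 1
  sort_spo "$additions" > "$tmp/additions.nt"
  if [ -n "$deletions" ]; then
    sort_spo "$deletions" > "$tmp/deletions.nt"
    "$OSTRICH_INSERT" "$patch_id" + "$tmp/additions.nt" - "$tmp/deletions.nt"
  else
    "$OSTRICH_INSERT" "$patch_id" + "$tmp/additions.nt"
  fi
  local status=$?
  rm -rf "$tmp"
  return $status
}
```

For example, `insert_version 0 snapshot.nt` followed by `insert_version 1 added.nt removed.nt` would insert a snapshot and one delta, assuming version numbering starts at 0 and using made-up file names. Setting `OSTRICH_INSERT=echo` prints the command line instead of running it.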
Evaluate
Only load changesets from a path structured as path_to_patch_directory/patch_id/main.nt.additions.txt and path_to_patch_directory/patch_id/main.nt.deletions.txt.
```bash
build/ostrich-evaluate path_to_patch_directory patch_id patch_id_end
```
CSV-formatted insert data will be emitted: version,added,durationms,rate,accsize.
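The directory layout expected by this mode can be prepared with a short helper. The sketch below is illustrative: `stage_patch` is a hypothetical name, and it assumes every patch has both an additions and a deletions file.

```bash
#!/usr/bin/env bash
# stage_patch PATCH_DIR PATCH_ID ADDITIONS DELETIONS
# Copies one changeset into the layout described above:
#   PATCH_DIR/PATCH_ID/main.nt.additions.txt
#   PATCH_DIR/PATCH_ID/main.nt.deletions.txt
stage_patch() {
  local dir="$1/$2"
  mkdir -p "$dir" || return 1
  cp "$3" "$dir/main.nt.additions.txt" &&
  cp "$4" "$dir/main.nt.deletions.txt"
}
```

After staging patches 1 to 3 under a directory such as `patches`, the evaluation would be started with `build/ostrich-evaluate patches 1 3`.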
Load changesets AND query with triple patterns from the given file on separate lines, with the given number of replications.
```bash
build/ostrich-evaluate path_to_patch_directory patch_id patch_id_end path_to_queries/queries.txt s|p|o nr_replications
```
CSV-formatted query data will be emitted (time in microseconds) for all versions for the three query types: patch,offset,limit,count-ms,lookup-mus,results.
Docker
Alternatively, OSTRICH can be built and run using Docker.
Build
```bash
docker build -t ostrich .
```
Instead of building the container yourself, you can use the pre-built image from DockerHub.
```bash
docker pull rdfostrich/ostrich
```
Test
```bash
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich_test ostrich
```
Query
```bash
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-version-materialized ostrich patch_id s p o
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-delta-materialized ostrich patch_id patch_id_end s p o
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-version ostrich s p o
```
Insert
```bash
docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-insert ostrich [-v] patch_id [+|- file_1.nt [file_2.nt [...]]]*
```
Evaluate
Only load changesets from a path structured as path_to_patch_directory/patch_id/main.nt.additions.txt and path_to_patch_directory/patch_id/main.nt.deletions.txt.
```bash
docker run --rm -it -v path_to_patch_directory:/var/patches ostrich /var/patches patch_id patch_id_end
```
Load changesets AND query with triple patterns from the given file on separate lines, with the given number of replications.
```bash
docker run --rm -it -v path_to_patch_directory:/var/patches -v path_to_queries:/var/queries ostrich /var/patches patch_id patch_id_end /var/queries/queries.txt s|p|o nr_replications
```
Enable debug mode:
```bash
docker run --rm -it -v path_to_patch_directory:/var/patches -v path_to_queries:/var/queries -v path_to_crash_dir:/crash --privileged=true ostrich --debug /var/patches patch_id patch_id_end /var/queries/queries.txt s|p|o nr_replications
```
Compiler variables
PATCH_INSERT_BUFFER_SIZE: The size of the triple parser buffer during patch insertion. (default 100)
FLUSH_POSITIONS_COUNT: The number of triples after which the patch positions should be flushed to disk, to avoid memory issues. (default 500000)
FLUSH_TRIPLES_COUNT: The number of triples after which the store should be flushed to disk, to avoid memory issues. (default 500000)
KC_MEMORY_MAP_SIZE: The Kyoto Cabinet (KC) memory map size per tree. (default 1LL << 27 = 128MB)
KC_PAGE_CACHE_SIZE: The Kyoto Cabinet (KC) page cache size per tree. (default 1LL << 25 = 32MB)
MIN_ADDITION_COUNT: The minimum addition triple count so that it will be stored in the db. Changing this value only has an effect at insertion time. Lookups are compatible with any value. (default 200)
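When choosing other values for the two KC sizes, the shift expressions can be converted to mebibytes with shell arithmetic. The helper name `kc_size_mib` below is hypothetical and only illustrates the arithmetic behind the defaults listed above.

```bash
# kc_size_mib SHIFT: print the size of (1 << SHIFT) bytes in MiB.
kc_size_mib() {
  echo $(( (1 << $1) / 1024 / 1024 ))
}

kc_size_mib 27   # KC_MEMORY_MAP_SIZE default, prints 128
kc_size_mib 25   # KC_PAGE_CACHE_SIZE default, prints 32
```

Both sizes apply per tree, so the total memory used depends on how many trees the store opens.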
Cite
If you are using or extending OSTRICH as part of a scientific publication, we would appreciate a citation of our article.
```bibtex
@article{taelman_jws_ostrich_2018,
  author = {Taelman, Ruben and Vander Sande, Miel and Van Herwegen, Joachim and Mannens, Erik and Verborgh, Ruben},
  title = {Triple Storage for Random-Access Versioned Querying of RDF Archives},
  journal = {Journal of Web Semantics},
  year = {2018},
  month = aug,
  url = {https://rdfostrich.github.io/article-jws2018-ostrich/}
}
```
License
This software is written by Ruben Taelman, Olivier Pelgrin, and colleagues.
This code is copyrighted by Ghent University – imec and Aalborg University, and is released under the MIT license.
Owner
- Name: OSTRICH
- Login: rdfostrich
- Kind: organization
- Repositories: 20
- Profile: https://github.com/rdfostrich
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite the article."
authors:
  - family-names: "Taelman"
    given-names: "Ruben"
  - family-names: "Vander Sande"
    given-names: "Miel"
title: "OSTRICH"
version: 1.0.0
doi: 10.5281/zenodo.883008
date-released: 2017-10-01
url: "https://github.com/rdfostrich/ostrich"
preferred-citation:
  type: article
  authors:
    - family-names: "Taelman"
      given-names: "Ruben"
    - family-names: "Vander Sande"
      given-names: "Miel"
    - family-names: "Van Herwegen"
      given-names: "Joachim"
    - family-names: "Verborgh"
      given-names: "Ruben"
  doi: "10.2139/ssrn.3248501"
  journal: "Journal of Web Semantics"
  month: 9
  title: "Triple Storage for Random-Access Versioned Querying of RDF Archives"
  year: 2018
```
Dependencies
- buildpack-deps jessie build
- actions/checkout v3 composite