ostrich

🐦 Versioned RDF triple store (OffSet-enabled TRIple store for CHangesets)

https://github.com/rdfostrich/ostrich

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Keywords

rdf semantic-web triplestore versioning

Repository

Basic Info
  • Host: GitHub
  • Owner: rdfostrich
  • License: other
  • Language: C++
  • Default Branch: master
  • Homepage:
  • Size: 810 KB
Statistics
  • Stars: 43
  • Watchers: 7
  • Forks: 2
  • Open Issues: 1
  • Releases: 1
Created over 8 years ago · Last pushed over 2 years ago

README.md

OSTRICH

Offset-enabled TRIple store for CHangesets

OSTRICH is an RDF triple store that allows multiple versions of a dataset to be stored and queried at the same time.

The store is a hybrid between snapshot, delta and timestamp-based storage, which provides a good trade-off between storage size and query time. It provides several built-in algorithms to enable efficient iterator-based queries at a certain version (version materialization), between any two versions (delta materialization), and across all versions (version queries). These queries support limits and offsets for any triple pattern.
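To make the three query types concrete, here is a minimal in-memory sketch. It is a toy model only: it keeps every version as a full Python set, which is exactly what OSTRICH's hybrid storage avoids, and the function names are hypothetical rather than part of the OSTRICH API.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """A None component in the pattern acts as a wildcard."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def version_materialized(versions: List[Set[Triple]], version: int,
                         pattern: Pattern, offset: int = 0,
                         limit: Optional[int] = None) -> List[Triple]:
    """Triples matching the pattern at one version, with offset and limit."""
    hits = sorted(t for t in versions[version] if matches(t, pattern))
    end = None if limit is None else offset + limit
    return hits[offset:end]


def delta_materialized(versions: List[Set[Triple]], start: int, end: int,
                       pattern: Pattern) -> List[Tuple[Triple, bool]]:
    """Changes between two versions as (triple, is_addition) pairs."""
    added = versions[end] - versions[start]
    deleted = versions[start] - versions[end]
    changes = [(t, True) for t in added] + [(t, False) for t in deleted]
    return sorted(c for c in changes if matches(c[0], pattern))


def version_query(versions: List[Set[Triple]],
                  pattern: Pattern) -> Dict[Triple, List[int]]:
    """Map each matching triple to the versions in which it exists."""
    result: Dict[Triple, List[int]] = {}
    for number, triples in enumerate(versions):
        for triple in triples:
            if matches(triple, pattern):
                result.setdefault(triple, []).append(number)
    return result
```

The real store answers the same three questions without materializing each version, which is where the trade-off between storage size and query time comes from.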

Insertion is done by first inserting a dataset snapshot, which is encoded in HDT. After that, deltas can be inserted, which contain additions and deletions based on the last delta or snapshot.
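The relationship between the initial snapshot and the deltas that follow it can be sketched as below. This only models the meaning of the input (each delta is relative to the previous version); it says nothing about how OSTRICH encodes deltas on disk, and `materialize` is a hypothetical helper.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
# One changeset: (additions, deletions) relative to the previous version.
Changeset = Tuple[Iterable[Triple], Iterable[Triple]]


def materialize(snapshot: Iterable[Triple], changesets: List[Changeset],
                version: int) -> Set[Triple]:
    """Rebuild a version: version 0 is the snapshot, version i applies i changesets."""
    if not 0 <= version <= len(changesets):
        raise IndexError(f"version {version} does not exist")
    triples = set(snapshot)
    for additions, deletions in changesets[:version]:
        triples -= set(deletions)   # drop triples removed in this step
        triples |= set(additions)   # then add the new ones
    return triples
```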

Learn more about the internals of OSTRICH in the following articles:

Building

OSTRICH requires ZLib, Kyoto Cabinet, Boost, Serd, Raptor2 and CMake (compilation only) to be installed. Inspect our CI workflow file to see how dependencies are installed on Ubuntu.

Compile:

    $ mkdir build
    $ cd build
    $ cmake ..
    $ make

Running

The OSTRICH dataset will always be loaded from the current directory.

Tests

    build/ostrich_test

Query

    build/ostrich-query-version-materialized patch_id s p o
    build/ostrich-query-delta-materialized patch_id patch_id_end s p o
    build/ostrich-query-version patch_id s p o

Insert

    build/ostrich-insert [-v] patch_id [+|- file_1.nt [file_2.nt [...]]]*
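As an illustration of the usage string above, the following sketch assembles an argument list for `ostrich-insert`. It assumes that files after `+` are additions and files after `-` are deletions, which is a reading of the usage line rather than something stated explicitly. The helper name and its parameters are hypothetical, and the meaning of `-v` is not documented here, so it is passed through as an opaque flag.

```python
from typing import List, Sequence


def build_insert_command(patch_id: int,
                         additions: Sequence[str] = (),
                         deletions: Sequence[str] = (),
                         v_flag: bool = False,
                         binary: str = "build/ostrich-insert") -> List[str]:
    """Build an argv list following: [-v] patch_id [+|- file_1.nt [file_2.nt [...]]]*"""
    command = [binary]
    if v_flag:
        command.append("-v")
    command.append(str(patch_id))
    if additions:
        command += ["+", *additions]   # assumed: '+' introduces addition files
    if deletions:
        command += ["-", *deletions]   # assumed: '-' introduces deletion files
    return command
```

The resulting list can be handed to `subprocess.run` from the directory that holds the OSTRICH dataset.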

Input deltas must be sorted in SPO-order.
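One way to prepare an N-Triples file for insertion is to sort its statements by subject, then predicate, then object. The sketch below assumes plain string ordering of the serialized terms; whether OSTRICH expects exactly this comparison is an assumption, and the helper names are hypothetical. The parser is deliberately simple and relies on subjects and predicates containing no whitespace.

```python
from typing import Iterable, List, Tuple


def parse_line(line: str) -> Tuple[str, str, str]:
    """Split one N-Triples statement into (subject, predicate, object)."""
    body = line.strip()
    if not body.endswith("."):
        raise ValueError(f"not an N-Triples statement: {line!r}")
    body = body[:-1].rstrip()
    # The object may contain spaces (literals), so split at most twice.
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def sort_spo(lines: Iterable[str]) -> List[str]:
    """Return the statements sorted in SPO order, skipping blanks and comments."""
    statements = [line.strip() for line in lines
                  if line.strip() and not line.lstrip().startswith("#")]
    return sorted(statements, key=parse_line)
```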

Evaluate

Only load changesets from a path structured as path_to_patch_directory/patch_id/main.nt.additions.txt and path_to_patch_directory/patch_id/main.nt.deletions.txt.

    build/ostrich-evaluate path_to_patch_directory patch_id patch_id_end

CSV-formatted insert data will be emitted: version,added,durationms,rate,accsize.
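The directory layout expected by `ostrich-evaluate` can be produced with a few lines of Python. This is a sketch: `write_changeset` is a hypothetical helper, and whether every patch needs both files (for example, an empty deletions file for the first patch) is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Tuple


def write_changeset(root: Path, patch_id: int,
                    additions: Iterable[str],
                    deletions: Iterable[str]) -> Tuple[Path, Path]:
    """Write root/<patch_id>/main.nt.additions.txt and main.nt.deletions.txt."""
    patch_dir = Path(root) / str(patch_id)
    patch_dir.mkdir(parents=True, exist_ok=True)
    additions_file = patch_dir / "main.nt.additions.txt"
    deletions_file = patch_dir / "main.nt.deletions.txt"
    additions_file.write_text("".join(f"{line}\n" for line in additions), encoding="utf-8")
    deletions_file.write_text("".join(f"{line}\n" for line in deletions), encoding="utf-8")
    return additions_file, deletions_file


if __name__ == "__main__":
    # Demo in a throwaway directory; pass it as path_to_patch_directory.
    root = Path(tempfile.mkdtemp())
    write_changeset(root, 1, ['<http://example.org/s> <http://example.org/p> "o" .'], [])
    for path in sorted(root.rglob("*.txt")):
        print(path.relative_to(root))
```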

Load changesets AND query with triple patterns from the given file on separate lines, with the given number of replications.

    build/ostrich-evaluate path_to_patch_directory patch_id patch_id_end patch_to_queries/queries.txt s|p|o nr_replications

CSV-formatted query data will be emitted (time in microseconds) for all versions for the three query types: patch,offset,limit,count-ms,lookup-mus,results.
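The emitted query data can be post-processed with the standard `csv` module. The sketch below averages the `lookup-mus` column per patch. It assumes the output starts with a header row naming the columns listed above (pass `has_header=False` otherwise), and the sample numbers are made up for illustration, not measured results.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict

COLUMNS = ["patch", "offset", "limit", "count-ms", "lookup-mus", "results"]


def mean_lookup_per_patch(csv_text: str, has_header: bool = True) -> Dict[int, float]:
    """Average lookup time (microseconds) for each patch id."""
    reader = csv.DictReader(io.StringIO(csv_text),
                            fieldnames=None if has_header else COLUMNS)
    per_patch = defaultdict(list)
    for row in reader:
        per_patch[int(row["patch"])].append(float(row["lookup-mus"]))
    return {patch: mean(values) for patch, values in per_patch.items()}


# Hypothetical sample rows, for illustration only.
SAMPLE = """patch,offset,limit,count-ms,lookup-mus,results
0,0,100,1,120,10
0,0,100,1,80,10
1,0,100,2,200,12
"""

if __name__ == "__main__":
    print(mean_lookup_per_patch(SAMPLE))
```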

Docker

Alternatively, OSTRICH can be built and run using Docker.

Build

    docker build -t ostrich .

Instead of building the image yourself, you can pull the pre-built image from Docker Hub:

    docker pull rdfostrich/ostrich

Test

    docker run --rm -it --entrypoint /opt/patchstore/build/ostrich_test ostrich

Query

    docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-version-materialized ostrich patch_id s p o
    docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-delta-materialized ostrich patch_id patch_id_end s p o
    docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-query-version ostrich s p o

Insert

    docker run --rm -it --entrypoint /opt/ostrich/build/ostrich-insert ostrich [-v] patch_id [+|- file_1.nt [file_2.nt [...]]]*

Evaluate

Only load changesets from a path structured as path_to_patch_directory/patch_id/main.nt.additions.txt and path_to_patch_directory/patch_id/main.nt.deletions.txt.

    docker run --rm -it -v path_to_patch_directory:/var/patches ostrich /var/patches patch_id patch_id_end

Load changesets AND query with triple patterns from the given file on separate lines, with the given number of replications.

    docker run --rm -it -v path_to_patch_directory:/var/patches -v patch_to_queries:/var/queries ostrich /var/patches patch_id patch_id_end /var/queries/queries.txt s|p|o nr_replications

Enable debug mode:

    docker run --rm -it -v path_to_patch_directory:/var/patches -v patch_to_queries:/var/queries -v path_to_crash_dir:/crash --privileged=true ostrich --debug /var/patches patch_id patch_id_end /var/queries/queries.txt s|p|o nr_replications

Compiler variables

PATCH_INSERT_BUFFER_SIZE: The size of the triple parser buffer during patch insertion. (default 100)

FLUSH_POSITIONS_COUNT: The number of triples after which the patch positions are flushed to disk, to avoid memory issues. (default 500000)

FLUSH_TRIPLES_COUNT: The number of triples after which the store is flushed to disk, to avoid memory issues. (default 500000)

KC_MEMORY_MAP_SIZE: The KC memory map size per tree. (default 1LL << 27 = 128MB)

KC_PAGE_CACHE_SIZE: The KC page cache size per tree. (default 1LL << 25 = 32MB)

MIN_ADDITION_COUNT: The minimum addition triple count required for the count to be stored in the database. Changing this value only takes effect at insertion time. Lookups are compatible with any value. (default 200)
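The two Kyoto Cabinet defaults above are written as bit shifts. The following check spells out the arithmetic, with the constants mirrored in Python purely for illustration (in OSTRICH they are set at compile time).

```python
# Mirrors of the documented defaults, in bytes.
KC_MEMORY_MAP_SIZE = 1 << 27   # 1LL << 27
KC_PAGE_CACHE_SIZE = 1 << 25   # 1LL << 25


def to_mib(size_in_bytes: int) -> int:
    """Convert a byte count to whole mebibytes (1 MiB = 2**20 bytes)."""
    return size_in_bytes // (1 << 20)


if __name__ == "__main__":
    print(KC_MEMORY_MAP_SIZE, "bytes =", to_mib(KC_MEMORY_MAP_SIZE), "MiB per tree")
    print(KC_PAGE_CACHE_SIZE, "bytes =", to_mib(KC_PAGE_CACHE_SIZE), "MiB per tree")
```

Because both sizes apply per tree, the total memory footprint scales with the number of trees the store opens.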

Cite

If you are using or extending OSTRICH as part of a scientific publication, we would appreciate a citation of our article.

    @article{taelman_jws_ostrich_2018,
      author = {Taelman, Ruben and Vander Sande, Miel and Van Herwegen, Joachim and Mannens, Erik and Verborgh, Ruben},
      title = {Triple Storage for Random-Access Versioned Querying of RDF Archives},
      journal = {Journal of Web Semantics},
      year = {2018},
      month = aug,
      url = {https://rdfostrich.github.io/article-jws2018-ostrich/}
    }

License

This software is written by Ruben Taelman, Olivier Pelgrin, and colleagues.

This code is copyrighted by Ghent University – imec and Aalborg University, and is released under the MIT license.

Owner

  • Name: OSTRICH
  • Login: rdfostrich
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite the article."
authors:
- family-names: "Taelman"
  given-names: "Ruben"
- family-names: "Vander Sande"
  given-names: "Miel"
title: "OSTRICH"
version: 1.0.0
doi: 10.5281/zenodo.883008
date-released: 2017-10-01
url: "https://github.com/rdfostrich/ostrich"
preferred-citation:
  type: article
  authors:
  - family-names: "Taelman"
    given-names: "Ruben"
  - family-names: "Vander Sande"
    given-names: "Miel"
  - family-names: "Van Herwegen"
    given-names: "Joachim"
  - family-names: "Verborgh"
    given-names: "Ruben"
  doi: "10.2139/ssrn.3248501"
  journal: "Journal of Web Semantics"
  month: 9
  title: "Triple Storage for Random-Access Versioned Querying of RDF Archives"
  year: 2018

GitHub Events

Total
  • Issues event: 1
  • Watch event: 2
  • Pull request event: 2
Last Year
  • Issues event: 1
  • Watch event: 2
  • Pull request event: 2

Dependencies

Dockerfile docker
  • buildpack-deps jessie build
.github/workflows/ostrich_test.yml actions
  • actions/checkout v3 composite