Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.9%) to scientific vocabulary
Repository
Statistics
- Stars: 36
- Watchers: 4
- Forks: 1
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
qs2
qs2: a framework for efficient serialization
qs2 is the successor to the qs package. The goal is to have reliable
and fast performance for saving and loading objects in R.
The qs2 format directly uses R serialization (via the
R_Serialize/R_Unserialize C API) while improving underlying
compression and disk IO patterns. If you are familiar with the qs
package, the benefits and usage are the same.
```r
qs_save(data, "myfile.qs2")
data <- qs_read("myfile.qs2")
```
Use the file extension qs2 to distinguish it from the original qs
package. It is not compatible with the original qs format.
Installation
```r
install.packages("qs2")
```
On x64 Mac or Linux, you can enable multi-threading by compiling from source. It is enabled by default on Windows.
```r
remotes::install_cran("qs2", type = "source", configure.args = "--with-TBB --with-simd=AVX2")
```
On non-x64 systems (e.g. Mac ARM) remove the AVX2 flag.
```r
remotes::install_cran("qs2", type = "source", configure.args = "--with-TBB")
```
Multi-threading in qs2 uses the Intel Thread Building Blocks
framework via the RcppParallel package.
Converting qs2 to RDS
Because the qs2 format directly uses R serialization, you can convert
it to RDS and vice versa.
```r
library(qs2)
file_qs2 <- tempfile(fileext = ".qs2")
file_rds <- tempfile(fileext = ".RDS")
x <- runif(1e6)
# save x with qs_save
qs_save(x, file_qs2)
# convert the file to RDS
qs_to_rds(input_file = file_qs2, output_file = file_rds)
# read x back in with readRDS
x_rds <- readRDS(file_rds)
stopifnot(identical(x, x_rds))
```
Validating file integrity
The qs2 format saves an internal checksum. Setting validate_checksum = TRUE
tests for file corruption before deserialization, at a minor performance
cost.
```r
qs_save(data, "myfile.qs2")
data <- qs_read("myfile.qs2", validate_checksum = TRUE)
```
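The C++ sketch below illustrates the idea of validating a file before deserializing it. It is a minimal example built on two assumptions. The first is a hypothetical layout: an 8-byte checksum header followed by the payload. The second is the hash: FNV-1a stands in for whatever qs2 really uses, because the README does not name the checksum algorithm or describe the file layout. The functions write_with_checksum and read_with_checksum are illustrative names and are not part of the qs2 API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// FNV-1a (64-bit), used here only as a stand-in checksum; the README does
// not say which checksum algorithm qs2 actually uses.
std::uint64_t fnv1a64(const unsigned char* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;  // FNV prime
    }
    return h;
}

// Hypothetical container: 8-byte checksum header followed by the payload.
std::vector<unsigned char> write_with_checksum(const std::vector<unsigned char>& payload) {
    const std::uint64_t sum = fnv1a64(payload.data(), payload.size());
    std::vector<unsigned char> out(sizeof(sum));
    std::memcpy(out.data(), &sum, sizeof(sum));
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Splits off the header. Only when validate is true does it re-hash the
// payload, throwing on a mismatch before anything would be deserialized.
std::vector<unsigned char> read_with_checksum(const std::vector<unsigned char>& file, bool validate) {
    std::uint64_t stored = 0;
    if (file.size() < sizeof(stored)) throw std::runtime_error("file too short");
    std::memcpy(&stored, file.data(), sizeof(stored));
    std::vector<unsigned char> payload(
        file.begin() + static_cast<std::ptrdiff_t>(sizeof(stored)), file.end());
    if (validate && fnv1a64(payload.data(), payload.size()) != stored) {
        throw std::runtime_error("checksum mismatch: file may be corrupted");
    }
    return payload;
}
```

In this sketch, the extra full pass over the payload inside read_with_checksum is what validation costs, which is in line with the minor performance penalty described above.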
The qdata format
The package also introduces the qdata format, which has its own
serialization layout and works only with data types (vectors, lists,
data frames, matrices).
It will replace internal types (functions, promises, external pointers,
environments, objects) with NULL. The qdata format differs from the
qs2 format in that it is NOT a general-purpose format for arbitrary
R objects.
The eventual goal of qdata is to also have interoperability with other
languages, particularly Python.
```r
qd_save(data, "myfile.qs2")
data <- qd_read("myfile.qs2")
```
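One way to picture the NULL-replacement rule is as a recursive walk over the object tree. The C++ sketch below uses a deliberately simplified, hypothetical Value type in place of R's SEXP, and it does not claim to show how qd_save is implemented. The replaced counter corresponds to the condition that the warn_unsupported_types option warns about.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of an R object tree (not qs2's real types).
enum class Kind { Null, Numeric, Character, List, Function, Environment, ExternalPtr };

struct Value {
    Kind kind = Kind::Null;
    std::vector<double> numbers;       // used when kind == Numeric
    std::vector<std::string> strings;  // used when kind == Character
    std::vector<Value> children;       // used when kind == List
};

// Only plain data types survive; everything else is dropped.
bool is_data_type(Kind k) {
    return k == Kind::Null || k == Kind::Numeric || k == Kind::Character || k == Kind::List;
}

// Returns a copy of v in which every unsupported node is replaced by Null.
// `replaced` counts the dropped nodes, so a caller could warn when it is > 0.
Value sanitize(const Value& v, int& replaced) {
    if (!is_data_type(v.kind)) {
        ++replaced;
        return Value{};  // kind defaults to Kind::Null
    }
    Value out = v;
    if (v.kind == Kind::List) {
        for (std::size_t i = 0; i < v.children.size(); ++i) {
            out.children[i] = sanitize(v.children[i], replaced);
        }
    }
    return out;
}
```

As an example, a list holding a numeric vector and a function comes back with the numeric vector unchanged, a Null in place of the function, and replaced equal to 1.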
Benchmarks
A summary across 4 datasets is presented below.
Single-threaded
| Algorithm | Compression ratio | Save Time (s) | Read Time (s) |
| --------------- | ----------- | ------------- | ------------- |
| qs2 | 7.96 | 13.4 | 50.4 |
| qdata | 8.45 | 10.5 | 34.8 |
| base::serialize | 1.1 | 8.87 | 51.4 |
| saveRDS | 8.68 | 107 | 63.7 |
| fst | 2.59 | 5.09 | 46.3 |
| parquet | 8.29 | 20.3 | 38.4 |
| qs (legacy) | 7.97 | 9.13 | 48.1 |

Multi-threaded (8 threads)

| Algorithm | Compression ratio | Save Time (s) | Read Time (s) |
| ----------- | ----------- | ------------- | ------------- |
| qs2 | 7.96 | 3.79 | 48.1 |
| qdata | 8.45 | 1.98 | 33.1 |
| fst | 2.59 | 5.05 | 46.6 |
| parquet | 8.29 | 20.2 | 37.0 |
| qs (legacy) | 7.97 | 3.21 | 52.0 |

- qs2, qdata and qs with compress_level = 3
- parquet via the arrow package using zstd compression_level = 3
- base::serialize with ascii = FALSE and xdr = FALSE
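The two tables can be compared by dividing each single-threaded time by the matching 8-thread time, which gives a multi-threaded speedup. The worked numbers below take the reported figures at face value and use only values from the tables:

$$
S = \frac{T_{1}}{T_{8}}, \qquad
S_{\text{qs2, save}} = \frac{13.4}{3.79} \approx 3.54, \qquad
S_{\text{qdata, save}} = \frac{10.5}{1.98} \approx 5.30, \qquad
S_{\text{qs2, read}} = \frac{50.4}{48.1} \approx 1.05
$$

On these numbers, saving gains much more from extra threads than reading does.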
Datasets used
- 1000 genomes non-coding VCF: 1000 genomes non-coding variants (2743 MB)
- B-cell data: B-cell mouse data, Greiff 2017 (1057 MB)
- IP location: IPv4 range data with location information (198 MB)
- Netflix movie ratings: Netflix ML prediction dataset (571 MB)

These datasets are openly licensed and represent a combination of
numeric and text data across multiple domains. See
inst/analysis/datasets.R on GitHub.
Usage in C/C++
Serialization functions can be accessed in compiled code. Below is an example using Rcpp.
```cpp
// [[Rcpp::depends(qs2)]]
#include <Rcpp.h>
#include "qs2_external.h"
using namespace Rcpp;

// [[Rcpp::export]]
SEXP test_qs_serialize(SEXP x) {
  size_t len = 0;
  unsigned char * buffer = c_qs_serialize(x, &len, 10, true, 4); // object, output buffer length, compress_level, shuffle, nthreads
  SEXP y = c_qs_deserialize(buffer, len, false, 4); // buffer, buffer length, validate_checksum, nthreads
  c_qs_free(buffer); // must manually free buffer
  return y;
}

// [[Rcpp::export]]
SEXP test_qd_serialize(SEXP x) {
  size_t len = 0;
  unsigned char * buffer = c_qd_serialize(x, &len, 10, true, 4); // object, output buffer length, compress_level, shuffle, nthreads
  SEXP y = c_qd_deserialize(buffer, len, false, false, 4); // buffer, buffer length, use_alt_rep, validate_checksum, nthreads
  c_qd_free(buffer); // must manually free buffer
  return y;
}

/*** R
x <- runif(1e7)
stopifnot(identical(test_qs_serialize(x), x))
stopifnot(identical(test_qd_serialize(x), x))
*/
```
Global Options for qs2
The following global options control the behavior of the qs2
functions. They can be queried or modified using the qopt
function.
- compress_level: The default compression level used when compressing data. Default: 3L
- shuffle: A logical flag indicating whether to allow byte shuffling during compression. Default: TRUE
- nthreads: The number of threads used for compression and decompression. Default: 1L
- validate_checksum: A logical flag indicating whether to validate the stored checksum when reading data. Default: FALSE
- warn_unsupported_types: For qd_save, a logical flag indicating whether to warn when saving an object with unsupported types. Default: TRUE
- use_alt_rep: For qd_read, a logical flag indicating whether to use ALTREP when reading in string data. Default: FALSE
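The shuffle option refers to byte shuffling. A common version of this transform, used for example in Blosc, reorders the bytes of fixed-width elements so that byte 0 of every element comes first, followed by byte 1 of every element, and so on. The C++ sketch below shows that generic transform and its inverse. Whether qs2 uses exactly this layout or this block size is an assumption, and shuffle_bytes and unshuffle_bytes are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte shuffle: output holds byte 0 of every element, then byte 1, etc.
std::vector<unsigned char> shuffle_bytes(const std::vector<unsigned char>& data, std::size_t elem_size) {
    assert(elem_size > 0 && data.size() % elem_size == 0);
    const std::size_t n = data.size() / elem_size;  // number of elements
    std::vector<unsigned char> out(data.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t b = 0; b < elem_size; ++b)
            out[b * n + i] = data[i * elem_size + b];
    return out;
}

// Inverse transform, applied after decompression to restore element order.
std::vector<unsigned char> unshuffle_bytes(const std::vector<unsigned char>& data, std::size_t elem_size) {
    assert(elem_size > 0 && data.size() % elem_size == 0);
    const std::size_t n = data.size() / elem_size;
    std::vector<unsigned char> out(data.size());
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t b = 0; b < elem_size; ++b)
            out[i * elem_size + b] = data[b * n + i];
    return out;
}
```

As an example, the three 2-byte elements {1, 2}, {3, 4}, {5, 6} shuffle to {1, 3, 5, 2, 4, 6}. When the values in a numeric vector have similar magnitudes, their high-order bytes often repeat, so grouping those bytes together tends to hand the compressor longer runs.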
Owner
- Name: qsbase
- Login: qsbase
- Kind: organization
- Repositories: 2
- Profile: https://github.com/qsbase
GitHub Events
Total
- Issues event: 21
- Watch event: 40
- Issue comment event: 24
- Public event: 1
- Push event: 32
- Fork event: 2
Last Year
- Issues event: 21
- Watch event: 40
- Issue comment event: 24
- Public event: 1
- Push event: 32
- Fork event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 14
- Total pull requests: 0
- Average time to close issues: 12 days
- Average time to close pull requests: N/A
- Total issue authors: 11
- Total pull request authors: 0
- Average comments per issue: 3.21
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 14
- Pull requests: 0
- Average time to close issues: 12 days
- Average time to close pull requests: N/A
- Issue authors: 11
- Pull request authors: 0
- Average comments per issue: 3.21
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- elgabbas (2)
- gevro (2)
- wlandau (1)
- dcasbioinfo (1)
- BenjaminDEMAILLE (1)
- assaron (1)
- forget999 (1)
- dominiqueemmanuel (1)
- grt9 (1)
- yvanrichard (1)
Packages
- Total packages: 1
- Total downloads:
  - cran: 2,245 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
cran.r-project.org: qs2
Efficient Serialization of R Objects
- Homepage: https://github.com/qsbase/qs2
- Documentation: http://cran.r-project.org/web/packages/qs2/qs2.pdf
- License: GPL-3
- Latest release: 0.1.5 (published 12 months ago)
Maintainers (1)
Dependencies
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
- r-hub/rhub2/actions/rhub-checkout v1 composite
- r-hub/rhub2/actions/rhub-platform-info v1 composite
- r-hub/rhub2/actions/rhub-run-check v1 composite
- r-hub/rhub2/actions/rhub-setup v1 composite
- r-hub/rhub2/actions/rhub-setup-deps v1 composite
- r-hub/rhub2/actions/rhub-setup-r v1 composite
- R >= 3.5 depends
- Rcpp * imports
- stringfish >= 0.15.1 imports
- data.table * suggests
- dplyr * suggests
- knitr * suggests
- rmarkdown * suggests