Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 8 committers (12.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary
Repository
Quick serialization of R objects
Statistics
- Stars: 428
- Watchers: 13
- Forks: 19
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Using qs
Quick serialization of R objects
qs provides an interface for quickly saving and reading objects to and
from disk. The goal of this package is to provide a lightning-fast and
complete replacement for the saveRDS and readRDS functions in R.
Inspired by the fst package, qs uses a similar block-compression
design using either the lz4 or zstd compression libraries. It
differs in that it applies a more general approach for attributes and
object references.
saveRDS and readRDS are the standard for serialization of R data,
but these functions are not optimized for speed. On the other hand,
fst is extremely fast, but only works on data.frame’s and certain
column types.
qs is both extremely fast and general: it can serialize any R object
like saveRDS and is just as fast and sometimes faster than fst.
Usage
``` r
library(qs)
df1 <- data.frame(x = rnorm(5e6), y = sample(5e6), z = sample(letters, 5e6, replace = TRUE))
qsave(df1, "myfile.qs")
df2 <- qread("myfile.qs")
```
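To check that a round trip preserves an object exactly, a minimal sketch along the following lines may help. It uses only `qsave` and `qread` from the example above; the data frame, its column names, and the temporary file path are illustrative assumptions.

```r
library(qs)

# Small illustrative object; the column names are hypothetical.
df <- data.frame(
  id = 1:1000,
  value = rnorm(1000),
  label = sample(letters, 1000, replace = TRUE)
)

# Write to a temporary file so the working directory stays clean.
path <- tempfile(fileext = ".qs")
qsave(df, path)
df_back <- qread(path)

# TRUE when the object read back matches the original exactly.
round_trip_ok <- identical(df, df_back)
print(round_trip_ok)

unlink(path)
```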
Installation
``` r
# CRAN version
install.packages("qs")

# CRAN version compile from source (recommended)
remotes::install_cran("qs", type = "source", configure.args = "--with-simd=AVX2")
```
Features
The table below compares the features of different serialization approaches in R.
| | qs | fst | saveRDS |
| -------------------- | :-: | :----------------: | :-----: |
| Not Slow | ✔ | ✔ | ❌ |
| Numeric Vectors | ✔ | ✔ | ✔ |
| Integer Vectors | ✔ | ✔ | ✔ |
| Logical Vectors | ✔ | ✔ | ✔ |
| Character Vectors | ✔ | ✔ | ✔ |
| Character Encoding | ✔ | (vector-wide only) | ✔ |
| Complex Vectors | ✔ | ❌ | ✔ |
| Data.Frames | ✔ | ✔ | ✔ |
| On disk row access | ❌ | ✔ | ❌ |
| Random column access | ❌ | ✔ | ❌ |
| Attributes | ✔ | Some | ✔ |
| Lists / Nested Lists | ✔ | ❌ | ✔ |
| Multi-threaded | ✔ | ✔ | ❌ |
qs also includes a number of advanced features:
- For character vectors, qs also has the option of using the new ALTREP system (R version 3.5+) to quickly read in string data.
- For numerical data (numeric, integer, logical and complex vectors), qs implements byte shuffling filters (adopted from the Blosc meta-compression library). These filters utilize extended CPU instruction sets (either SSE2 or AVX2).
- qs also efficiently serializes S4 objects, environments, and other complex objects.
For certain types of data, these features can improve performance further, in some cases by orders of magnitude. See the sections below for more details.
Summary Benchmarks
The following benchmarks compare qs, fst and
saveRDS/readRDS in base R for serializing and de-serializing a
medium-sized data.frame with 5 million rows (approximately 115 MB in
memory):
``` r
data.frame(a = rnorm(5e6),
           b = rpois(5e6, 100),
           c = sample(starnames$IAU, 5e6, TRUE),
           d = sample(state.name, 5e6, TRUE),
           stringsAsFactors = FALSE)
```
qs is highly parameterized and can be tuned by the user to extract as
much speed and compression as possible, if desired. For simplicity, qs
comes with 4 presets that trade off speed against compression ratio: “fast”,
“balanced”, “high” and “archive”.
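As a rough sketch of how the presets might be compared on a given dataset, the snippet below saves the same object once per preset and records the resulting file size. The test data and temporary paths are illustrative assumptions, and the sizes depend heavily on the data, so no particular ordering is implied.

```r
library(qs)

# Illustrative data; real results depend on the distribution of your data.
x <- data.frame(a = rnorm(1e5), b = rpois(1e5, 100))
presets <- c("fast", "balanced", "high", "archive")

# Save the same object with each preset and record the file size in bytes.
sizes <- vapply(presets, function(p) {
  path <- tempfile(fileext = ".qs")
  on.exit(unlink(path))
  qsave(x, path, preset = p)
  file.size(path)
}, numeric(1))

print(sizes)
```

Wrapping the `qsave` call in `system.time()` would give a matching view of the speed side of the trade-off.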
The plots below summarize the performance of saveRDS, qs and fst
with various parameters:
Serializing
[Figure: serialization benchmark plot]
De-serializing
[Figure: de-serialization benchmark plot]
(Benchmarks are based on qs ver. 0.21.2, fst ver. 0.9.0 and R
3.6.1.)
Benchmarking write and read speed is tricky and depends heavily on a number of factors, such as the operating system, the hardware, the distribution of the data, or even the state of the R instance. Read speed is additionally affected by various hardware and software memory caches.
Generally speaking, qs and fst are considerably faster than
saveRDS, whether single-threaded or multi-threaded
compression is used. qs also achieves a superior compression ratio
through various optimizations (e.g. see “Byte Shuffle” section below).
ALTREP character vectors
The ALTREP system (new as of R 3.5.0) allows package developers to represent R objects using their own custom memory layout. This allows a potentially large speedup in processing certain types of data.
In qs, ALTREP character vectors are implemented via the
stringfish package and can
be used by setting use_alt_rep=TRUE in the qread function. The
benchmark below shows the time it takes to qread several million
random strings (nchar = 80) with and without ALTREP.
[Figure: qread benchmark with and without ALTREP]
The large speedup demonstrates why one would want to consider the
system, but there are caveats. Downstream processing functions must be
ALTREP-aware. See the
stringfish package for more
details.
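A minimal sketch of reading the same file with and without ALTREP is shown below. It relies on the `use_alt_rep` argument of `qread` described above; the generated strings and the temporary file are illustrative assumptions, and timings are deliberately left out because they vary by machine.

```r
library(qs)

# Build some random 80-character strings (illustrative data only).
strings <- vapply(
  seq_len(1e4),
  function(i) paste(sample(letters, 80, replace = TRUE), collapse = ""),
  character(1)
)

path <- tempfile(fileext = ".qs")
qsave(strings, path)

# Standard character vector versus ALTREP-backed character vector.
s_std <- qread(path)
s_alt <- qread(path, use_alt_rep = TRUE)

# Both should hold the same values, whatever the underlying representation.
same_values <- identical(s_std, s_alt)
print(same_values)

unlink(path)
```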
Byte shuffle
Byte shuffling (adopted from the Blosc meta-compression library) is a way of re-organizing data to be more amenable to compression. An integer in R occupies four bytes and can range from -(2^31 - 1) to 2^31 - 1. However, most real data doesn’t use anywhere near the range of possible integer values. For example, if the data represented whole-number percentages from 0% to 100%, every value would fit in a single byte and the other three bytes of each integer would always be zero.
Byte shuffling rearranges the data such that all of the first bytes are
blocked together, all of the second bytes are blocked together, and so
on. This procedure often makes it very easy for compression algorithms
to find repeated patterns and can often improve compression ratio by
orders of magnitude. In the example below, shuffle compression achieves
a compression ratio of over 1:1000. See ?qsave for more details.
``` r
# With byte shuffling
x <- 1:1e8
qsave(x, "mydat.qs", preset = "custom", shuffle_control = 15, algorithm = "zstd")
cat("Compression Ratio: ", as.numeric(object.size(x)) / file.info("mydat.qs")$size, "\n")
# Compression Ratio: 1389.164

# Without byte shuffling
x <- 1:1e8
qsave(x, "mydat.qs", preset = "custom", shuffle_control = 0, algorithm = "zstd")
cat("Compression Ratio: ", as.numeric(object.size(x)) / file.info("mydat.qs")$size, "\n")
# Compression Ratio: 1.479294
```
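The rearrangement itself can be illustrated in base R without calling qs at all. The sketch below is not the implementation used by qs (which relies on optimized routines adapted from Blosc); it assumes little-endian byte order and uses gzip via `memCompress` as a stand-in for lz4 or zstd, purely to show why grouping bytes by position helps a compressor.

```r
# Illustration only: base R, no qs calls.
x <- seq_len(1e5)

# Raw bytes of the integers, four bytes per value.
bytes <- writeBin(x, raw(), size = 4L, endian = "little")

# One column per integer, one row per byte position.
byte_matrix <- matrix(bytes, nrow = 4L)

# Shuffle: all first bytes, then all second bytes, and so on.
shuffled <- as.vector(t(byte_matrix))

size_plain <- length(memCompress(bytes, type = "gzip"))
size_shuffled <- length(memCompress(shuffled, type = "gzip"))
cat("Compressed bytes without shuffle:", size_plain, "\n")
cat("Compressed bytes with shuffle:   ", size_shuffled, "\n")

# Unshuffle and confirm the original integers are recovered.
restored_bytes <- as.vector(t(matrix(shuffled, ncol = 4L)))
restored <- readBin(restored_bytes, what = "integer", n = length(x),
                    size = 4L, endian = "little")
stopifnot(identical(restored, x))
```

Because the shuffle is a pure permutation of bytes, it is lossless; the only cost is the extra pass over the data before compression and after decompression.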
Serializing to memory
You can use qs to directly serialize objects to memory.
Example:
``` r
library(qs)
x <- qserialize(c(1, 2, 3))
qdeserialize(x)
# [1] 1 2 3
```
Serializing objects to ASCII
The qs package includes two sets of utility functions for converting
binary data to ASCII:
- base85_encode and base85_decode
- base91_encode and base91_decode
These functions are similar to base64 encoding functions found in various packages, but offer greater efficiency.
Example:
``` r
enc <- base91_encode(qserialize(datasets::mtcars, preset = "custom", compress_level = 22))
dec <- qdeserialize(base91_decode(enc))
```
(Note: base91 strings contain double quote characters (") and need to
be single quoted if stored as a string.)
See the help files for additional details and history behind these algorithms.
Using qs within Rcpp
qs functions can be called directly within C++ code via Rcpp.
Example C++ script:
``` cpp
// [[Rcpp::depends(qs)]]
#include <Rcpp.h>
#include <qs.h>
using namespace Rcpp;

// [[Rcpp::export]]
void test() {
  qs::qsave(IntegerVector::create(1,2,3), "/tmp/myfile.qs", "high", "zstd", 1, 15, true, 1);
}
```
R side:
``` r
library(qs)
library(Rcpp)
sourceCpp("test.cpp")

# save file using Rcpp interface
test()

# read in file created through Rcpp interface
qread("/tmp/myfile.qs")
# [1] 1 2 3
```
The C++ functions do not have default parameters; all parameters must be specified.
Future developments
- Additional compression algorithms
- Improved ALTREP serialization
- Re-write of multithreading code
- Mac M1 optimizations (NEON) and checking
Future versions will be backwards compatible with the current version.
Owner
- Name: qsbase
- Login: qsbase
- Kind: organization
- Repositories: 2
- Profile: https://github.com/qsbase
GitHub Events
Total
- Issues event: 1
- Watch event: 29
- Issue comment event: 9
- Push event: 2
- Pull request event: 2
- Fork event: 1
Last Year
- Issues event: 1
- Watch event: 29
- Issue comment event: 9
- Push event: 2
- Pull request event: 2
- Fork event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| tching | t****c@g****m | 228 |
| Salim B | g****t@s****e | 9 |
| mariusbarth | m****h@u****e | 3 |
| Xianying Tan | s****n@1****m | 2 |
| OWG\bryce.chamberlain | b****n@o****m | 2 |
| spaette | s****e@g****m | 1 |
| Romain François | r****n@p****t | 1 |
| Kun Ren | m****l@r****e | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 87
- Total pull requests: 15
- Average time to close issues: 4 months
- Average time to close pull requests: about 12 hours
- Total issue authors: 66
- Total pull request authors: 7
- Average comments per issue: 4.17
- Average comments per pull request: 1.33
- Merged pull requests: 11
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 1
- Average time to close issues: 2 days
- Average time to close pull requests: about 7 hours
- Issue authors: 3
- Pull request authors: 1
- Average comments per issue: 3.33
- Average comments per pull request: 2.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- traversc (2)
- thomascwells (1)
- NadineBestard (1)
Pull Request Authors
- mariusbarth (2)
Packages
- Total packages: 1
- Total downloads:
  - cran: 16,820 last-month
- Total docker downloads: 46,660
- Total dependent packages: 49
- Total dependent repositories: 117
- Total versions: 31
- Total maintainers: 1
cran.r-project.org: qs
Quick Serialization of R Objects
- Homepage: https://github.com/qsbase/qs
- Documentation: http://cran.r-project.org/web/packages/qs/qs.pdf
- License: GPL-3
- Latest release: 0.27.3 (published 12 months ago)
Dependencies
- R >= 3.0.2 depends
- RApiSerialize >= 0.1.1 imports
- Rcpp * imports
- stringfish >= 0.15.1 imports
- data.table * suggests
- dplyr * suggests
- knitr * suggests
- rmarkdown * suggests
- testthat * suggests
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-pandoc v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite