simpleCache

simpleCache: R caching for reproducible, distributed, large-scale projects - Published in JOSS (2018)

https://github.com/databio/simplecache

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
  • Committers with academic emails
    1 of 5 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

Simplified R caching for reproducible big data projects

Statistics
  • Stars: 33
  • Watchers: 19
  • Forks: 6
  • Open Issues: 14
  • Releases: 7
Created over 11 years ago · Last pushed almost 5 years ago
Metadata Files
Readme Changelog Contributing License

README.md

simpleCache: R caching for restartable analysis

simpleCache is an R package providing functions for caching R objects. Its purpose is to encourage writing reusable, restartable, and reproducible analysis pipelines for projects with massive data and computational requirements.

As its name indicates, simpleCache is intended to be simple. You choose a location to store your caches, then provide the function with nothing more than a cache name and instructions (R code) for producing the R object. While simple, simpleCache also provides some advanced options, such as environment assignments, recreating caches, reloading caches, and even cluster compute bindings (using the batchtools package), making it flexible enough for use in large-scale data analysis projects.


Installing simpleCache

simpleCache is on CRAN and can be installed as usual:

install.packages("simpleCache")


Running simpleCache

simpleCache comes with a single primary function (simpleCache()) that will do almost everything you need. In short, you run it with a few lines like this:

library(simpleCache)
setCacheDir(tempdir())
simpleCache("normSample", { rnorm(1e7, 0, 1) }, recreate=TRUE)
simpleCache("normSample", { rnorm(1e7, 0, 1) })

simpleCache also interfaces with the batchtools package to let you build caches on any cluster resource manager.


Highlights of exported functions

  • simpleCache(): Creates and caches or reloads cached results of provided R instruction code
  • listCaches(): Lists all of the caches available in the cacheDir
  • deleteCaches(): Deletes cache(s) from the cacheDir
  • setCacheDir(): Sets a global option for a cache directory so you don't have to specify one in each simpleCache call
  • simpleCacheOptions(): Views all of the simpleCache global options that have been set
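
As a rough illustration of how these functions fit together, the sketch below builds one small cache and then inspects the cache directory. The directory name `simpleCacheDemo` and the cache name `demoSample` are made up for this example, and the exact form of what `listCaches()` returns may differ between package versions.

```r
library(simpleCache)

# Use a throwaway directory (hypothetical name) so nothing is left behind.
demoDir <- file.path(tempdir(), "simpleCacheDemo")
dir.create(demoDir, showWarnings = FALSE, recursive = TRUE)

# Set the global cache directory once instead of passing it to every call.
setCacheDir(demoDir)

# Build the cache on the first call; the result is assigned to `demoSample`.
simpleCache("demoSample", { rnorm(1000, 0, 1) })

# See which caches are on disk and which global options are set.
cachesOnDisk <- listCaches()
print(cachesOnDisk)
simpleCacheOptions()
```

Running the same `simpleCache()` line again in a new session pointed at the same directory would load the stored object rather than recompute it.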

simpleCache Philosophy

The use case I had in mind for simpleCache is that you find yourself constantly recalculating the same R object in several different scripts, or repeatedly in the same script, every time you open it and want to continue that project. simpleCache is well-suited for interactive analysis, allowing you to pick up right where you left off in a new R session, without having to recalculate everything. It is equally useful in automatic pipelines, where separate scripts may benefit from loading, instead of recalculating, the same R objects produced by other scripts.

R provides some base functions (save, serialize, and load) to let you save and reload such objects, but these low-level functions are a bit cumbersome. simpleCache simply provides a convenient, user-friendly interface to these functions, streamlining the process. For example, a single simpleCache call will check for a cache and load it if it exists, or create it if it does not. With the base R save and load functions, you can't just write a single function call and then run the same thing every time you start the script -- even this simple use case requires additional logic to check for an existing cache. simpleCache just does all this for you.
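
To make the "additional logic" concrete, here is a minimal base R sketch of the check-then-load-or-create pattern that has to be written by hand with `save` and `load`. The helper `loadOrCreate()`, its arguments, and the `.RData` file naming are assumptions made for this illustration; this is not how simpleCache is implemented internally.

```r
# Hypothetical helper: return the object cached under `name` in `cacheDir`,
# building and saving it first if no cache file exists yet.
loadOrCreate <- function(name, build, cacheDir = tempdir()) {
  cacheFile <- file.path(cacheDir, paste0(name, ".RData"))
  env <- new.env()
  if (file.exists(cacheFile)) {
    # Cache hit: restore the saved object into a scratch environment.
    load(cacheFile, envir = env)
    return(get(name, envir = env))
  }
  # Cache miss: compute the value, then save it under its own name.
  value <- build()
  assign(name, value, envir = env)
  save(list = name, file = cacheFile, envir = env)
  value
}

demoDir <- file.path(tempdir(), "baseCacheDemo")  # hypothetical location
dir.create(demoDir, showWarnings = FALSE, recursive = TRUE)

first  <- loadOrCreate("normSample", function() rnorm(1000), demoDir)
second <- loadOrCreate("normSample", function() rnorm(1000), demoDir)
print(identical(first, second))
```

The second call never runs `rnorm()`; it reads the file written by the first call, which is the behavior a single `simpleCache()` call packages up.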

The thing to keep in mind with simpleCache is that the cache name is paramount. simpleCache assumes that your name for an object is a perfect identifier for that object; in other words, don't cache things that you plan to change.
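
A small sketch of what this means in practice, using the hypothetical cache name `answer`: once a cache exists under a name, changing the instruction code does not change the result unless you explicitly ask for the cache to be recreated.

```r
library(simpleCache)
setCacheDir(tempdir())

# Build (or rebuild) the cache under the name "answer".
simpleCache("answer", { 1 + 1 }, recreate = TRUE)
firstValue <- answer

# Same name, different code: the name identifies the object,
# so the new instruction is not evaluated.
simpleCache("answer", { 100 })
secondValue <- answer

# Only an explicit recreate replaces the cached object.
simpleCache("answer", { 100 }, recreate = TRUE)
thirdValue <- answer

print(c(firstValue, secondValue, thirdValue))
```

If an object really does need to change, either rebuild it with `recreate=TRUE` or give the new version a new cache name.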

Contributing

simpleCache is licensed under the 2-Clause BSD License. Questions, feature requests and bug reports are welcome via the issue queue. The maintainer will review pull requests and incorporate contributions at his discretion.

For more information refer to the contributing document and pull request / issue templates in the .github folder of this repository.

Owner

  • Name: Databio
  • Login: databio
  • Kind: organization
  • Location: University of Virginia

Solving problems in computational biology

JOSS Publication

simpleCache: R caching for reproducible, distributed, large-scale projects
Published
January 10, 2018
Volume 3, Issue 21, Page 463
Authors
Nathan Sheffield
University of Virginia
Vp Nagraj
University of Virginia
Vince Reuter
University of Virginia
Editor
Thomas J. Leeper

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 200
  • Total Committers: 5
  • Avg Commits per committer: 40.0
  • Development Distribution Score (DDS): 0.615
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name               Email           Commits
nsheff             n****f          77
VP Nagraj          v****j@v****u   49
Vince Reuter       v****r@g****m   44
sheffien           s****n          29
Michal Stolarczyk  s****3@g****m   1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 25
  • Total pull requests: 23
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 19 days
  • Total issue authors: 8
  • Total pull request authors: 3
  • Average comments per issue: 2.12
  • Average comments per pull request: 1.83
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • vreuter (13)
  • sckott (4)
  • nsheff (3)
  • ghost (1)
  • koheiw (1)
  • carlsonp (1)
  • j-lawson (1)
Pull Request Authors
  • vreuter (10)
  • nsheff (7)
  • vpnagraj (6)
Top Labels
Issue Labels
  • enhancement (5)
  • question (4)
  • brainstorming (2)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 312 last-month
  • Total docker downloads: 48
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 6
    (may contain duplicates)
  • Total versions: 6
  • Total maintainers: 1
cran.r-project.org: simpleCache

Simply Caching R Objects

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 6
  • Downloads: 312 Last month
  • Docker Downloads: 48
Rankings
Stargazers count: 9.2%
Forks count: 9.6%
Dependent repos count: 12.0%
Average: 22.0%
Dependent packages count: 28.8%
Downloads: 50.2%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-simplecache
  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 41.3%
Average: 44.4%
Forks count: 47.4%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • batchtools * enhances
  • knitr * suggests
  • rmarkdown * suggests
  • testthat * suggests