https://github.com/bast/contain-r
Apptainer/Singularity container for reproducible R environments.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.8%) to scientific vocabulary
Statistics
- Stars: 3
- Watchers: 2
- Forks: 0
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
contain-R
Apptainer/Singularity container for reproducible R environments.
What you need for this to work
- Apptainer or Singularity CE
- An install.R or renv.lock file (examples below) that defines the environment
- An R script/project/command that you want to run in that environment
- No need to install R itself (R 4.3.0 is provided by the container)
Motivation and big picture
For reproducibility it is important to:
- document dependencies
- isolate dependencies from dependencies of other projects

This container:
- creates a per-project renv environment and isolates dependencies
- uses pak under the hood to speed up installation
- lets you configure a user- or group-wide cache which can be reused across projects
- does not allow accidental "I will just quickly install it into my system and document it later" since it is a container
- forces you to document your dependencies, which is good for reproducibility and your future self
Dependencies are not installed into the container but only managed by the container.
Quick start on your computer
1. Create a new directory.
2. In the new directory create a file install.R which contains:
   ```r
   renv::install('ggplot2')
   ```
3. Download the container:
   ```bash
   $ singularity pull https://github.com/bast/contain-R/releases/download/0.1.0/contain-R.sif
   ```
4. Run the following in your terminal (it starts installing stuff; this takes 1-2 minutes on my computer):
   ```bash
   $ ./contain-R.sif R --quiet -e 'library(ggplot2)'
   ```
5. Run the above again (now it will only take a second).
6. Run some R script which depends on that environment:
   ```bash
   $ ./contain-R.sif Rscript somescript.R
   ```
7. Or if you want the R interactive shell:
   ```bash
   $ ./contain-R.sif R
   ```
Quick start on a cluster
Same as above but instead of steps 3 and 4, use the following and adapt paths to your situation:
```bash
# you probably do not want to work in your home folder, to avoid filling your disk quota
cd /cluster/work/users/myself/experiment

# download the container
singularity pull https://github.com/bast/contain-R/releases/download/0.1.0/contain-R.sif

# you decide where these should go
export RENV_CACHE=/cluster/work/users/myself/renv-cache
export PAK_CACHE=/cluster/work/users/myself/pak-cache

# you need only one of the two
export SINGULARITY_BIND="/cluster"
export APPTAINER_BIND="/cluster"

./contain-R.sif R --quiet -e 'library(ggplot2)'
```
install.R or renv.lock or both?
You need something to define the environment you want, either install.R or renv.lock.
An install.R file looks like this:
```r
renv::install('ggplot2')
renv::install('vcfR')
renv::install('hierfstat')
renv::install('poppr')
```
List as many packages as you need. You can pin them to specific versions, if
needed:
```r
renv::install("digest@0.6.18")
```
Alternatively, you can create your environment from renv.lock which looks
like this example and typically has been generated by renv:
```json
{
  "R": {
    "Version": "3.6.1",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cloud.r-project.org"
      }
    ]
  },
  "Packages": {
    "markdown": {
      "Package": "markdown",
      "Version": "1.0",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "4584a57f565dd7987d59dda3a02cfb41"
    },
    "mime": {
      "Package": "mime",
      "Version": "0.7",
      "Source": "Repository",
      "Repository": "CRAN",
      "Hash": "908d95ccbfd1dd274073ef07a7c93934"
    }
  }
}
```
For more information about lock files, please see
https://rstudio.github.io/renv/reference/lockfiles.html.
The container will process them in this order:
- If there is only install.R, it will use that one, create an renv environment, and lock the dependencies in renv.lock.
- If there is only renv.lock, it will use that one and create an renv environment.
- If install.R is more recent than the renv directory, it will install from it again.
- If renv.lock is more recent than the renv directory, it will install from it again.
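The order above can be sketched as a small shell function. This is only an illustration of the described behavior, not the container's actual code: the function name choose_install_source is hypothetical, it assumes that "more recent" means a newer modification time (checked with the shell's -nt test), and preferring install.R when both files exist but no renv directory is present is an assumption, since the text does not cover that case.

```bash
# Hypothetical sketch of the decision logic; not the container's actual code.
# Prints which file the environment would be (re)installed from.
# Argument: project directory (defaults to the current directory).
choose_install_source() {
    local dir="${1:-.}"
    local install="$dir/install.R"
    local lock="$dir/renv.lock"
    local env="$dir/renv"

    if [ ! -d "$env" ]; then
        # No environment yet. Preferring install.R when both files
        # exist is an assumption of this sketch.
        if [ -f "$install" ]; then
            echo "install.R"
        elif [ -f "$lock" ]; then
            echo "renv.lock"
        else
            echo "none"
        fi
    elif [ -f "$install" ] && [ "$install" -nt "$env" ]; then
        # install.R was modified after the environment was created
        echo "install.R"
    elif [ -f "$lock" ] && [ "$lock" -nt "$env" ]; then
        # renv.lock was modified after the environment was created
        echo "renv.lock"
    else
        echo "up-to-date"
    fi
}
```

Calling choose_install_source in a project directory prints the file that would trigger an installation, none if neither file exists yet, or up-to-date when neither file is newer than the renv directory.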
In practice you will probably do either of these two:
- You arrive with install.R and it will create renv.lock and renv. You
can then take the renv.lock and use it to share an environment with your
friend. Maybe you modify install.R later and refresh renv.lock and renv.
- Or you arrive with renv.lock that you got from somebody and it will create
renv.
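The second workflow could look like the following sketch. The helper name start_from_lockfile and all paths are hypothetical; the container call is left as a comment because it needs contain-R.sif to be present in the new directory.

```bash
# Hypothetical helper: start a new project directory from a received renv.lock.
start_from_lockfile() {
    local lockfile="$1"
    local project_dir="$2"

    if [ ! -f "$lockfile" ]; then
        echo "lock file not found: $lockfile" >&2
        return 1
    fi

    mkdir -p "$project_dir"
    cp "$lockfile" "$project_dir/renv.lock"
}

# Example (placeholder paths):
#   start_from_lockfile ~/Downloads/renv.lock ~/projects/shared-env
#   cd ~/projects/shared-env
#   # after downloading contain-R.sif into this directory:
#   ./contain-R.sif R --quiet -e 'library(markdown)'
```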
Generated paths
Running the container creates the following files and directories in the same
place where you run the container (but you can configure some of them if you
want them somewhere else):
- renv - holding the environment
- renv.lock - created or updated if you installed from install.R
- .Rprofile - created or modified; renv adds the line source("renv/activate.R")
- renv-cache - renv package cache; you can change its location by defining the environment variable RENV_CACHE
- pak-cache - pak package cache; you can change its location by defining the environment variable PAK_CACHE
Installation takes too long?
Running a script for the first time may take a while since the container needs to set up the environment and download and install dependencies.
Re-running the script skips the installation entirely. Setting up a new environment is also fast if its dependencies are already in the cache.
Pak and renv use different caches and methods
For historical reasons pak and renv use slightly different caches and
installation methods, but their developers are working on smoothing things
out between the two. You will notice the difference if you start from
install.R and then try to restore from the generated renv.lock: the two
steps use different methods.
Relevant GitHub issues:
- https://github.com/rstudio/renv/issues/907
- https://github.com/r-lib/pak/issues/343
You have the option to turn off pak like this:
```bash
export USE_PAK=false
```
Pros and cons of turning it off:
- Advantage: Only one cache location and everything is consistent. You
  could install from install.R, then even remove it and run from renv.lock
  without needing to re-install anything.
- Disadvantage: The first installation from install.R might take longer
  without pak.
How to configure location for package caches
You can change the location of the package caches:
```bash
export RENV_CACHE=/home/user/R/renv-cache
export PAK_CACHE=/home/user/R/pak-cache
```
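As a sketch of how these variables relate to the defaults listed under "Generated paths": if a variable is unset, the cache ends up in renv-cache or pak-cache in the directory where the container is run. The function name resolve_cache_dirs is hypothetical and only mirrors that fallback; it does not call the container.

```bash
# Hypothetical helper mirroring the documented fallback:
# use RENV_CACHE / PAK_CACHE when set, otherwise default to
# renv-cache and pak-cache in the directory where the container is run.
resolve_cache_dirs() {
    local run_dir="${1:-$PWD}"
    echo "renv cache: ${RENV_CACHE:-$run_dir/renv-cache}"
    echo "pak cache: ${PAK_CACHE:-$run_dir/pak-cache}"
}
```

Running resolve_cache_dirs before starting the container is a quick way to check where packages would be cached with the current environment variables.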
Recommendations on where to place package caches
On your own computer it will make sense to reuse the same cache(s) across all projects. This way, when installing dependencies, renv will first look whether you already have the package on your computer.
On a shared cluster it might make sense to have one common cache for your group/allocation since your research group might use similar dependencies in their work. This way you can save space and install time.
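A shared cache only works if everyone in the group can write to it. A minimal sketch, assuming a POSIX file system and that the directory already belongs to your project group; the helper name make_shared_cache and the example paths are hypothetical:

```bash
# Hypothetical helper: create a cache directory that the whole group can use.
make_shared_cache() {
    local cache_dir="$1"
    mkdir -p "$cache_dir"
    # rwx for owner and group, read/execute for others; the setgid bit (2)
    # makes new files and subdirectories inherit the directory's group
    chmod 2775 "$cache_dir"
}

# Example (placeholder paths):
#   make_shared_cache /cluster/work/projects/mygroup/renv-cache
#   export RENV_CACHE=/cluster/work/projects/mygroup/renv-cache
```

Whether files written into the cache later are group-writable also depends on each user's umask, so this may need adjusting for your cluster.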
Known problems / ideas for later
- Maybe you need a different version of R than 4.3.0. I guess we should at some point have several containers for different versions? Or you build your own from the definition file.
- It could be good to let the user configure where renv itself should be located. Currently it is placed in the same folder where the container is run.
Resources
I have used these resources when writing/testing:
- https://rstudio.github.io/renv/
- https://rstudio.github.io/renv/articles/docker.html
- https://pak.r-lib.org/
- https://rstudio.github.io/packrat/ (deprecated)
- https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/greene/software/r-packages-with-renv
- https://raps-with-r.dev/repro_intro.html
- https://www.youtube.com/watch?v=N7z1K4FhVFE (stream recording on how to use renv)
- https://github.com/singularityhub/singularity-deploy
Owner
- Name: Radovan Bast
- Login: bast
- Kind: user
- Location: Tromsø, Norway
- Company: @uit-no @neicnordic
- Website: https://bast.fr
- Repositories: 181
- Profile: https://github.com/bast
Theoretical chemist turned research software engineer. Leads @coderefinery.
GitHub Events
Total
- Issues event: 3
- Watch event: 2
- Issue comment event: 2
Last Year
- Issues event: 3
- Watch event: 2
- Issue comment event: 2
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 1
- Total pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 9 minutes
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- da5nsy (2)
- bast (1)
Pull Request Authors
- bast (2)