Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: mdsumner
  • License: mit
  • Language: R
  • Default Branch: main
  • Size: 3.91 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago

README.md

targets-uahpc

Project Status: Active – The project has reached a stable, usable state and is being actively developed.

This is a minimal example of a targets workflow that can be run on the University of Arizona high performance computing (HPC) cluster. targets is an R package for workflow management that saves you time by automatically skipping code that doesn’t need to be re-run when you change your data or code. It also makes parallelization relatively easy: with the crew.cluster package, each target can be defined as a separate SLURM job.

Prerequisites:

To set up:

To get this bare-bones pipeline running on the HPC:

  1. Click the “Use this template” button to create a repo under your own GitHub user name.
  2. Modify the HPC group name in _targets.R and in run.sh to be your PI group.
  3. SSH into the UA HPC.
  4. Clone this repo on the HPC, e.g. with git clone https://github.com/your-user-name/targets-uahpc.git.
  5. Start an interactive session on the HPC, e.g. with interactive -a <groupname>.
  6. Load R with module load R.
  7. Launch R from within the targets-uahpc/ directory with the R command.
  8. The renv package should install itself. After it is done, you can install all necessary R packages by running renv::restore().

To modify the pipeline to run your code, you'll need to edit the list of targets in _targets.R as well as functions in the R/ folder. See the targets manual for more information.

Note that using the renv package to track dependencies isn't strictly necessary, but it does simplify package installation on the HPC. As you add R package dependencies, you can use targets::tar_renv() to update the _targets_packages.R file and then renv::snapshot() to add them to renv.lock. On the HPC, running renv::restore() not only installs any missing R packages, it also automatically detects system dependencies and lets you know if they aren't installed.

Running the pipeline

There are several ways to run the pipeline, each with pros and cons:

  1. Using RStudio running on Open OnDemand
  2. From R running on the command line
  3. By submitting run.sh as a SLURM job

Open OnDemand

Log in to the Open OnDemand app dashboard. Choose RStudio Server and start a session, specifying cores, memory per core, wall-time, etc. Keep in mind that with this method targets won't launch workers as SLURM jobs, but as separate R processes using the cores you select, so be sure to request a large enough allocation to support the workers. From RStudio, use the File > Open Project... menu and navigate to the .Rproj file for this project. Then, from the console, run targets::tar_make(), optionally with the as_job = TRUE argument to run it as a background process. You can check the progress of the pipeline from time to time in a variety of ways, including targets::tar_visnetwork().

From R

SSH into the HPC, navigate to this project, and request an interactive session with interactive -a <groupname> -t <HH:MM:SS>, replacing <groupname> with your group name and <HH:MM:SS> with however long you think the pipeline will take to run. Load R with module load R and launch it with R. Then you can run targets::tar_make() to kick off the pipeline and watch the progress in the R console.

With run.sh

Edit the run.sh file to update your group name and the wall-time for the main process. SSH into the HPC, navigate to this project, and run sbatch run.sh. You can watch the workers launch by occasionally running squeue -u yourusername, and you can peek at the logs/ folder. From inside logs/, you can find the most recently modified log files with something like ls -lt | head -n 5 and then read one with cat targets_main_9814039.out (or whatever file name you want to read).
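The log check described above can be wrapped in a small helper. This is only a sketch: the function name latest_logs is hypothetical, and it assumes the log files sit in a logs/ directory relative to where you call it. It uses ls -t rather than ls -lt so that the output contains file names only.

```shell
# Hypothetical helper: print the N most recently modified files in a
# log directory, newest first. Defaults: directory "logs", N = 5.
latest_logs() {
  dir="${1:-logs}"
  n="${2:-5}"
  ls -t "$dir" | head -n "$n"
}
```

For example, cat "logs/$(latest_logs logs 1)" would print the newest log file, provided the directory is not empty.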

Notes:

The _targets/ store can grow quite large, depending on the size of the data and the number of targets created. Your home folder on the HPC only allows 50 GB of storage, so it may be wise to clone this repo to /groups/<groupname>/ instead, which by default has 500 GB of storage. targets can optionally use cloud storage, which has the benefit of making completed targets easily accessible on a local machine. See the targets manual for more info on setting up cloud storage.
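One way to keep an eye on the store is to check its size with du every so often. The helper below is a sketch under assumptions: the name store_size_gb is hypothetical, and the default path assumes you call it from the project root, where targets keeps the _targets/ directory.

```shell
# Hypothetical helper: report the size of a directory in whole
# gigabytes (rounded down). Defaults to the _targets store.
store_size_gb() {
  dir="${1:-_targets}"
  # du -sk prints the total size in kilobytes followed by the path.
  kb=$(du -sk "$dir" | awk '{print $1}')
  echo $((kb / 1024 / 1024))
}
```

Comparing the result against the quota of the file system you cloned into (50 GB for a home folder, per the note above) gives a rough sense of how much headroom is left.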

Code in _targets.R attempts to detect whether you are able to launch SLURM jobs; if not (e.g. you are not on the HPC or are using Open OnDemand), it falls back to using crew::crew_controller_local().
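The same kind of check can be made from the shell before starting a run. The sketch below is not the logic used in _targets.R, which is not shown here; it assumes that having the sbatch command on your PATH is a reasonable proxy for being able to submit SLURM jobs, and the function name pick_backend is hypothetical.

```shell
# Hypothetical helper: print "slurm" when the sbatch command is found
# on PATH, otherwise print "local".
pick_backend() {
  if command -v sbatch >/dev/null 2>&1; then
    echo "slurm"
  else
    echo "local"
  fi
}
```

Running pick_backend on a login node and again inside an Open OnDemand session is a quick way to see which kind of workers to expect.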

Owner

  • Name: Michael Sumner
  • Login: mdsumner
  • Kind: user
  • Location: Hobart, Australia
  • Company: Integrated Digital East Antarctica, Australian Antarctic Division

no names have an anonymous function

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: "targets-uahpc: A template to use the {targets} R package with UA-HPC"
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Eric R.
    family-names: Scott
    email: ericrscott@arizona.edu
    affiliation: >-
      University of Arizona, Communications & Cyber
      Technologies Data Science
    orcid: 'https://orcid.org/0000-0002-7430-7879'
repository-code: 'https://github.com/cct-datascience/targets-uahpc'
abstract: >-
  A template for using the `targets` R package for workflow
  management with the University of Arizona high performance
  computing cluster.
keywords:
  - targets
  - hpc
license: MIT
version: 0.0.1
doi: 10.5281/zenodo.10963005

GitHub Events

Total
  • Create event: 1
Last Year
  • Create event: 1

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 1
  • Total Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 1
  • Committers: 1
  • Avg Commits per committer: 1.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Name: Michael Sumner · Email: m****r@g****m · Commits: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0