hpc-workflows
HPC Workflow Management with Snakemake
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 7 committers (28.6%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (15.4%) to scientific vocabulary
Repository
HPC Workflow Management with Snakemake
Basic Info
- Host: GitHub
- Owner: carpentries-incubator
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://carpentries-incubator.github.io/hpc-workflows/
- Size: 12 MB
Statistics
- Stars: 3
- Watchers: 10
- Forks: 4
- Open Issues: 9
- Releases: 0
Metadata Files
README.md
Tame Your Workflow with Snakemake
In HPC Intro, learners explored the scheduler on their cluster by
launching a program called amdahl. The objective of this lesson is
to adapt the manual job submission process into a repeatable, reusable workflow
with minimal human intervention. This is accomplished using
Snakemake, a modern workflow engine.
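To make the goal concrete, here is a minimal sketch of the kind of analysis a scaling study feeds into: comparing measured run times against the ideal speedup predicted by Amdahl's law. The function names, the timing values, and the parallel fraction are all illustrative assumptions, not results or code from this lesson.

```python
# Sketch: compare measured speedup against Amdahl's law.
# All numbers below are made-up placeholders, not lesson results.


def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def measured_speedup(times):
    """Speedup of each run relative to the single-worker run.

    `times` maps a number of CPU cores to a wall-clock time in seconds
    and must contain an entry for 1 core.
    """
    baseline = times[1]
    return {n: baseline / t for n, t in sorted(times.items())}


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by number of CPU cores.
    example_times = {1: 30.0, 2: 17.0, 4: 10.5, 8: 7.5}
    assumed_parallel_fraction = 0.8  # assumption, chosen for illustration

    for cores, speedup in measured_speedup(example_times).items():
        ideal = amdahl_speedup(assumed_parallel_fraction, cores)
        print(f"{cores:>2} cores: measured {speedup:5.2f}x, Amdahl {ideal:5.2f}x")
```

Collecting the timings by hand for every core count is the repetitive part that a workflow engine is meant to take over.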
If you are interested in learning more about workflow tools, please visit The Workflows Community.
Snakemake is best for single-node jobs
NERSC's Snakemake docs list Snakemake's "cluster mode" as a disadvantage, since it submits each "rule" as a separate job, thereby spamming the scheduler with dependent tasks. The main Snakemake process also resides on the login node until all jobs have finished, occupying some resources.
If you wish to adapt your Python-based program for multi-node cluster execution, consider applying the workflow principles learned from this lesson to the Parsl framework. Again, NERSC's Parsl docs provide helpful tips.
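As a rough illustration of the scheduler load noted above, the sketch below counts how many separate batch submissions a scaling study would generate if every run became its own job. The workflow shape (one run per core count and repeat, plus a single aggregation step) and the function name are hypothetical.

```python
# Rough illustration: number of scheduler submissions when every job in a
# workflow is submitted as its own batch job. The workflow shape is a
# hypothetical example, not taken from the lesson.


def count_submissions(core_counts, runs_per_count=1, aggregate_jobs=1):
    """Return the number of separate batch jobs for a scaling study.

    One job per (core count, repeat) pair, plus the jobs that collect
    or plot the results afterwards.
    """
    return len(core_counts) * runs_per_count + aggregate_jobs


if __name__ == "__main__":
    core_counts = [1, 2, 4, 8, 16, 32]
    for repeats in (1, 5, 20):
        total = count_submissions(core_counts, runs_per_count=repeats)
        print(f"{repeats:>2} repeats per core count: {total} submissions")
```

The count grows linearly with the number of repeats, which is why large parameter sweeps are better packed into fewer, larger jobs.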
Contributing
This is a translation of the old HPC Workflows lesson using The Carpentries Workbench and R Markdown (Rmd). You are cordially invited to contribute! Please check the list of issues if you're unsure where to start.
Building Locally
If you edit the lesson, it is important to verify that the changes are rendered
properly in the online version. The best way to do this is to build the lesson
locally. You will need an R environment to do this: as described in the
{sandpaper} docs, the environment can be either your terminal or
RStudio.
Setup
The environment.yml file describes a Conda virtual environment that
includes R, Snakemake, amdahl,
pandoc, and termplotlib: the tools you'll need to
develop and run this lesson, as well as some dependencies. To prepare the
environment, install Miniconda following the official
instructions. Then open a shell application and create a new environment:
```shell
you@yours:~$ cd path/to/local/hpc-workflows
you@yours:hpc-workflows$ conda env create -f environment.yaml
```
N.B.: the environment will be named "workflows" by default. If you prefer another name, add
-n «alternate_name» to the command.
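Once the environment is activated (for example with conda activate workflows), a quick sanity check is to confirm that the command-line tools are on your PATH. The sketch below is one way to do that; the executable names are assumptions based on the tools listed above and may differ on your system.

```python
import shutil

# Executable names are assumptions based on the tools named in this README;
# adjust them if your installation uses different names.
EXPECTED_TOOLS = ["R", "snakemake", "amdahl", "pandoc"]


def find_missing(tools):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(EXPECTED_TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected tools were found.")
```

termplotlib is a Python package rather than an executable, so it is not covered by this check.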
{sandpaper}
{sandpaper} is the engine behind The Carpentries Workbench lesson layout and static website generator. It is an R package, and has not yet been installed. Paraphrasing the installation instructions, start R or radian, then install:
```shell
you@yours:hpc-workflows$ R --no-restore --no-save
```

```R
install.packages(c("sandpaper", "varnish", "pegboard", "tinkr"),
                 repos = c("https://carpentries.r-universe.dev/", getOption("repos")))
```
Now you can render the site! From your R session, run:

```R
library("sandpaper")
sandpaper::serve()
```
This should output something like the following:
```plain
Output created: hpc-workflows/site/docs/index.html
To stop the server, run servr::daemon_stop(1) or restart your R session
Serving the directory hpc-workflows/site/docs at http://127.0.0.1:4321
```
Click on the link to http://127.0.0.1:4321 or copy and paste it in your browser. You should see any changes you've made to the lesson on the corresponding page(s). If it looks right, you're set to proceed!
Owner
- Name: carpentries-incubator
- Login: carpentries-incubator
- Kind: organization
- Repositories: 107
- Profile: https://github.com/carpentries-incubator
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
---
cff-version: 1.2.0
title: "HPC Workflow Management with Snakemake"
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Alan
    family-names: O'Cais
    email: "alan.ocais@cecam.org"
    affiliation: "University of Barcelona"
    orcid: "https://orcid.org/0000-0002-8254-8752"
    alias: ocaisa
  - given-names: Andrew
    family-names: Reid
    email: "andrew.reid@nist.gov"
    affiliation: "National Institute of Standards and Technology"
    orcid: "https://orcid.org/0000-0002-1564-5640"
    alias: reid-a
  - given-names: Annajiat
    family-names: Alim Rasel
    email: "annajiat@bracu.ac.bd"
    affiliation: "Brac University"
    orcid: "https://orcid.org/0000-0003-0198-3734"
    alias: annajiat
  - given-names: Benson
    family-names: Muite
    email: "benson_muite@emailplus.org"
    affiliation: "Kichakato Kizito"
    alias: bkmgit
  - given-names: Trevor
    family-names: Keller
    email: "trevor.keller@nist.gov"
    affiliation: "National Institute of Standards and Technology"
    orcid: "https://orcid.org/0000-0002-2920-8302"
    alias: tkphd
  - given-names: Wirawan
    family-names: Purwanto
    email: "wpurwant@odu.edu"
    affiliation: "Old Dominion University"
    orcid: "https://orcid.org/0000-0002-2124-4552"
    alias: wirawan0
repository-code: "https://github.com/carpentries-incubator/hpc-workflows"
url: "https://carpentries-incubator.github.io/hpc-workflows/"
abstract: >-
  When using HPC resources, it's very common to need to
  carry out the same set of tasks over a set of data
  (commonly called a workflow or pipeline). In this lesson
  we will make an experiment that takes an application which
  runs in parallel and investigate its scalability. To do
  that we will need to gather data, in this case that means
  running the application multiple times with different
  numbers of CPU cores and recording the execution time.
  Once we've done that we need to create a visualisation of
  the data to see how it compares against the ideal case.
  We could do all of this manually, but there are useful
  tools to help us manage data analysis pipelines like we
  have in our experiment. In the context of this lesson,
  we'll learn about one of those: Snakemake.
keywords:
  - HPC
  - Carpentries
  - Lesson
  - Workflow
  - Pipeline
license: "CC-BY-4.0"
references:
  - title: "Getting Started with Snakemake"
    authors:
      - family-names: Collins
        given-names: Daniel
        alias: DC23
    type: software
    repository-code: "https://github.com/carpentries-incubator/workflows-snakemake"
    url: "https://carpentries-incubator.github.io/workflows-snakemake/"
  - title: "Snakemake for Bioinformatics"
    authors:
      - family-names: Booth
        given-names: Tim
        alias: tbooth
        orcid: "https://orcid.org/0000-0003-2470-9519"
    type: software
    repository-code: "https://github.com/carpentries-incubator/snakemake-novice-bioinformatics/"
    url: "https://carpentries-incubator.github.io/snakemake-novice-bioinformatics"
GitHub Events
Total
- Issues event: 1
- Delete event: 3
- Issue comment event: 3
- Push event: 39
- Pull request review event: 2
- Pull request event: 6
- Create event: 4
Last Year
- Issues event: 1
- Delete event: 3
- Issue comment event: 3
- Push event: 39
- Pull request review event: 2
- Pull request event: 6
- Create event: 4
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 11
- Total pull requests: 20
- Average time to close issues: 7 months
- Average time to close pull requests: 8 days
- Total issue authors: 6
- Total pull request authors: 4
- Average comments per issue: 1.27
- Average comments per pull request: 1.75
- Merged pull requests: 18
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 13
- Average time to close issues: about 1 hour
- Average time to close pull requests: 11 days
- Issue authors: 3
- Pull request authors: 2
- Average comments per issue: 1.5
- Average comments per pull request: 1.92
- Merged pull requests: 11
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ocaisa (4)
- tkphd (3)
- cgross95 (1)
- guyer (1)
- reid-a (1)
- tobyhodges (1)
Pull Request Authors
- tkphd (18)
- ocaisa (10)
- reid-a (3)