https://github.com/biont-training/hpc-workflows-en

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 5
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Code of conduct Citation

README.md

Tame Your Workflow with Snakemake

In HPC Intro, learners explored the scheduler on their cluster by launching a program called amdahl. The objective of this lesson is to adapt the manual job submission process into a repeatable, reusable workflow with minimal human intervention. This is accomplished using Snakemake, a modern workflow engine.
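
To give a flavour of what that looks like, here is an illustrative Snakefile sketch for the scalability experiment. It is not the lesson's actual workflow: the core counts, output filenames, and the assumption that amdahl runs under mpirun and writes terse output to stdout are all placeholders.

```snakemake
# Illustrative sketch only -- not the lesson's actual Snakefile.
# Assumes `amdahl` is on PATH, runs under mpirun, and supports --terse.
NTASKS = [1, 2, 4, 8]

# The target rule: request one result file per core count.
rule all:
    input:
        expand("amdahl_run_{n}.json", n=NTASKS)

# One rule instance per core count; Snakemake fills in the {n} wildcard.
rule amdahl_run:
    output:
        "amdahl_run_{n}.json"
    shell:
        "mpirun -np {wildcards.n} amdahl --terse > {output}"
```

Running `snakemake --cores 1 all` would then execute every missing run, in order, with no manual job bookkeeping.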

If you are interested in learning more about workflow tools, please visit The Workflows Community.
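
As a reminder of what the amdahl program models: Amdahl's law caps the achievable speedup by the serial fraction of the work. A small Python sketch of the ideal speedup learners compare their measurements against (the parallel fraction p = 0.8 is an arbitrary example value):

```python
def amdahl_speedup(p, n):
    """Ideal speedup on n cores when fraction p of the work parallelizes."""
    return 1.0 / ((1.0 - p) + p / n)

# With 80% parallel work, speedup approaches but never reaches 1/(1-p) = 5.
for n in (1, 4, 16, 1024):
    print(f"{n:>5} cores: {amdahl_speedup(0.8, n):.2f}x")
# → 1.00x, 2.50x, 4.00x, 4.98x
```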

Snakemake is best for single-node jobs

NERSC's Snakemake docs list Snakemake's "cluster mode" as a disadvantage: it submits each "rule" instance as a separate job, thereby spamming the scheduler with dependent tasks. The main Snakemake process also resides on the login node until all jobs have finished, occupying some of its resources.
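
For illustration, the cluster mode described above (Snakemake ≤ 7) was invoked roughly like this; the sbatch flags here are placeholders for whatever your site actually requires:

```shell
# Illustrative only: every instance of every rule becomes its own sbatch job,
# while the coordinating snakemake process stays on the login node.
snakemake --jobs 10 --cluster "sbatch --ntasks={threads} --time=00:10:00"
```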

If you wish to adapt your Python-based program for multi-node cluster execution, consider applying the workflow principles learned from this lesson to the Parsl framework. Again, NERSC's Parsl docs provide helpful tips.

Contributing

This is a translation of the old HPC Workflows lesson using The Carpentries Workbench and R Markdown (Rmd). You are cordially invited to contribute! Please check the list of issues if you're unsure where to start.

Building Locally

If you edit the lesson, it is important to verify that the changes are rendered properly in the online version. The best way to do this is to build the lesson locally. You will need an R environment to do this: as described in the {sandpaper} docs, the environment can be either your terminal or RStudio.

Setup

The environment.yml file describes a Conda virtual environment that includes R, Snakemake, amdahl, pandoc, and termplotlib: the tools you'll need to develop and run this lesson, plus some dependencies. To prepare the environment, install Miniconda following the official instructions. Then open a shell application and create a new environment:

```shell
you@yours:~$ cd path/to/local/hpc-workflows
you@yours:hpc-workflows$ conda env create -f environment.yml
```

N.B.: the environment will be named "workflows" by default. If you prefer another name, add -n «alternate_name» to the command.

{sandpaper}

{sandpaper} is the engine behind The Carpentries Workbench lesson layout and static website generator. It is an R package, and it is not included in the Conda environment above. Paraphrasing the installation instructions, start R (or radian), then install:

```shell
you@yours:hpc-workflows$ R --no-restore --no-save
```

```r
install.packages(c("sandpaper", "varnish", "pegboard", "tinkr"),
                 repos = c("https://carpentries.r-universe.dev/", getOption("repos")))
```

Now you can render the site! From your R session,

```r
library("sandpaper")
sandpaper::serve()
```

This should output something like the following:

```plain
Output created: hpc-workflows/site/docs/index.html
To stop the server, run servr::daemon_stop(1) or restart your R session
Serving the directory hpc-workflows/site/docs at http://127.0.0.1:4321
```

Click the http://127.0.0.1:4321 link, or copy and paste it into your browser. You should see any changes you've made to the lesson on the corresponding page(s). If everything looks right, you're set to proceed!

Owner

  • Name: BioNT Training
  • Login: BioNT-Training
  • Kind: organization
  • Email: contact@biont-training.eu

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
---
cff-version: 1.2.0
title: "HPC Workflow Management with Snakemake"
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Alan
    family-names: O'Cais
    email: "alan.ocais@cecam.org"
    affiliation: "University of Barcelona"
    orcid: "https://orcid.org/0000-0002-8254-8752"
    alias: ocaisa
  - given-names: Andrew
    family-names: Reid
    email: "andrew.reid@nist.gov"
    affiliation: "National Institute of Standards and Technology"
    orcid: "https://orcid.org/0000-0002-1564-5640"
    alias: reid-a
  - given-names: Annajiat
    family-names: Alim Rasel
    email: "annajiat@bracu.ac.bd"
    affiliation: "Brac University"
    orcid: "https://orcid.org/0000-0003-0198-3734"
    alias: annajiat
  - given-names: Benson
    family-names: Muite
    email: "benson_muite@emailplus.org"
    affiliation: "Kichakato Kizito"
    alias: bkmgit
  - given-names: Trevor
    family-names: Keller
    email: "trevor.keller@nist.gov"
    affiliation: "National Institute of Standards and Technology"
    orcid: "https://orcid.org/0000-0002-2920-8302"
    alias: tkphd
  - given-names: Wirawan
    family-names: Purwanto
    email: "wpurwant@odu.edu"
    affiliation: "Old Dominion University"
    orcid: "https://orcid.org/0000-0002-2124-4552"
    alias: wirawan0

repository-code: "https://github.com/carpentries-incubator/hpc-workflows"
url: "https://carpentries-incubator.github.io/hpc-workflows/"
abstract: >-
  When using HPC resources, it's very common to need to
  carry out the same set of tasks over a set of data
  (commonly called a workflow or pipeline). In this lesson
  we will make an experiment that takes an application which
  runs in parallel and investigate its scalability. To do
  that we will need to gather data, in this case that means
  running the application multiple times with different
  numbers of CPU cores and recording the execution time.
  Once we've done that we need to create a visualisation of
  the data to see how it compares against the ideal case.

  We could do all of this manually, but there are useful
  tools to help us manage data analysis pipelines like we
  have in our experiment. In the context of this lesson,
  we'll learn about one of those: Snakemake.
keywords:
  - HPC
  - Carpentries
  - Lesson
  - Workflow
  - Pipeline
license: "CC-BY-4.0"
references:
  - title: "Getting Started with Snakemake"
    authors:
      - family-names: Collins
        given-names: Daniel
        alias: DC23
    type: software
    repository-code: "https://github.com/carpentries-incubator/workflows-snakemake"
    url: "https://carpentries-incubator.github.io/workflows-snakemake/"
  - title: "Snakemake for Bioinformatics"
    authors:
      - family-names: Booth
        given-names: Tim
        alias: tbooth
        orcid: "https://orcid.org/0000-0003-2470-9519"
    type: software
    repository-code: "https://github.com/carpentries-incubator/snakemake-novice-bioinformatics/"
    url: "https://carpentries-incubator.github.io/snakemake-novice-bioinformatics"

GitHub Events

Total
  • Push event: 5
  • Create event: 4
Last Year
  • Push event: 5
  • Create event: 4