snakemake-learning

GitHub Repository for the snakemake learn session at the @MannLabs Group Retreat 2025

https://github.com/lucas-diedrich/snakemake-learning

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.8%) to scientific vocabulary

Keywords

tutorial

Repository


Basic Info
  • Host: GitHub
  • Owner: lucas-diedrich
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.05 MB
Statistics
  • Stars: 1
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Topics
tutorial
Created 7 months ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

snakemake-learning

GitHub Repository for the hands-on snakemake learn session at the MannLabs Group Retreat 2025

Snakemake is a Python-based workflow manager designed to make analysing large datasets easier. It supports reproducible analyses and scales to larger workloads.

Tutorial overview

In this tutorial, we will

  1. read in a dataset (here: a small image)

  2. process it with a simple function (here: apply different image transformations to it)

  3. generate a plot as output (here: histograms of pixel intensities)

  4. generate a snakemake report.
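The data steps listed above can be sketched without any dependencies. This is only an illustration under stated assumptions: the tiny hard-coded image, the `invert` transformation, and the function names are hypothetical stand-ins, not the scripts shipped with this repository (which work on skimage example images).

```python
from collections import Counter

# Hypothetical stand-in for the tutorial's input: a tiny 4x4 grayscale
# image with 8-bit pixel intensities (0 = black, 255 = white).
IMAGE = [
    [0, 10, 20, 30],
    [40, 50, 60, 70],
    [80, 90, 100, 200],
    [210, 220, 230, 255],
]


def invert(image):
    """Return a new image in which every intensity v becomes 255 - v."""
    return [[255 - value for value in row] for row in image]


def intensity_histogram(image, n_bins=4):
    """Count pixels per equally wide intensity bin over the range 0..255."""
    bin_width = 256 // n_bins
    counts = Counter(
        # Clamp to the last bin in case n_bins does not divide 256 evenly.
        min(value // bin_width, n_bins - 1)
        for row in image
        for value in row
    )
    return [counts.get(index, 0) for index in range(n_bins)]


if __name__ == "__main__":
    for name, img in [("original", IMAGE), ("inverted", invert(IMAGE))]:
        print(name, intensity_histogram(img))
```

In the workflow itself, each of these steps is a separate rule that reads and writes files, which is what lets Snakemake decide what needs to be re-run.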

Results

Installation

  1. Using the command line, go into your favorite directory (cd /path/to/my/favorite/directory)

  2. Clone this repository

```shell
git clone https://github.com/lucas-diedrich/snakemake-learning.git
```

(or download it via Code > Download ZIP, and unzip it locally)

  3. Go into the directory

```shell
cd snakemake-learning
```

  4. Create a mamba/conda environment with snakemake based on the environment.yaml file and activate it

```shell
mamba create -n snakemake-env --file environment.yaml && mamba activate snakemake-env

# OR
conda env create -f environment.yaml && conda activate snakemake-env
```

  5. Check if the installation was successful

```shell
snakemake --version
# 9.5.1
```
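The same check can be done from inside Python, which may be handy when you are unsure which environment a notebook or script is using. A minimal sketch using only the standard library; the helper name `installed_version` is made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("snakemake")
    print(f"snakemake version: {found}" if found else "snakemake is not installed")
```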

Tutorial

1. Snakemake - Introduction

See the slides in ./docs

2. Check out the workflow

Run the following command in the root directory (.) to see the whole task graph.

```shell
# --dag: Directed acyclic graph
snakemake --dag
```

Run the following command to inspect how the rules depend on one another (simpler than the task graph, especially for large workflows).

```shell
# --rulegraph: Show dependencies between rules
snakemake --rulegraph
```

```mermaid
---
title: Rule Graph
---
flowchart TB
    id0[all]
    id1[plothistogram]
    id2[transformimage]
    id3[save_image]
    style id0 fill:#CD5C5C,stroke-width:2px,color:#333333
    style id1 fill:#F08080,stroke-width:2px,color:#333333
    style id2 fill:#FA8072,stroke-width:2px,color:#333333
    style id3 fill:#E9967A,stroke-width:2px,color:#333333
    id0 --> id0
    id1 --> id0
    id2 --> id1
    id3 --> id2
```

You can use a Graphviz visualizer/editor to view the task graph.
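To see why the rule graph fixes the order of execution, the dependencies can be written down as a plain dictionary and sorted topologically. This is a sketch of the idea using Python's standard library, not how Snakemake is implemented internally; the rule names are copied as they appear in the rule graph above.

```python
from graphlib import TopologicalSorter

# Rule dependencies read off the rule graph:
# each key is a rule, each value is the set of rules it depends on.
RULE_DEPENDENCIES = {
    "all": {"plothistogram"},
    "plothistogram": {"transformimage"},
    "transformimage": {"save_image"},
    "save_image": set(),
}


def execution_order(dependencies):
    """Return the rules in an order that runs every dependency first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(" -> ".join(execution_order(RULE_DEPENDENCIES)))
```

Because this graph is a simple chain, there is only one valid order. In larger workflows several rules can be ready at the same time, which is where running with more than one core pays off.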

3. Run the full workflow

Go into the ./workflow directory and run:

```shell
snakemake --cores 2 --use-conda
```

The output can be found in the ./results directory

Generate the report

Go into the ./workflow directory and run:

```shell
snakemake --report ../results/report.html
```

The output can be found in the ./results directory

Run on a slurm HPC cluster

You can run this workflow on a high-performance computing cluster (here using the slurm workload manager). In this case, one slurm job acts as a scheduler that submits individual rule executions as separate slurm jobs. The snakemake-executor-plugin-slurm automatically handles the scheduling and submission of dependent jobs. Please check out the script ./workflow/snakemake.sbatch and the official snakemake slurm plugin documentation to learn more about the relevant flags and settings.

Execution

Install the environment

```shell
conda create -n snakemake-env -y
conda env update -n snakemake-env --file environment.yaml
```

Additionally install the snakemake-executor-plugin-slurm:

```shell
pip install snakemake-executor-plugin-slurm
```

Then submit the provided workflow script on a cluster

```shell
cd ./workflow/
sbatch snakemake.sbatch
```

Exercises

These exercises are meant to deepen your understanding after the workshop.

1. Scale the workflow to other images

The script create-data.py accepts the names of example images that ship with the skimage package as arguments.

```shell
python scripts/create-data.py --image-name <image name> --output <output name>
```

Modify the workflow so that it additionally runs on other skimage example datasets, e.g. colorwheel, cat, logo.
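One way to think about this exercise: the final rule has to request one set of output files per image. Snakemake provides an `expand` helper for building such lists. The function below is a plain-Python imitation of that idea, not Snakemake's implementation, and the `results/...` layout and transformation names are hypothetical.

```python
from itertools import product


def expand_targets(pattern, **wildcards):
    """Fill a path pattern with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


if __name__ == "__main__":
    # Hypothetical output layout and transformation names, for illustration only.
    targets = expand_targets(
        "results/{image}/{transformation}.png",
        image=["colorwheel", "cat", "logo"],
        transformation=["identity", "invert"],
    )
    print("\n".join(targets))
```

Adding another image then only means extending the list of image names. The rules themselves stay unchanged as long as their inputs and outputs use the same wildcard.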

2. Add a rule

Add a new rule that generates an aggregated plot in which the image and its modifications are shown in the top row and the associated histograms in the bottom row.

3. Prettify the report

Explore ways to customize the report using the reStructuredText (RST) format.

References

  • Snakemake homepage + Documentation snakemake.readthedocs.io

  • Publication Mölder F, Jablonski KP, Letcher B et al. Sustainable data analysis with Snakemake [version 2; peer review: 2 approved]. F1000Research 2021, 10:33 (https://doi.org/10.12688/f1000research.29032.2)

Owner

  • Name: Lucas Diedrich
  • Login: lucas-diedrich
  • Kind: user

M.Sc. student of biochemistry @HeidelbergUniversity

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Snakemake Tutorial
message: A short introduction to snakemake
type: software
authors:
  - given-names: Lucas
    family-names: Diedrich
    email: diedrich@biochem.mpg.de
    affiliation: 'Max Planck Institute for Biochemistry '
    orcid: 'https://orcid.org/0009-0007-4884-1422'
repository-code: 'https://github.com/lucas-diedrich/snakemake-learning.git'
url: 'https://github.com/lucas-diedrich/snakemake-learning.git'
abstract: Introduction to snakemake
keywords:
  - snakemake
  - tutorial
license: MIT

GitHub Events

Total
  • Delete event: 4
  • Push event: 23
  • Create event: 5
Last Year
  • Delete event: 4
  • Push event: 23
  • Create event: 5

Dependencies

environment.yaml pypi
  • Pygments ==2.19.1
  • snakemake ==9.5.1
workflow/envs/environment.yaml pypi