https://github.com/broadinstitute/warp-tools

This repository contains all containers that WARP uses.

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.4%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: broadinstitute
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: develop
  • Size: 38.2 MB
Statistics
  • Stars: 10
  • Watchers: 6
  • Forks: 2
  • Open Issues: 3
  • Releases: 1
Created over 3 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

WARP-Tools

This repository has the container that hosts all the scripts and tools that WARP uses.

The project structure is straightforward and contains essentially two types of directories: a tools directory, containing all our in-house tools, and a 3rd-party-tools directory, which hosts all the third-party containers we use in our pipelines. Each directory contains its own README that describes the tool or scripts, along with a usage guide.

.github/workflows

This contains all YML files for automated container builds.


Docker Style Guide

This style guide provides formatting guidelines and best practices for writing Dockerfiles in WARP.

WARP maintains a collection of Docker images which are used as execution environments for various cloud-optimized data processing pipelines. Many of these images require specific sets of tools and dependencies to run and can be thought of as custom images rather than traditional application images. Building and maintaining these images can be challenging; this document provides a set of guidelines to assist in developing Docker images for WARP.

Goals

The following are some goals/guidelines we want to strive for when writing our Dockerfiles.

Small images

Building a smaller image offers advantages such as faster upload and download times, reduced storage costs, and a smaller attack surface. Two of the easiest ways to minimize the size of your image are to use a small base image and to reduce the number of layers in your image.

Alpine base

The easiest way to have a small image is to use an Alpine base image. Alpine Linux, compared to Debian, RHEL, etc., is designed specifically for security and resource efficiency, and lends itself well to being used as a building block for Docker images.

Along with being a small base, Alpine's package manager, APK, can avoid caching the package index (apk add --no-cache), and it provides tini natively.

There are some instances where a Debian base image is unavoidable, specifically in the case where dependencies don't exist in APK. It is suggested that you only go to a Debian base as a last resort.

:eyes: Example

```dockerfile
# OKAY, NOT GREAT - uses debian
FROM python:debian

RUN set -eux; \
    apt-get update; \
    apt-get install -y \
        curl \
        bash \
    ; \
# Must clean up cache manually with Debian
    apt-get clean && rm -rf /var/lib/apt/lists/*

# GOOD - uses alpine
FROM alpine:3.9

RUN set -eux; \
    apk add --no-cache \
        curl \
        bash
```

Specifying image platform

Docker images built on ARM-based machines, such as Apple's M-series Macs, may run into execution issues with our automated PR test suite. One way to avoid these issues is to use a linux/amd64 base image by including the --platform="linux/amd64" flag after the FROM keyword.

:eyes: Example

```dockerfile
# Use the amd64 version of alpine
FROM --platform="linux/amd64" alpine
```

Minimal RUN steps

Having minimal RUN steps (ideally one) is another highly effective way to reduce the size of your image. Each instruction in a Dockerfile creates a layer, and these layers add up to build the final image. Using multiple RUN steps creates additional, unnecessary layers and bloats your image.

An alternative to a single RUN step is a multi-stage build, which is effective when the application you are containerizing is just a statically linked binary. Note that many of the images maintained in WARP require a handful of system-level dependencies and custom packages, so multi-stage builds are typically not used.

:eyes: Example

```dockerfile
# BAD - uses multiple RUN steps
RUN set -eux
RUN apk add --no-cache curl bash wget
RUN wget https://www.somezipfile.com/zip
RUN unzip zip

# GOOD - uses single RUN step
RUN set -eux; \
    apk add --no-cache \
        curl \
        bash \
    ; \
    wget https://www.somezipfile.com/zip; \
    unzip zip
```
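One rough way to check a Dockerfile against this guideline is to count its RUN instructions. The following is a minimal sketch, not part of the WARP tooling: the function name count_run_steps is hypothetical, and it only recognizes RUN at the start of a line, so unusual formatting could be miscounted.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of WARP): count RUN instructions in a Dockerfile.
# Dockerfile instructions are case-insensitive, so the match ignores case.
count_run_steps() {
    local dockerfile="$1"
    # grep -c prints the number of matching lines. It exits non-zero when the
    # count is 0, so '|| true' keeps the function usable under 'set -e'.
    grep -ciE '^[[:space:]]*RUN[[:space:]]' "$dockerfile" || true
}
```

Calling `count_run_steps Dockerfile` prints the count; a result greater than 1 suggests the RUN steps could be merged.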

Publicly accessible

The pipelines that we maintain in WARP are designed for public use, so ideally our Docker images should be publicly available as well. This means the following conditions must be true.

  • Anybody can pull our images
  • Anybody can build our images

For anybody to be able to pull our images, they must be hosted on a public container registry. We host all of our images in public repos on GCR (our 'official' location) and Quay (for discoverability).

  • GCR - us.gcr.io/broad-gotc-prod
  • Quay - quay.io/broadinstitute/broad-gotc-prod

For anybody to be able to build our images, all functionality should be encapsulated in the Dockerfile. Any custom software packages, dependencies, etc. have to be downloaded from public links within the Dockerfile. This means that we should not copy files from within the Broad network infrastructure into our images.

Image scanning

All images that we build are scanned for critical vulnerabilities on every pull request. For this we use a GitHub Action that leverages Trivy for scanning. If you build a new image, please add it to the action (.github/workflows/trivy.yml).

Semantic tagging

We recommend against using rolling tags like master or latest when building images. Rolling tags make it hard to track down versions of images since the underlying image hash and content could be different across the same tags. Instead, we ask that you use a semantic tag that follows the convention below:

us.gcr.io/broad-gotc-prod/samtools:<image-version>-<samtools-version>-<unix-timestamp>

This example is for an image we use containing samtools. The 'image-version' in this case is the traditional major.minor.patch version of the image being built, which is updated when changes unrelated to samtools (underlying OS, system-level packages, etc.) are made to the image. The 'samtools-version' corresponds to the specific version of samtools being used; having this information in the tag means users can identify it without having to track it down. Lastly, a Unix timestamp is included to avoid any potential issues with Cromwell image caching.
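The tag convention can be assembled mechanically. The following is a minimal sketch in shell; the function name build_image_tag and its argument order are hypothetical and not taken from the WARP build scripts.

```shell
#!/usr/bin/env bash
# Sketch: assemble a tag of the form
#   <registry>/<image>:<image-version>-<tool-version>-<unix-timestamp>
# build_image_tag is a hypothetical helper name, not part of WARP.
build_image_tag() {
    local registry="$1" image="$2" image_version="$3" tool_version="$4"
    # An optional fifth argument pins the timestamp; it defaults to "now".
    local timestamp="${5:-$(date +%s)}"
    printf '%s/%s:%s-%s-%s\n' \
        "$registry" "$image" "$image_version" "$tool_version" "$timestamp"
}

# With the timestamp pinned, this prints:
#   us.gcr.io/broad-gotc-prod/samtools:1.0.0-1.11-1624651616
build_image_tag us.gcr.io/broad-gotc-prod samtools 1.0.0 1.11 1624651616
```

Omitting the last argument uses the current time, so every build gets a distinct tag even when neither version number changes.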

Proper process reaping

Classic init systems like systemd reap orphaned zombie processes. Typically, orphaned processes are reattached to the process at PID 1, which reaps them when they die. In a container this responsibility falls to the process at PID 1, which by default is /bin/sh, and a shell will not handle process reaping. Because of this, you risk wasting memory or other resources within your container. A simple solution is to use tini in all of our images; a lengthy explanation of what this package does can be found here.

Luckily tini is available natively through APK so all you have to do is install it and set it as the default entrypoint!

:eyes: Example

```dockerfile
FROM alpine:3.9

RUN set -eux; \
    apk add --no-cache \
        tini

ENTRYPOINT ["/sbin/tini", "--"]
```

Build Scripts and README

To make life easier when building and pushing our images, we like to have an easy-to-use docker_build.sh that sits next to each Dockerfile. These scripts should have configurable inputs for the versions of the tools (samtools, picard, zcall, etc.) used in the image. Additionally, we like to keep a record of the versions built and in use by writing the images to the accompanying docker_versions.tsv; your build script should do this automatically.

For first time users of these images it is helpful to have a README which gives a high-level overview of the image.

See the examples for samtools (docker_build, docker_versions, README).
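The record-keeping part of such a build script might look like the sketch below. This is an illustration only: the function name record_image_version and the two-column layout (IMAGE_TAG, TOOL_VERSION) are assumptions, not the format used by the actual WARP docker_versions.tsv files.

```shell
#!/usr/bin/env bash
# Sketch of the bookkeeping half of a docker_build.sh. The column layout and
# the name record_image_version are assumptions, not WARP's actual format.
record_image_version() {
    local tsv="$1" image_tag="$2" tool_version="$3"
    # Write a header row the first time the file is created (or if it is empty).
    if [ ! -s "$tsv" ]; then
        printf 'IMAGE_TAG\tTOOL_VERSION\n' > "$tsv"
    fi
    # Append one tab-separated row per built image.
    printf '%s\t%s\n' "$image_tag" "$tool_version" >> "$tsv"
}

# In a real script this would follow a successful build and push, e.g.:
#   docker build -t "$IMAGE_TAG" .
#   docker push "$IMAGE_TAG"
#   record_image_version docker_versions.tsv "$IMAGE_TAG" "$TOOL_VERSION"
```

Recording the row only after the push succeeds keeps the TSV from listing images that were never published.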

Formatting

Formatting our Dockerfiles consistently helps improve readability and eases maintenance headaches down the road. The following are the tenets that we follow when writing our Dockerfiles:

  • ARG, ENV, LABEL in that order
  • Always add versions of tools in the LABEL
  • Single RUN steps
  • Alphabetize package install
  • Clean up package index cache
  • Use ; instead of && for line continuation
  • Logically separate steps within RUN
  • Four spaces per tab indent
  • Short comments to describe each step
  • tini is always default entrypoint

The following is a good example from our verify_bam_id image. This Dockerfile shows how to install packages, install tini, and clean up cached index files for a Debian base image.

:eyes: Example

```dockerfile
# Have to use debian based image, many of the installed packages here are not available in Alpine
FROM us.gcr.io/broad-dsp-gcr-public/base/python:debian

ARG GIT_HASH=c1cba76e979904eb69c31520a0d7f5be63c72253

ENV TERM=xterm-256color \
    BAMID_URL=https://github.com/Griffan/VerifyBamID/archive \
    TINI_VERSION=v0.19.0

LABEL MAINTAINER="Broad Institute DSDE dsde-engineering@broadinstitute.org" \
      GIT_HASH=${GIT_HASH}

WORKDIR /usr/gitc

# Install dependencies
RUN set -eux; \
    apt-get update; \
    apt-get install -y \
        autoconf \
        cmake \
        g++ \
        gcc \
        git \
        libbz2-dev \
        libcurl4-openssl-dev \
        libhts-dev \
        libssl-dev \
        unzip \
        wget \
        zlib1g-dev \
    ; \
# Install BamID
    wget ${BAMID_URL}/${GIT_HASH}.zip; \
    unzip ${GIT_HASH}.zip; \
    \
    cd VerifyBamID-${GIT_HASH}; \
    mkdir build; \
    cd build; \
    CC=$(which gcc) CXX=$(which g++) cmake ..; \
    \
    cmake; \
    make; \
    make test; \
    \
    cd ../../; \
    mv VerifyBamID-${GIT_HASH}/bin/VerifyBamID .; \
    rm -rf ${GIT_HASH}.zip VerifyBamID-${GIT_HASH} \
    ; \
# Install tini
    wget https://github.com/krallin/tini/releases/download/$TINI_VERSION/tini -O /sbin/tini; \
    chmod +x /sbin/tini \
    ; \
# Clean up cached files
    apt-get clean && rm -rf /var/lib/apt/lists/*

# Set tini as default entrypoint
ENTRYPOINT [ "/sbin/tini", "--" ]
```

Troubleshooting and running standalone

The WARP Docker images are designed to be run from their respective WDL pipelines. However, if you need to run an image independently of a WDL for testing or troubleshooting, you'll likely need to explicitly instruct it to run a bash shell in the run command. An example of this is shown in the terminal command below:

docker run -it --rm <docker url> bash

If you have any questions or would like some more guidance on writing Dockerfiles please file a GitHub issue in WARP.

Citing WARP and WARP-Tools

When citing WARP and WARP-Tools, please use the following:

Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1

Owner

  • Name: Broad Institute
  • Login: broadinstitute
  • Kind: organization
  • Location: Cambridge, MA

Broad Institute of MIT and Harvard

GitHub Events

Total
  • Watch event: 1
  • Delete event: 6
  • Member event: 3
  • Issue comment event: 5
  • Push event: 100
  • Pull request event: 49
  • Pull request review event: 41
  • Pull request review comment event: 24
  • Create event: 24
Last Year
  • Watch event: 1
  • Delete event: 6
  • Member event: 3
  • Issue comment event: 5
  • Push event: 100
  • Pull request event: 49
  • Pull request review event: 41
  • Pull request review comment event: 24
  • Create event: 24

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 3
  • Total pull requests: 158
  • Average time to close issues: 20 days
  • Average time to close pull requests: 11 days
  • Total issue authors: 1
  • Total pull request authors: 13
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.19
  • Merged pull requests: 110
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 0
  • Pull requests: 24
  • Average time to close issues: N/A
  • Average time to close pull requests: 5 days
  • Issue authors: 0
  • Pull request authors: 5
  • Average comments per issue: 0
  • Average comments per pull request: 0.08
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • fredlas (3)
Pull Request Authors
  • ekiernan (38)
  • aawdeh (33)
  • nikellepetrillo (29)
  • kevinpalis (24)
  • fredlas (16)
  • khajoue2 (8)
  • sahakiann (2)
  • phendriksen100 (2)
  • dependabot[bot] (2)
  • jessicaway (1)
  • rsc3 (1)
  • kayleemathews (1)
  • mmorgantaylor (1)
Top Labels
Issue Labels
Pull Request Labels
  • do not merge (3)
  • dependencies (2)

Dependencies

.github/workflows/build-arrays-picard-private.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-atac-barcodes.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-bcftools-vcftools.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-build-indices.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-bwa.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-cutadapt.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-dragmap.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-ea-utils.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-eagle.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-emptydrops.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-fastp.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-fgbio.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-gatk.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-hisat2.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-illumina-iaap-autocall.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-m3c-yap-hisat.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-minimac4.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-optimus-test-matrix.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-picard-python.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-rsem.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-samtools-bwa-mem-2.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-samtools-bwa.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-samtools-picard-bwa.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-samtools-star.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-samtools.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-snapatac2.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-snaptools-bwa.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-star.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-subread.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-umi-tools.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-vcftoallc.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-verify-bam-id.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/build-warp-tools-inverse.yml actions
.github/workflows/build-warp-tools.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • docker/login-action v2 composite
.github/workflows/build-zcall.yml actions
  • actions/checkout v3 composite
  • docker/login-action v2 composite
.github/workflows/trivy.yml actions
  • actions/checkout v3 composite
  • broadinstitute/dsp-appsec-trivy-action v1 composite
3rd-party-tools/arrays-picard-private/Dockerfile docker
  • adoptopenjdk/openjdk8 debian-slim build
3rd-party-tools/atac-barcodes/Dockerfile docker
  • python 3.9.2 build
3rd-party-tools/bcftools-vcftools/Dockerfile docker
  • python 3.8-alpine build
3rd-party-tools/build-indices/Dockerfile docker
  • python 3.6-bullseye build
3rd-party-tools/bwa/Dockerfile docker
  • ubuntu 16.04 build
3rd-party-tools/cutadapt/Dockerfile docker
  • python 3.7.7 build
3rd-party-tools/dragmap/Dockerfile docker
  • rockylinux 8.5 build
3rd-party-tools/ea-utils/Dockerfile docker
  • ubuntu 16.04 build
3rd-party-tools/eagle/Dockerfile docker
  • us.gcr.io/broad-dsp-gcr-public/base/python debian build
3rd-party-tools/emptydrops/Dockerfile docker
  • ubuntu 18.04 build
3rd-party-tools/build-indices/requirements.txt pypi
  • Cython ==0.24.1
  • black ==19.3b0
  • flake8 ==3.7.7
  • pysam ==0.16.0.1
  • pytest ==5.1.1
  • pytest-cov ==2.10.1
3rd-party-tools/fastp/Dockerfile docker
  • debian bullseye-slim build
3rd-party-tools/fgbio/Dockerfile docker
  • adoptopenjdk/openjdk8 alpine-slim build
3rd-party-tools/gatk/Dockerfile docker
  • adoptopenjdk/openjdk8 alpine-slim build
3rd-party-tools/hisat2/Dockerfile docker
  • ubuntu 16.04 build
3rd-party-tools/illumina-iaap-autocall/Dockerfile docker
  • frolvlad/alpine-mono 5.4-glibc build
3rd-party-tools/m3c-yap-hisat/Dockerfile docker
  • mambaorg/micromamba 0.23.0 build
3rd-party-tools/minimac4/Dockerfile docker
  • us.gcr.io/broad-gotc-prod/imputation-bcf-vcf 1.0.6-1.10.2-0.1.16-1663946207 build
3rd-party-tools/optimus-test-matrix/Dockerfile docker
  • ubuntu 18.04 build
3rd-party-tools/picard-python/Dockerfile docker
  • python 3.8-alpine build
3rd-party-tools/rsem/Dockerfile docker
  • ubuntu 16.04 build
3rd-party-tools/samtools/Dockerfile docker
  • alpine 3.8 build
3rd-party-tools/samtools-bwa/Dockerfile docker
  • us.gcr.io/broad-gotc-prod/samtools 1.0.0-1.11-1624651616 build
3rd-party-tools/samtools-bwa-mem-2/Dockerfile docker
  • us.gcr.io/broad-gotc-prod/samtools 1.0.0-1.11-1624651616 build
3rd-party-tools/samtools-bwa-mem-2-lisa/Dockerfile docker
  • us.gcr.io/broad-gotc-prod/samtools 2.0.0 build
3rd-party-tools/samtools-picard-bwa/Dockerfile docker
  • us.gcr.io/broad-gotc-prod/samtools 1.0.0-1.11-1624651616 build
3rd-party-tools/samtools-star/Dockerfile docker
  • us.gcr.io/broad-gotc-prod/samtools 1.0.0-1.11-1624651616 build
3rd-party-tools/snapatac2/Dockerfile docker
  • python 3.9 build
3rd-party-tools/snaptools-bwa/Dockerfile docker
  • python 3.8 build
3rd-party-tools/star/Dockerfile docker
  • alpine latest build
3rd-party-tools/subread/Dockerfile docker
  • python 3.6.15-buster build
3rd-party-tools/umi-tools/Dockerfile docker
  • us.gcr.io/broad-dsp-gcr-public/base/python 3.9-debian build
3rd-party-tools/vcftoallc/Dockerfile docker
  • python 3.7.2 build
3rd-party-tools/verify-bam-id/Dockerfile docker
  • us.gcr.io/broad-dsp-gcr-public/base/python debian build
3rd-party-tools/zcall/Dockerfile docker
  • python 2.7.18-alpine3.11 build
tools/Dockerfile docker
  • python 3.10.12-buster build
3rd-party-tools/subread/requirements.txt pypi
  • Cython ==0.24.1
  • black ==19.3b0
  • flake8 ==3.7.7
  • pysam ==0.16.0.1
  • pytest ==5.1.1
  • pytest-cov ==2.10.1
tools/scripts/requirements.txt pypi
  • h5py ==2.10.0
  • loompy ==3.0.6
  • numpy *
  • scipy *