chtc-containers

This repo contains information on how to build, test, and run containers on CHTC. It contains example definition files for several bioinformatics software packages.

https://github.com/uw-madison-bacteriology-bioinformatics/chtc-containers

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: UW-Madison-Bacteriology-Bioinformatics
  • Default Branch: main
  • Homepage:
  • Size: 77.1 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme Citation

README.md

chtc-containers

CHTC, the Center for High Throughput Computing (https://chtc.cs.wisc.edu/), is a computing resource with High Throughput Computing (HTC) and High Performance Computing (HPC) options available to UW-Madison researchers. Please visit their website for more information. Most researchers in our department will have access to HTC (the default) when they create an account.

Containers are used to install and run software on CHTC.

What you will need

To run a typical bioinformatics workflow on CHTC you will need:

  1. A definition file: instructions to build an Apptainer container for your software
  2. A .sif file: an Apptainer container image containing the software
  3. An executable script: a bash script with the commands that CHTC will run
  4. A submit file: a text file that lists the resources requested and refers to the .sif file you built

You do NOT need to create a .sif image file every time you run a job; you can reuse the same image across jobs.

Instructions

Installing software: creating a container for your bioinformatics software

Create a container image from a .def file by following these instructions: https://chtc.cs.wisc.edu/uw-research-computing/apptainer-htc.html

If you have previously used conda to install bioinformatics software on CHTC, the process usually involved creating an 'environment' in your /home/username folder, packing the conda environment into a tar.gz file, and referring to that tar.gz file inside the executable script (.sh) when running a job on CHTC.

The website anaconda.org is where you can search for software to install. Bioconda (https://anaconda.org/bioconda/repo) is a conda channel that contains a large collection of bioinformatics software, ready to install (over 10,000 packages, to be exact!). On CHTC, we need to use containers, such as Docker or Apptainer/Singularity images, to install and run any software. One challenge is that repositories of pre-built container images (e.g. Docker: hub.docker.com) offer far fewer bioinformatics tools than conda does.

[!NOTE] Conda: a way to install software along with all its dependencies, but the installation is specific to the computer it is built on (e.g. Mac, Windows, Linux). Container: a way to install software along with all its dependencies AND the specific operating system environment the installation was built on.

Thankfully, it only takes a few steps to turn a conda installation into a container image: create a .def recipe file, which Apptainer uses to build the container, starting from the following template:

```
Bootstrap: docker
From: continuumio/miniconda3:latest

%post
    conda install -c conda-forge -c bioconda bowtie fastx_toolkit
```

The Bootstrap line tells Apptainer to start from a Docker image.

The From line names that image, so that the container is built on top of miniconda. You don't have to install miniconda3 yourself, because an image already exists here: https://hub.docker.com/r/continuumio/miniconda3

The lines after %post are the conda install instructions that you would have typed into your terminal. Note that you do not need to write conda create or conda activate in this .def file. The conda install line is usually the one you would edit to install the software of your choice.
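As a rough illustration of editing that template, the sketch below writes a .def file for a tool of your choice from the shell. The helper name write_def and the output file name fastqc.def are hypothetical; only the template text itself comes from this guide.

```shell
#!/bin/bash
# Hypothetical helper: write a .def file that installs the given conda packages.
# Usage: write_def <output.def> <package> [<package> ...]
write_def() {
    local out="$1"
    shift
    # The heredoc reproduces the template above; $* expands to the package names.
    cat > "$out" <<EOF
Bootstrap: docker
From: continuumio/miniconda3:latest

%post
    conda install -c conda-forge -c bioconda $*
EOF
}

# Example: a recipe that installs FastQC from Bioconda.
write_def fastqc.def fastqc
cat fastqc.def
```

The resulting fastqc.def can then be used as the image.def file in the build steps that follow.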

[!NOTE] In this GitHub repository, under recipes, I have created .def files for common bioinformatics software that can be used to create .sif container images for use on CHTC. You can use them to build your own .sif images, OR use the template above and replace what follows the %post line with the tool you want to install from conda.

Starting an interactive job to create a container:

Once you have a .def file somewhere in your /home/username directory on CHTC, create a build.sub file that will be used to start an interactive job with condor_submit -i build.sub. In the code below, change the line transfer_input_files = image.def to match the name of your .def file (e.g. spades.def, fastqc.def, etc.).

```
# build.sub
# For building an Apptainer container

universe = vanilla
log = build.log

# In the latest version of HTCondor on CHTC, interactive jobs require an executable.
# If you do not have an existing executable, use a generic linux command like hostname as shown below.
executable = /usr/bin/hostname

# If you have additional files in your /home directory that are required for your container, add them to the transfer_input_files line as a comma-separated list.
transfer_input_files = image.def

requirements = (HasCHTCStaging == true)

+IsBuildJob = true
request_cpus = 4
request_memory = 16GB
request_disk = 16GB

queue
```

Then type the following to submit your interactive job: condor_submit -i build.sub

Follow the instructions at: https://chtc.cs.wisc.edu/uw-research-computing/apptainer-htc.html#start-an-interactive-build-job
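The edit to the transfer_input_files line can also be scripted. The sketch below is only an illustration: DEF_NAME is a hypothetical variable, and the build.sub written here is a minimal stand-in so that the example runs on its own.

```shell
#!/bin/bash
# Sketch: point build.sub at a specific .def file before submitting.
# DEF_NAME is a hypothetical variable; spades.def is one of the example names.
DEF_NAME="spades.def"

# Minimal stand-in for build.sub so this example is self-contained.
cat > build.sub <<'EOF'
universe = vanilla
log = build.log
executable = /usr/bin/hostname
transfer_input_files = image.def
requirements = (HasCHTCStaging == true)
+IsBuildJob = true
request_cpus = 4
request_memory = 16GB
request_disk = 16GB
queue
EOF

# Rewrite only the transfer_input_files line; a backup is kept as build.sub.bak.
sed -i.bak "s|^transfer_input_files = .*|transfer_input_files = ${DEF_NAME}|" build.sub
grep '^transfer_input_files' build.sub

# On a CHTC access point you would then run:
#   condor_submit -i build.sub
```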

In summary, the commands to type once you enter the interactive job are:

```
# Build the container using the instructions in the .def file, and write it to a file name of your choice with the extension .sif
# (my-container.sif and image.def are example names; use your own)
apptainer build my-container.sif image.def

# Test the container:
apptainer shell -e my-container.sif

# The prompt will change to Apptainer>
# Check the installation by typing -h (or another way to access the program) next to the program name
fastqc -h

# Once you have seen that it works, exit the container:
exit

# Move the .sif file to your staging folder:
mv my-container.sif /staging/netid/.

# Exit the interactive build job
exit
```
Make sure you exit from the interactive job before moving on to the next step.
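If you prefer a check that does not require opening a shell inside the container, apptainer exec can run a single command in the image. The following is a minimal sketch under that assumption; DEF, SIF and TOOL are illustrative names, and the script simply reports "skipped" when apptainer or the .def file is not available.

```shell
#!/bin/bash
# Sketch: build an image and smoke-test it without an interactive shell.
# DEF, SIF and TOOL are illustrative names; substitute your own.
DEF="image.def"
SIF="my-container.sif"
TOOL="fastqc"

if ! command -v apptainer >/dev/null 2>&1 || [ ! -f "$DEF" ]; then
    # Outside a build job (or without the .def file) there is nothing to do.
    RESULT="skipped"
elif apptainer build "$SIF" "$DEF" && apptainer exec "$SIF" "$TOOL" -h >/dev/null; then
    # The build succeeded and the tool printed its help text inside the container.
    RESULT="ok"
else
    RESULT="failed"
fi
echo "build check: $RESULT"
```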

Getting Ready to Submit your actual Job

Create a submit file for your job

This is a DIFFERENT .sub file than in the previous step. Here, the submit file contains the resources requested to run the actual computational job. You will likely need to set custom cpus, memory and disk space.

Here is an example of a submit file that uses the container in a job on CHTC:

```
# apptainer.sub

# Provide HTCondor with the name of your .sif file and universe information
container_image = file:///staging/path/to/my-container.sif

executable = myExecutable.sh

# Include other files that need to be transferred here.
transfer_input_files = other_job_files

log = job.log
error = job.err
output = job.out

requirements = (HasCHTCStaging == true) && (OpSysMajorVer > 7) && (HasChtcProjects == true)

# Make sure you request enough disk for the container image in addition to your other input files
request_cpus = 1
request_memory = 4GB
request_disk = 10GB

queue
```

You will need to change the container_image line to match the file path of the .sif file you created in the previous step.
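To get a rough starting value for request_disk, you can add up the size of the image and the input files and leave some headroom for output. The sketch below shows one way to do that arithmetic; the function name estimate_disk_mb and the 1024 MB headroom are hypothetical choices, not CHTC guidance, and the demo files are small stand-ins.

```shell
#!/bin/bash
# Sketch: estimate request_disk as (image size + input size) plus headroom.
# estimate_disk_mb and the 1024 MB headroom are hypothetical choices.
estimate_disk_mb() {
    local total_bytes=0 f size
    for f in "$@"; do
        # wc -c reports the file size in bytes.
        size=$(wc -c < "$f")
        total_bytes=$((total_bytes + size))
    done
    # Round up to whole MB, then add headroom for job output.
    echo $(( (total_bytes + 1048575) / 1048576 + 1024 ))
}

# Demo with small stand-in files so the sketch runs anywhere.
head -c 3145728 /dev/zero > demo.sif      # 3 MB stand-in for a .sif image
head -c 1048577 /dev/zero > demo.fastq    # just over 1 MB of input data
echo "request_disk = $(estimate_disk_mb demo.sif demo.fastq)MB"
```

Treat the printed value as a lower bound and adjust it once you know how much output your tool writes.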

Writing your executable script for CHTC (.sh)

In this script, include the usual shebang line (#!/bin/bash) followed by the code to run your software, for example:

```
#!/bin/bash

fastqc -h

fastqc /staging/ptran5/raw_data/*.fastq
```

In this .sh file, you no longer need to activate the conda environment. In this example, we run the program FastQC on all the fastq files in the folder /staging/ptran5/raw_data/. We can do this because /staging is accessible from the execute nodes (those matching the HasCHTCStaging requirement in the submit file), so we can refer to it directly in the executable script.
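A slightly more defensive version of such an executable script might look like the sketch below. It is an illustration only: OUT_DIR is a hypothetical name, the default input folder mirrors the example above, and the script prints a message instead of failing when no input files or no fastqc command are found.

```shell
#!/bin/bash
# Sketch of an executable script. The INPUT_DIR default mirrors the example
# above; OUT_DIR is a hypothetical name.
INPUT_DIR="${1:-/staging/ptran5/raw_data}"
OUT_DIR="fastqc_results"
mkdir -p "$OUT_DIR"

# Collect the input files; nullglob makes an unmatched pattern expand to nothing.
shopt -s nullglob
files=("$INPUT_DIR"/*.fastq)

if [ "${#files[@]}" -eq 0 ]; then
    echo "No .fastq files found in $INPUT_DIR"
elif command -v fastqc >/dev/null 2>&1; then
    # -o tells FastQC which directory to write its reports to.
    fastqc -o "$OUT_DIR" "${files[@]}"
else
    echo "fastqc is not on PATH; run this inside the container"
fi

# Bundle the reports into a single file in the job's working directory.
tar -czf "$OUT_DIR.tar.gz" "$OUT_DIR"
```

Writing the reports to the job's working directory, rather than next to the inputs in /staging, is a design choice here and not a requirement. A single archive is usually easier to move around than a directory of reports.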

Submit your job

You can submit your job using condor_submit <file.sub> as usual, making sure that you are using the submit file that refers to your .sif file in the container_image line. You can also use condor_submit with the -i flag to test your code interactively. If you do, you will need to type conda activate once inside the job (no need to specify an environment name).
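Before submitting, it can help to confirm that the container_image path in the submit file points at a file that exists. The helper below is a hypothetical sketch of such a check (check_container_image is not an HTCondor command). It needs to run somewhere that can see the path, for example where /staging is mounted. The demo files are stand-ins so the example is self-contained.

```shell
#!/bin/bash
# Hypothetical pre-submit check: does the container_image in a submit file exist?
check_container_image() {
    local sub="$1" uri path
    # Take the value to the right of "container_image =".
    uri=$(sed -n 's/^[[:space:]]*container_image[[:space:]]*=[[:space:]]*//p' "$sub" | head -n 1)
    # Drop the file:// scheme to get a plain path.
    path="${uri#file://}"
    if [ -f "$path" ]; then
        echo "found: $path"
    else
        echo "missing: $path"
        return 1
    fi
}

# Demo with stand-in files so the sketch runs anywhere.
touch demo.sif
printf 'container_image = file://%s/demo.sif\nqueue\n' "$PWD" > demo.sub
check_container_image demo.sub
```

If the check passes, submit with condor_submit as described above.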

Alternative: How to build an Apptainer container from an existing conda environment

1) First activate your environment and export it as a yml file. For example, say you have an environment called checkm2 (so $ENVNAME below stands for checkm2):

    conda activate $ENVNAME
    conda env export > $ENVNAME.yml
    conda deactivate
    ls

If you were to open your yml file, it might look something like this: https://github.com/UW-Madison-Bacteriology-Bioinformatics/chtc-containers/blob/main/recipes/from_yml/checkm2.yml

2) Send the yml file to another computer, another laptop, or in this case, your CHTC home folder.

3) On CHTC, create a definition file (recipe file) that looks like this one, replacing the yml file name with your own file name: https://github.com/UW-Madison-Bacteriology-Bioinformatics/chtc-containers/blob/main/recipes/from_yml/checkm2.def

4) The steps to build your container using condor_submit -i are going to be very similar to what we've seen previously, but make sure to include the yml file in the transfer_input_files line like this:

```
[ptran5@ap2002 apptainer_def]$ cat build.sub
# build.sub
# For building an Apptainer container

universe = vanilla
log = build.log

# In the latest version of HTCondor on CHTC, interactive jobs require an executable.
# If you do not have an existing executable, use a generic linux command like hostname as shown below.
executable = /usr/bin/hostname

# If you have additional files in your /home directory that are required for your container, add them to the transfer_input_files line as a comma-separated list.
transfer_input_files = checkm2.yml, checkm2.def

requirements = (HasCHTCStaging == true)

+IsBuildJob = true
request_cpus = 4
request_memory = 16GB
request_disk = 16GB

queue
```

5) From there, submit your condor job interactively and use the Apptainer build commands to build your container (.sif file)
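For step 3, a definition file that builds from an exported yml file could look roughly like the one generated below. This is an assumption about what such a recipe can contain, not a copy of the checkm2.def file linked above: it copies the yml into the image with a %files section and installs it into the base environment with conda env update. The output name from_yml_example.def is hypothetical.

```shell
#!/bin/bash
# Sketch: write a .def file that installs an exported environment into the image.
# This is an illustrative recipe, not the checkm2.def from the repository.
YML="checkm2.yml"

cat > from_yml_example.def <<EOF
Bootstrap: docker
From: continuumio/miniconda3:latest

%files
    ${YML} /opt/${YML}

%post
    conda env update -n base -f /opt/${YML}
EOF

cat from_yml_example.def
```

Installing into base means no named environment has to be activated inside the container. An export made on a different operating system may pin builds that are not available for Linux; in that case, exporting with conda env export --no-builds may help.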

Owner

  • Name: UW-Madison-Bacteriology-Bioinformatics
  • Login: UW-Madison-Bacteriology-Bioinformatics
  • Kind: organization

Citation (citation.CFF)

cff-version: 1.0.0
message: "If you use this guide, please cite it as below."
authors:
- family-names: "Tran"
  given-names: "Patricia Q."
  orcid: "https://orcid.org/0000-0003-3948-3938"
title: "How to build software containers to use on CHTC"
version: 1.0.0
doi: 
date-released: 2024-11-20
url: "https://github.com/UW-Madison-Bacteriology-Bioinformatics/chtc-containers"

GitHub Events

Total
  • Push event: 10
Last Year
  • Push event: 10