chtc-containers
This repo contains information on how to build, test, and run containers on CHTC. It contains example definition files for several bioinformatics software packages.
https://github.com/uw-madison-bacteriology-bioinformatics/chtc-containers
CHTC (the Center for High Throughput Computing, https://chtc.cs.wisc.edu/) is a computing resource with High Throughput Computing (HTC) and High Performance Computing (HPC) options available to UW-Madison researchers. Please visit their website for more information. Most researchers in our department get access to HTC (the default) when they create an account.
Containers are used to install and run software on CHTC.
What you will need
To run a typical bioinformatics workflow on CHTC you will need:
- A definition file: instructions to build an Apptainer container for your software
- A .sif file: an Apptainer container image containing the software
- An executable script: a bash script with the lines that CHTC will run
- A submit script: a text file that lists the resources requested and refers to the .sif file you built
You do NOT need to create a .sif image file every time you run a job. You can reuse the same images over time.
Instructions
Installing software: creating a container for your bioinformatics software
Create a container image from the .def file by following these instructions: https://chtc.cs.wisc.edu/uw-research-computing/apptainer-htc.html
If you have previously used conda to install bioinformatics software, the process usually involves creating an 'environment' in your /home/username folder, packing the conda environment into a tar.gz file, and then referring to that tar.gz file inside the executable script (.sh) when running a job on CHTC.
The website anaconda.org is where we can search for software to install. Bioconda (https://anaconda.org/bioconda/repo) is a conda channel that contains many bioinformatics software packages, ready to install (over 10,000 of them!).
On CHTC, we need to use containers, such as Docker or Apptainer/Singularity images, to install and run any software.
One challenge is that the selection of pre-built, installable container images (e.g. on Docker Hub: hub.docker.com) is much smaller than what is available on conda.
[!NOTE] Conda: a way to install software along with all its dependencies, but an installation is specific to a computer architecture (e.g. Mac, Windows, Linux). Container: a way to install software along with all its dependencies, IN ADDITION to packaging the specific system the installation was built on.
Thankfully, it only takes a few steps to convert a conda installation into a container image: create a .def recipe file for building an Apptainer container, using the following template:
```
Bootstrap: docker
From: continuumio/miniconda3:latest

%post
    conda install -c conda-forge -c bioconda bowtie fastx_toolkit
```
The Bootstrap line tells Apptainer that the base image is a Docker image.
The From line names that base image, so the container is built on top of miniconda. You don't have to install miniconda3 yourself, because an image exists here: https://hub.docker.com/r/continuumio/miniconda3
The lines after %post are the conda install instructions that you would have typed into your terminal. This is usually the line you would edit to install the software of your choice.
Note that you do not need to write conda create or conda activate in this .def file.
[!NOTE] On this GitHub repository, under recipes, I have created .def files for common bioinformatics software that can be used to create .sif container images for use on CHTC. You can use them to build your own .sif images, OR use the template above and replace what follows the %post line with the tool you want to install from conda.
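Filling in the template for a different set of tools only changes the package list after conda install. A minimal sketch of that substitution, assuming a hypothetical helper named make_def (it is not part of Apptainer or conda) that prints the definition file for whatever packages you pass it:

```shell
#!/bin/bash
# Sketch: print an Apptainer definition file for a list of conda packages.
# make_def is a hypothetical helper name, not part of Apptainer or conda.
make_def() {
    if [ "$#" -eq 0 ]; then
        echo "usage: make_def PACKAGE [PACKAGE...]" >&2
        return 1
    fi
    local pkgs="$*"   # join all package names with spaces
    cat <<EOF
Bootstrap: docker
From: continuumio/miniconda3:latest

%post
    conda install -c conda-forge -c bioconda $pkgs
EOF
}

# Example: the same packages as the template, printed to the terminal.
make_def bowtie fastx_toolkit
```

Redirect the output to a file name of your choice (for example `make_def fastqc > fastqc.def`) to save the recipe.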
Starting an interactive job to create a container:
Once you have a .def file somewhere in your /home/username directory on CHTC, create a build.sub file. You will use it to start an interactive job with condor_submit -i build.sub.
In the code below, change the line transfer_input_files = image.def to match the name of your own .def file (e.g. spades.def, fastqc.def, etc.).
```
# build.sub
# For building an Apptainer container

universe = vanilla
log = build.log

# In the latest version of HTCondor on CHTC, interactive jobs require an executable.
# If you do not have an existing executable, use a generic linux command like hostname as shown below.
executable = /usr/bin/hostname

# If you have additional files in your /home directory that are required for your container, add them to the transfer_input_files line as a comma-separated list.
transfer_input_files = image.def

requirements = (HasCHTCStaging == true)

+IsBuildJob = true
request_cpus = 4
request_memory = 16GB
request_disk = 16GB

queue
```
Then type the following to submit your interactive job:
condor_submit -i build.sub
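Because only the transfer_input_files line changes from one build to the next, the submit file can be generated rather than edited by hand. A minimal sketch, assuming a hypothetical helper named make_build_sub that prints the same settings as the build.sub above for a given .def file:

```shell
#!/bin/bash
# Sketch: print a build submit file for a given definition file.
# make_build_sub is a hypothetical helper; the settings mirror build.sub.
make_build_sub() {
    local def_file="$1"
    case "$def_file" in
        *.def) ;;   # looks like a definition file, carry on
        *) echo "expected a .def file, got: $def_file" >&2; return 1 ;;
    esac
    cat <<EOF
universe = vanilla
log = build.log
executable = /usr/bin/hostname
transfer_input_files = $def_file
requirements = (HasCHTCStaging == true)
+IsBuildJob = true
request_cpus = 4
request_memory = 16GB
request_disk = 16GB
queue
EOF
}

# Example: print a submit file for spades.def.
make_build_sub spades.def
```

Saving the output (for example `make_build_sub spades.def > build.sub`) gives a file you can pass to condor_submit -i.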
Follow the instructions at: https://chtc.cs.wisc.edu/uw-research-computing/apptainer-htc.html#start-an-interactive-build-job
In summary, the commands to type once you enter the interactive job are:
```
# Build the container using the instructions in the def file, and write it to the file name of your choice with the extension .sif
# (my-container.sif and image.def are example names; use your own)
apptainer build my-container.sif image.def

# Test the container:
apptainer shell -e my-container.sif
# The prompt will change to Apptainer>

# Check the installation by typing -h (or another way to access the program) next to the program name
fastqc -h

# Once you have confirmed that it works, exit the container:
exit

# Move the .sif file to your staging folder (replace /staging/path/to/ with your own staging directory):
mv my-container.sif /staging/path/to/

# Exit the interactive build job
exit
```
Make sure you exit from the interactive job before moving on to the next step.
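The file names in these commands follow from the name of the .def file. A small sketch that prints (but does not run) the three commands for a given definition file; print_build_steps is a hypothetical helper, and the staging directory is a placeholder you supply:

```shell
#!/bin/bash
# Sketch: print (not run) the commands for one interactive build session.
# print_build_steps is a hypothetical helper name.
print_build_steps() {
    local def_file="$1" staging_dir="$2"
    local sif_file="${def_file%.def}.sif"   # spades.def -> spades.sif
    echo "apptainer build $sif_file $def_file"
    echo "apptainer shell -e $sif_file"
    echo "mv $sif_file $staging_dir/"
}

# Example with a placeholder staging path.
print_build_steps spades.def /staging/path/to
```

Printing the commands first is a cheap way to check the names before typing them inside the interactive job.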
Getting Ready to Submit your actual Job
Create a submit file for your job
This is a DIFFERENT .sub file than in the previous step. Here, the submit file contains the resources requested to run the actual computational job.
You will likely need to customize the number of CPUs, the memory, and the disk space.
Here is an example of a .sub file that uses the container in a job on CHTC:
```
# apptainer.sub

# Provide HTCondor with the name of your .sif file and universe information
container_image = file:///staging/path/to/my-container.sif

executable = myExecutable.sh

# Include other files that need to be transferred here.
transfer_input_files = other_job_files

log = job.log
error = job.err
output = job.out

requirements = (HasCHTCStaging == true) && (OpSysMajorVer > 7) && (HasChtcProjects == true)

# Make sure you request enough disk for the container image in addition to your other input files
request_cpus = 1
request_memory = 4GB
request_disk = 10GB

queue
```
You will need to change the container_image line to match the file path of the .sif file you created in the previous step.
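If you reuse one submit file with several images, that edit can be scripted. A minimal sketch, assuming a hypothetical helper named set_container_image that rewrites only the container_image line and leaves the rest of the file untouched:

```shell
#!/bin/bash
# Sketch: point an existing submit file at a different .sif file.
# set_container_image is a hypothetical helper name.
set_container_image() {
    local sub_file="$1" sif_path="$2"
    local tmp
    tmp="$(mktemp)" || return 1
    # Use | as the sed delimiter because the path contains slashes.
    # file:// plus an absolute path gives file:///staging/...
    sed "s|^container_image *=.*|container_image = file://$sif_path|" "$sub_file" > "$tmp" \
        && mv "$tmp" "$sub_file"
}

# Example (paths are placeholders):
#   set_container_image apptainer.sub /staging/path/to/my-container.sif
```

The sketch assumes the path is absolute and contains no `|` or `&` characters, which would confuse the sed expression.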
Writing your executable script for CHTC (.sh)
In this script, include the usual shebang line (#!/bin/bash), followed by the code to run your software. For example:
```
#!/bin/bash
fastqc -h
fastqc /staging/ptran5/raw_data/*.fastq
```
In this .sh file, you no longer need to activate the conda environment.
In this example, we are running the program FastQC on all the fastq files in the folder /staging/ptran5/raw_data/.
We can do this because /staging is accessible from the worker nodes, so we can refer to it directly in the executable script.
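If the folder is empty or the path is mistyped, the `*.fastq` pattern is passed to the program unexpanded, which can be confusing to debug. A minimal sketch of a more defensive loop, assuming a hypothetical helper named run_on_fastq that takes the directory first and the command second:

```shell
#!/bin/bash
# Sketch: run a command on every .fastq file in a directory, one file at a time.
# run_on_fastq is a hypothetical helper name.
run_on_fastq() {
    local dir="$1"
    shift
    local found=0 f
    for f in "$dir"/*.fastq; do
        [ -e "$f" ] || continue      # skip the unexpanded pattern when nothing matches
        found=1
        "$@" "$f" || return 1        # stop at the first failure so the job exits non-zero
    done
    if [ "$found" -eq 0 ]; then
        echo "no .fastq files found in $dir" >&2
        return 1
    fi
}

# Example, inside the executable script:
#   run_on_fastq /staging/ptran5/raw_data fastqc
```

Running one file per call is a design choice for clearer error messages; passing all files in a single fastqc call, as in the original example, works too.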
Submit your job
Submit your job with condor_submit <file.sub> as usual. To test your code interactively first, add the -i flag (condor_submit -i <file.sub>). If you do test your code interactively, you will need to type conda activate once the session starts (no need to specify an environment name).
Alternative: How to build an Apptainer container from an existing conda environment
1) First activate your environment, and export it as a yml file:
Let's say I have an environment called checkm2:
ENVNAME=checkm2
conda activate $ENVNAME
conda env export > $ENVNAME.yml
conda deactivate
ls
If you were to open your yml file, it might look something like this: https://github.com/UW-Madison-Bacteriology-Bioinformatics/chtc-containers/blob/main/recipes/from_yml/checkm2.yml
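The file written by conda env export ends with a prefix: line that records where the environment lives on the machine it was exported from. That path will not exist inside a container, so some people remove it before reusing the file. A minimal sketch, assuming a hypothetical helper named strip_prefix:

```shell
#!/bin/bash
# Sketch: drop the machine-specific "prefix:" line from an exported environment file.
# strip_prefix is a hypothetical helper; it prints the cleaned file to standard output.
strip_prefix() {
    grep -v '^prefix:' "$1"
}

# Example:
#   strip_prefix checkm2.yml > checkm2.portable.yml
```

This step is optional; the rest of the yml file is left exactly as exported.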
2) Send the yml file to another computer, another laptop, or in this case, your CHTC home folder.
3) On CHTC, create a definition file (recipe file) that looks like this: https://github.com/UW-Madison-Bacteriology-Bioinformatics/chtc-containers/blob/main/recipes/from_yml/checkm2.def
You can copy and paste this into CHTC, and replace the yml file name with your own.
4) The steps to build your container using `condor_submit -i` are going to be very similar to what we've seen previously, but make sure to include the `yml` file in the `transfer_input_files` line like this:
```
[ptran5@ap2002 apptainer_def]$ cat build.sub
# build.sub
# For building an Apptainer container

universe = vanilla
log = build.log

# In the latest version of HTCondor on CHTC, interactive jobs require an executable.
# If you do not have an existing executable, use a generic linux command like hostname as shown below.
executable = /usr/bin/hostname

# If you have additional files in your /home directory that are required for your container, add them to the transfer_input_files line as a comma-separated list.
transfer_input_files = checkm2.yml, checkm2.def

requirements = (HasCHTCStaging == true)

+IsBuildJob = true
request_cpus = 4
request_memory = 16GB
request_disk = 16GB

queue
```
5) From there, submit your condor job interactively and use the Apptainer build commands to build your container (.sif file)
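A file that is listed in transfer_input_files but missing from the submit directory is an easy mistake to make when a build needs more than one input. A minimal sketch that checks each file exists and then prints the comma-separated line; input_files_line is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: build the transfer_input_files line from a list of files,
# refusing to continue if any of them is missing.
# input_files_line is a hypothetical helper name.
input_files_line() {
    local f
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing input file: $f" >&2
            return 1
        fi
    done
    local IFS=,                          # join the arguments with commas
    echo "transfer_input_files = $*"
}

# Example, run from the directory that holds both files:
#   input_files_line checkm2.yml checkm2.def
```

The printed line can be pasted into build.sub in place of the existing transfer_input_files line.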
Owner
- Name: UW-Madison-Bacteriology-Bioinformatics
- Login: UW-Madison-Bacteriology-Bioinformatics
- Kind: organization
- Repositories: 1
- Profile: https://github.com/UW-Madison-Bacteriology-Bioinformatics
Citation (citation.CFF)
cff-version: 1.0.0
message: "If you use this guide, please cite it as below."
authors:
  - family-names: "Tran"
    given-names: "Patricia Q."
    orcid: "https://orcid.org/0000-0003-3948-3938"
title: "How to build software containers to use on CHTC"
version: 1.0.0
doi:
date-released: 2024-11-20
url: "https://github.com/UW-Madison-Bacteriology-Bioinformatics/chtc-containers"