https://github.com/cellgeni/farm-course

Materials for farm-course organised by cellgenIT

Repository

Basic Info
  • Host: GitHub
  • Owner: cellgeni
  • Language: Shell
  • Default Branch: master
  • Size: 2.1 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created 11 months ago · Last pushed 11 months ago
Metadata Files
Readme

README.md

FARM Course - LSF Job Scheduling and HPC Tutorial

This repository contains tutorial materials and example scripts for learning how to use the FARM cluster with the LSF (Load Sharing Facility) job scheduling system.

Overview

This course covers:

  • Basic LSF job submission with bsub
  • Array job processing
  • Working with modules and Singularity containers
  • Python script execution in HPC environments
  • iRODS data management

Repository Structure

farm-course/
├── README.md                # This file
├── slides.pdf               # Presentation slides
├── data/
│   ├── sample_table.csv     # Sample data for array jobs
│   └── sample.list          # Sample list file
├── scripts/
│   ├── script.sh            # Basic bash script example
│   ├── array_script.bsub    # LSF array job script
│   ├── python_script.bsub   # Python execution with modules/Singularity
│   ├── add_values.py        # Simple Python calculator
│   └── command_list.sh      # Collection of useful commands
├── logs/                    # Job log files (stdout/stderr)
│   ├── arrayOutput*.log     # Array job outputs
│   ├── arrayError*.log      # Array job errors
│   ├── output*.log          # Standard job outputs
│   └── error*.log           # Standard job errors
└── results/                 # Output files from job executions
    ├── lolkek.txt
    └── sample*.txt          # Generated by array jobs

Getting Started

Prerequisites

  • Access to FARM cluster
  • Basic knowledge of bash scripting
  • Familiarity with Python (optional)

Basic Job Submission

  1. Simple job submission:

     bsub -G farm-course -q normal -n 1 -M "2G" -R "select[mem>2G] rusage[mem=2G]" -o "output%J.log" -e "error%J.log" ./scripts/script.sh

  2. Array job submission:

     bsub -J "[1-6]" < scripts/array_script.bsub

  3. Python script with modules:

     bsub < scripts/python_script.bsub
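In the simple submission above, the memory value appears three times (once in -M and twice in -R), which is easy to get out of sync when editing by hand. The following is a minimal sketch of a helper that builds the command from a single memory argument and prints it as a dry run. The function name build_bsub_cmd and the array bsub_cmd are hypothetical and not part of the course materials; the group, queue, and log names are copied from the example above.

```bash
#!/usr/bin/env bash
# Build a bsub command in which -M and -R always use the same memory value.
# Group, queue and log file names are taken from the example above.
bsub_cmd=()
build_bsub_cmd() {
    local mem="$1" script="$2"
    bsub_cmd=(bsub -G farm-course -q normal -n 1
              -M "$mem"
              -R "select[mem>${mem}] rusage[mem=${mem}]"
              -o "output%J.log" -e "error%J.log"
              "$script")
}

build_bsub_cmd 2G ./scripts/script.sh

# Dry run: print the command, quoted so it can be pasted into a shell.
printf '%q ' "${bsub_cmd[@]}"; echo

# To actually submit on the cluster, run: "${bsub_cmd[@]}"
```

Keeping the command in an array rather than a string preserves the quoting of the -R argument, which contains spaces and brackets.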

Script Examples

1. Basic Bash Script (script.sh)

Demonstrates:

  • File output redirection
  • stdout vs stderr output
  • Basic job execution
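The contents of script.sh are not reproduced in this README, so the following is only a sketch of what a script covering these three points could look like. The messages and the RESULTS_DIR variable are assumptions; only the file name results/lolkek.txt comes from the repository layout.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; the real scripts/script.sh may differ.

# Output location is an assumption; override with RESULTS_DIR if needed.
RESULTS_DIR="${RESULTS_DIR:-results}"
mkdir -p "$RESULTS_DIR"

# 1. File output redirection: this line goes to a file, not to the job log.
echo "written to a file" > "$RESULTS_DIR/lolkek.txt"

# 2. stdout: captured by the file given to bsub -o.
echo "this goes to stdout"

# 3. stderr: captured by the file given to bsub -e.
echo "this goes to stderr" >&2
```

When run through bsub with -o and -e, the second message would be expected in the output log and the third in the error log, while the first never reaches either log.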

2. Array Job Script (array_script.bsub)

Demonstrates:

  • LSF array job directives
  • Reading sample data from CSV
  • Job indexing with LSB_JOBINDEX
  • Dynamic file creation based on job index

3. Python Calculator (add_values.py)

A simple Python script that:

  • Accepts command line arguments
  • Performs basic arithmetic
  • Includes error handling

4. Module and Singularity Usage (python_script.bsub)

Shows how to:

  • Load Python modules
  • Execute Python scripts with different environments
  • Use Singularity containers for reproducible environments

Key LSF Directives

| Directive | Description |
|-----------|-------------|
| #BSUB -G | Specify user group |
| #BSUB -q | Queue selection |
| #BSUB -n | Number of cores |
| #BSUB -M | Memory limit |
| #BSUB -R | Resource requirements |
| #BSUB -o | Standard output file |
| #BSUB -e | Standard error file |
| #BSUB -J | Job array specification |
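These directives go at the top of a batch script that is then submitted with bsub < file, as with the .bsub files in this repository. Below is a minimal sketch that reuses the values from the simple job submission command in Basic Job Submission. The file name example.bsub and the script body are illustrative assumptions, not the repository's own script.

```bash
#!/usr/bin/env bash
# Hypothetical file: example.bsub (submit with: bsub < example.bsub)
#BSUB -G farm-course
#BSUB -q normal
#BSUB -n 1
#BSUB -M 2G
#BSUB -R "select[mem>2G] rusage[mem=2G]"
#BSUB -o output%J.log
#BSUB -e error%J.log

# LSF sets LSB_JOBID inside a running job; the fallback lets the
# script also run outside the scheduler for a quick check.
job_id="${LSB_JOBID:-local}"
message="job ${job_id} running on $(uname -n)"
echo "$message"
```

Because the #BSUB lines are ordinary comments to bash, the same file can be run directly for a quick local check before it is submitted.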

Working with Data

Sample Data Format

The data/sample_table.csv contains a simple list of sample names:

    sample1
    sample2
    sample3
    sample4
    sample5
    sample6

Array Job Processing

Array jobs automatically process each sample using the LSB_JOBINDEX variable:

  • Job index 1 processes sample1
  • Job index 2 processes sample2
  • And so on...
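One common way to implement this mapping is to print the line of the sample file whose line number equals LSB_JOBINDEX. The sketch below works under that assumption; the repository's array_script.bsub may do it differently. The function name sample_for_index, the temporary stand-in file, and the default index of 1 are illustrative.

```bash
#!/usr/bin/env bash
# Print the N-th line of a sample list (N is 1-based, like LSB_JOBINDEX).
sample_for_index() {
    local table="$1" index="$2"
    sed -n "${index}p" "$table"
}

# Stand-in for data/sample_table.csv so the sketch runs anywhere.
table="$(mktemp)"
printf 'sample%d\n' 1 2 3 4 5 6 > "$table"

# LSF sets LSB_JOBINDEX for each array element; default to 1 outside LSF.
sample="$(sample_for_index "$table" "${LSB_JOBINDEX:-1}")"
echo "Processing ${sample}"
```

Since sed line numbers are 1-based, no offset is needed with this approach.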

Module System

List and load available modules:

    module avail -C python
    module load ISG/python/3.12.3
    module load cellgen/singularity

iRODS Commands

Basic iRODS operations covered:

```bash
# List catalogs
ils /Sanger1/training

# Check metadata
imeta ls -d /path/to/file

# Download data
iget -Kv /path/to/remote/file

# Query by metadata
imeta qu -z /seq -d sample = "sample_id"
```

Output Files

Job results are stored in the results/ directory:

  • lolkek.txt - Output from basic job
  • sample*.txt - Outputs from array jobs

Log files follow the naming pattern output<JOBID>.log, error<JOBID>.log and are kept in the logs/ directory.
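Because a job's stderr ends up in its error<JOBID>.log, one quick way to spot jobs that need attention is to list the error logs that are not empty. This is a minimal sketch; the helper name nonempty_error_logs and the demo files are hypothetical and exist only so the snippet runs anywhere.

```bash
#!/usr/bin/env bash
# List error logs in a directory that contain at least one byte.
nonempty_error_logs() {
    local log_dir="$1"
    find "$log_dir" -maxdepth 1 -type f -name 'error*.log' -size +0 | sort
}

# Demo on a throwaway directory so the sketch runs anywhere.
demo_dir="$(mktemp -d)"
: > "$demo_dir/error101.log"                      # empty: job wrote no stderr
echo "Traceback ..." > "$demo_dir/error102.log"   # non-empty: worth reading
nonempty_error_logs "$demo_dir"
```

On the cluster, the function would be pointed at the directory that holds the job logs instead of the demo directory.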

Useful Commands

Monitor your jobs:

    bjobs          # List your jobs
    bqueues        # Show available queues
    bhist          # Job history
    bkill <jobid>  # Kill a job

Troubleshooting

Common Issues

  1. "command not found" errors: Give an explicit path to the script (relative or absolute) rather than the bare file name

     ```bash
     # Instead of: script.sh
     # Use: ./scripts/script.sh
     ```

  2. Permission denied: Make scripts executable

     chmod +x scripts/*.sh

  3. Array indexing: Remember that LSB_JOBINDEX starts from 1, but bash array indices start from 0

     sample_index=$((LSB_JOBINDEX - 1))
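To illustrate the offset in the last item, the sketch below holds the sample names in a bash array and converts the 1-based job index into a 0-based array index. The inline sample names, the function name sample_at, and the default index are assumptions made so that the snippet runs outside LSF.

```bash
#!/usr/bin/env bash
# Bash arrays are 0-based: samples[0] is the first element.
samples=(sample1 sample2 sample3 sample4 sample5 sample6)

# Convert a 1-based LSF array index to the matching array element.
sample_at() {
    local job_index="$1"
    local sample_index=$((job_index - 1))
    echo "${samples[$sample_index]}"
}

# LSB_JOBINDEX is set by LSF for each array element; default to 1 otherwise.
current="$(sample_at "${LSB_JOBINDEX:-1}")"
echo "Job index ${LSB_JOBINDEX:-1} -> ${current}"
```

Without the subtraction, job index 1 would pick sample2 and job index 6 would pick an empty element.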

Learning Objectives

By the end of this course, you should be able to:

  • Submit basic and array jobs to LSF
  • Understand job resource requirements and queues
  • Use modules and Singularity for software management
  • Handle file I/O and error logging
  • Work with iRODS for data management
  • Debug common job submission issues

Additional Resources


This tutorial is designed for the FARM cluster environment and may need adaptation for other HPC systems.

Owner

  • Name: Cellular Genetics Informatics
  • Login: cellgeni
  • Kind: organization
  • Location: United Kingdom

Wellcome Sanger Institute

GitHub Events

Total
  • Release event: 1
  • Push event: 1
  • Create event: 2
Last Year
  • Release event: 1
  • Push event: 1
  • Create event: 2