https://github.com/cellgeni/farm-course
Materials for farm-course organised by cellgenIT
Repository
Basic Info
- Host: GitHub
- Owner: cellgeni
- Language: Shell
- Default Branch: master
- Size: 2.1 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
FARM Course - LSF Job Scheduling and HPC Tutorial
This repository contains tutorial materials and example scripts for learning how to use the FARM cluster with the LSF (Load Sharing Facility) job scheduling system.
Overview
This course covers:
- Basic LSF job submission with bsub
- Array job processing
- Working with modules and Singularity containers
- Python script execution in HPC environments
- iRODS data management
Repository Structure
farm-course/
├── README.md              # This file
├── slides.pdf             # Presentation slides
├── data/
│   ├── sample_table.csv   # Sample data for array jobs
│   └── sample.list        # Sample list file
├── scripts/
│   ├── script.sh          # Basic bash script example
│   ├── array_script.bsub  # LSF array job script
│   ├── python_script.bsub # Python execution with modules/Singularity
│   ├── add_values.py      # Simple Python calculator
│   └── command_list.sh    # Collection of useful commands
├── logs/                  # Job log files (stdout/stderr)
│   ├── arrayOutput*.log   # Array job outputs
│   ├── arrayError*.log    # Array job errors
│   ├── output*.log        # Standard job outputs
│   └── error*.log         # Standard job errors
└── results/               # Output files from job executions
    ├── lolkek.txt
    └── sample*.txt        # Generated by array jobs
Getting Started
Prerequisites
- Access to FARM cluster
- Basic knowledge of bash scripting
- Familiarity with Python (optional)
Basic Job Submission
Simple job submission:
```bash
bsub -G farm-course -q normal -n 1 -M "2G" -R "select[mem>2G] rusage[mem=2G]" -o "output%J.log" -e "error%J.log" ./scripts/script.sh
```
Array job submission:
```bash
bsub -J "[1-6]" < scripts/array_script.bsub
```
Python script with modules:
```bash
bsub < scripts/python_script.bsub
```
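The simple submission command repeats the same memory value in `-M` and in the `-R` resource string, and the two can drift apart when edited by hand. The following is a minimal sketch of a helper that builds the command from a single value. The function name `print_bsub_cmd` is hypothetical and not part of the course scripts, and it only prints the command rather than submitting it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print a bsub command in
# which the -M limit and the -R reservation always use the same memory value.
print_bsub_cmd() {
    local mem_gb="$1"   # memory in gigabytes, e.g. 2
    local script="$2"   # path to the script to run
    printf 'bsub -G farm-course -q normal -n 1 -M "%sG" ' "$mem_gb"
    printf -- '-R "select[mem>%sG] rusage[mem=%sG]" ' "$mem_gb" "$mem_gb"
    printf -- '-o "output%%J.log" -e "error%%J.log" %s\n' "$script"
}

# Print the command for inspection before running it on the cluster.
print_bsub_cmd 2 ./scripts/script.sh
```

Printing first makes it easy to check the resource string before anything reaches the scheduler.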
Script Examples
1. Basic Bash Script (script.sh)
Demonstrates:
- File output redirection
- stdout vs stderr output
- Basic job execution
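The three output paths can be sketched as follows. This is an illustrative stand-in, not the contents of `scripts/script.sh`; the function name `demo_streams` and the file name `demo.txt` are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Illustrative stand-in for scripts/script.sh; the real script may differ.
demo_streams() {
    local outdir="$1"
    mkdir -p "$outdir"
    echo "hello from stdout"                       # ends up in the bsub -o file
    echo "hello from stderr" >&2                   # ends up in the bsub -e file
    echo "hello from a file" > "$outdir/demo.txt"  # written explicitly by the script
}

# Local dry run: redirect the two streams the way -o and -e would.
workdir="$(mktemp -d)"
demo_streams "$workdir/results" > "$workdir/out.log" 2> "$workdir/err.log"
cat "$workdir/out.log" "$workdir/err.log" "$workdir/results/demo.txt"
rm -rf "$workdir"
```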
2. Array Job Script (array_script.bsub)
Demonstrates:
- LSF array job directives
- Reading sample data from CSV
- Job indexing with LSB_JOBINDEX
- Dynamic file creation based on job index
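A minimal sketch of what such an array script could look like is shown below. It is not the actual `scripts/array_script.bsub`: the job name `demo`, the log file patterns, and the `process_sample` function are assumptions for illustration. `%J` and `%I` are LSF placeholders for the job ID and the array index.

```shell
#!/usr/bin/env bash
#BSUB -G farm-course
#BSUB -q normal
#BSUB -n 1
#BSUB -M 2G
#BSUB -R "select[mem>2G] rusage[mem=2G]"
#BSUB -J "demo[1-6]"
#BSUB -o logs/arrayOutput%J_%I.log
#BSUB -e logs/arrayError%J_%I.log
# Illustrative sketch only; the real scripts/array_script.bsub may differ.

# Write <outdir>/<sample>.txt for the sample on line <index> of the table.
process_sample() {
    local index="$1" table="$2" outdir="$3"
    local sample
    sample="$(sed -n "${index}p" "$table" 2> /dev/null)"
    if [ -z "$sample" ]; then
        echo "No sample on line $index of $table" >&2
        return 1
    fi
    mkdir -p "$outdir"
    echo "Processed $sample (array index $index)" > "$outdir/$sample.txt"
}

# LSF sets LSB_JOBINDEX for each array element; default to 1 for local runs.
if [ -f data/sample_table.csv ]; then
    process_sample "${LSB_JOBINDEX:-1}" data/sample_table.csv results
fi
```

Outside LSF the `#BSUB` lines are ordinary comments, so the same file can be run directly with bash for a quick check.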
3. Python Calculator (add_values.py)
A simple Python script that:
- Accepts command line arguments
- Performs basic arithmetic
- Includes error handling
4. Module and Singularity Usage (python_script.bsub)
Shows how to:
- Load Python modules
- Execute Python scripts with different environments
- Use Singularity containers for reproducible environments
Key LSF Directives
| Directive | Description |
|-----------|-------------|
| #BSUB -G | Specify user group |
| #BSUB -q | Queue selection |
| #BSUB -n | Number of cores |
| #BSUB -M | Memory limit |
| #BSUB -R | Resource requirements |
| #BSUB -o | Standard output file |
| #BSUB -e | Standard error file |
| #BSUB -J | Job name; an index range such as [1-6] after the name defines a job array |
Working with Data
Sample Data Format
The data/sample_table.csv contains a simple list of sample names:
sample1
sample2
sample3
sample4
sample5
sample6
Array Job Processing
Array jobs automatically process each sample using the LSB_JOBINDEX variable:
- Job index 1 processes sample1
- Job index 2 processes sample2
- And so on...
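Because each index maps to one line of the sample table, the array range has to match the number of lines. The sketch below derives the range from the file instead of hard-coding it. The helper name `array_range` is hypothetical, and it assumes one sample per line with no header and no blank lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive the array range from the number of lines in the
# sample table. Assumes one sample per line, no header and no blank lines.
array_range() {
    local table="$1"
    local n
    n="$(grep -c '' "$table")"   # counts lines, including an unterminated last line
    echo "[1-${n}]"
}

# Build a small table in a temporary file to show the mapping.
table="$(mktemp)"
printf 'sample%s\n' 1 2 3 4 5 6 > "$table"

array_range "$table"
# Index-to-sample mapping that each array element would see.
for index in 1 2 3; do
    echo "index $index -> $(sed -n "${index}p" "$table")"
done
rm -f "$table"
```

On the cluster, the result could be passed straight to the submission, for example `bsub -J "$(array_range data/sample_table.csv)" < scripts/array_script.bsub`.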
Module System
List the available Python modules, then load the ones you need:
```bash
module avail -C python
module load ISG/python/3.12.3
module load cellgen/singularity
```
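Scripts that call `module load` fail on machines where the module system is not installed, such as a laptop used for testing. A minimal sketch of a guard is shown below; the wrapper name `load_module` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: load a module only if the `module` command exists,
# so the same script also runs (with a warning) on a machine without it.
load_module() {
    local name="$1"
    if ! type module > /dev/null 2>&1; then
        echo "module command not found; skipping $name" >&2
        return 1
    fi
    module load "$name"
}

load_module ISG/python/3.12.3 || echo "continuing without the module" >&2
```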
iRODS Commands
Basic iRODS operations covered:
```bash
# List catalogs
ils /Sanger1/training
# Check metadata
imeta ls -d /path/to/file
# Download data
iget -Kv /path/to/remote/file
# Query by metadata
imeta qu -z /seq -d sample = "sample_id"
```
Output Files
Job result files are stored in the results/ directory:
- lolkek.txt - Output from basic job
- sample*.txt - Outputs from array jobs

Log files follow the naming pattern output<JOBID>.log, error<JOBID>.log; the repository keeps them under logs/.
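With many log files, it helps to find the jobs that did not finish cleanly. The sketch below assumes that LSF appends its job report to the `-o` file and that a clean run contains the line `Successfully completed.`; this is the usual behaviour but can vary with site configuration. The helper name `unfinished_logs` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list output logs that lack LSF's "Successfully completed."
# line. Assumes the LSF job report is appended to the -o file.
unfinished_logs() {
    local logdir="$1"
    # grep -L prints the names of files that contain no matching line;
    # -F treats the pattern as a fixed string.
    grep -LF "Successfully completed." "$logdir"/output*.log 2> /dev/null
    return 0
}

# Demonstration with two fabricated log files in a temporary directory.
logdir="$(mktemp -d)"
echo "Successfully completed." > "$logdir/output101.log"
echo "Exited with exit code 1." > "$logdir/output102.log"
unfinished_logs "$logdir"
rm -rf "$logdir"
```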
Useful Commands
Monitor your jobs:
```bash
bjobs          # List your jobs
bqueues        # Show available queues
bhist          # Job history
bkill <jobid>  # Kill a job
```
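Commands such as `bkill` need the job ID, which `bsub` reports in a line of the form `Job <12345> is submitted to queue <normal>.` A minimal sketch for extracting the ID in a script follows; the function name `parse_job_id` is hypothetical, and the message format is assumed to match the standard LSF wording.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read bsub's confirmation message on stdin and print
# only the numeric job ID.
parse_job_id() {
    sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Demonstration with a sample message instead of a real submission.
echo 'Job <12345> is submitted to queue <normal>.' | parse_job_id
```

On the cluster this could be combined with a submission, for example `jobid="$(bsub < scripts/python_script.bsub | parse_job_id)"`, followed by `bjobs "$jobid"`.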
Troubleshooting
Common Issues
"command not found" errors: Give an explicit path to the script, because the current directory is usually not on the PATH
```bash
# Instead of: script.sh
# Use: ./scripts/script.sh
```
Permission denied: Make scripts executable
```bash
chmod +x scripts/*.sh
```
Array indexing: Remember that LSB_JOBINDEX starts from 1, but bash array indices start from 0
```bash
sample_index=$((LSB_JOBINDEX - 1))
```
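The offset can be sketched with a bash array as follows. The array contents mirror the sample names listed earlier; the function name `sample_at` and the range check are additions for illustration.

```shell
#!/usr/bin/env bash
# Sample names held in a bash array (0-based), looked up by an LSF job index
# (1-based). The function name is hypothetical.
samples=(sample1 sample2 sample3 sample4 sample5 sample6)

sample_at() {
    local job_index="$1"
    # Reject indices outside 1..N so a bad index cannot pick a wrong sample.
    if [ "$job_index" -lt 1 ] || [ "$job_index" -gt "${#samples[@]}" ]; then
        echo "Job index $job_index is out of range" >&2
        return 1
    fi
    local sample_index=$((job_index - 1))
    echo "${samples[$sample_index]}"
}

# LSF sets LSB_JOBINDEX inside an array job; default to 1 for a local run.
sample_at "${LSB_JOBINDEX:-1}"
```

The explicit range check matters because recent bash versions treat a negative array index as counting from the end, which would silently return the last sample for index 0.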
Learning Objectives
By the end of this course, you should be able to:
- Submit basic and array jobs to LSF
- Understand job resource requirements and queues
- Use modules and Singularity for software management
- Handle file I/O and error logging
- Work with iRODS for data management
- Debug common job submission issues
Additional Resources
- LSF Documentation: Check cluster-specific documentation
- Singularity User Guide: For containerized applications
- iRODS Documentation: For data management workflows
This tutorial is designed for the FARM cluster environment and may need adaptation for other HPC systems.
Owner
- Name: Cellular Genetics Informatics
- Login: cellgeni
- Kind: organization
- Location: United Kingdom
- Website: https://www.sanger.ac.uk/science/groups/cellular-genetics-informatics
- Repositories: 19
- Profile: https://github.com/cellgeni
Wellcome Sanger Institute
GitHub Events
Total
- Release event: 1
- Push event: 1
- Create event: 2
Last Year
- Release event: 1
- Push event: 1
- Create event: 2