workbench
Simple Slurm tool for running interactive GPU workloads on HPC clusters.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.3%) to scientific vocabulary
Repository
Simple Slurm tool for running interactive GPU workloads on HPC clusters.
Basic Info
- Host: GitHub
- Owner: ceharvs
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 266 KB
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Workbench
Approved for Public Release; Distribution Unlimited.
Public Release Case Number 22-0668.
© 2024 The MITRE Corporation. ALL RIGHTS RESERVED.
The files in this repository can be deployed as an Ansible role that installs the correct files for workbench on HPC clusters running Slurm.
workbench is a simple wrapper script that supports quick and easy GPU job deployments on HPC Clusters.
Why Workbench
Users need to be able to ...
- ... launch a single large simulation :white_check_mark:
- ... launch hundreds of simulations :white_check_mark:
- ... submit jobs and walk away :white_check_mark:

They also need ...
- ... to debug :white_check_mark:
- ... to launch interactive workloads :white_check_mark:
- ... resources now :white_check_mark:
- ... to not learn too many new things :x:
Slurm is a great resource for all points but the last one. Users new to Slurm who haven't worked on traditional HPC clusters aren't looking to learn complicated Slurm syntax just to launch jobs interactively.
Interactive Jobs in Slurm
The srun command can submit a job that runs right away:
```bash
$ srun hostname
gpu1.cl.cluster.local
```
It can also launch in an interactive terminal:
```bash
$ srun --pty $BASH
[user@gpu1 ~]$
```
Additional Slurm options can be added to meet users' needs:

```bash
$ srun -t60 -c5 --mem=50GB --gpus=1 --pty $BASH
[user@gpu1 ~]$
```
Why does this Fail?
The above is what people want... but not really. New users (any users, really) don't want to memorize all that syntax for something they'll run frequently:
- Too many options
- Too long
- Often users don't even know what the command is asking for
- Error prone with so much typing
Why does workbench succeed?
workbench works because users need to not learn too many new things. Workbench is simply a wrapper script for Slurm interactive jobs that saves users from keeping too much in their personal memory or notes.
What is workbench?
```bash
$ srun -t60 -c5 --mem=50GB --gpus=1 --pty $BASH
[user@gpu1 ~]$
```
workbench is two scripts that essentially run the above command, with a few more options. Overall, workbench is about 250 lines of code: a Python script and a shell script that launch jobs.

```bash
[user@login01 ~]$ workbench
Please wait for your allocation to be created.
You have requested:
 - GPUs: 1
 - CPUs: 5
 - Memory: 50GB
 - Time: 4 Hours

Estimated Start Time: 2024-04-25T15:39:04
If the job takes longer than 1-2 minutes to start, check cspan.mitre.org and
squeue for resource availability. If utilization is at capacity, the
interactive option may be unavailable.

(ASCII-art "Workbench" banner)

Welcome to the WORKBENCH!
Your job is running on gpunode5
You have 1 GPU(s) reserved:
GPU 0: Tesla V100-SXM2-32GB (UUID: GPU-e0c399a2-2a6f-e687-b704-b0906db8fb71)

Type 'exit' to leave the workbench.
Present Working Directory: /home/user

[user@gpunode5 ~]$ hostname
gpunode5
```
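Conceptually, the Python half of the wrapper just assembles an `srun` command line from a few friendly options. A minimal sketch of that idea (not the actual workbench source; the option names, defaults, and helper below are assumptions for illustration) might look like:

```python
# Sketch of a workbench-style wrapper: translate a few friendly options
# into an srun command line. Unrecognized arguments pass through to Slurm
# untouched, which is how "any sbatch arguments also work".
import argparse

def build_srun_command(argv):
    parser = argparse.ArgumentParser(prog="workbench")
    parser.add_argument("-g", "--gpus", type=int, default=1)
    parser.add_argument("-c", "--cpus", type=int, default=5)
    parser.add_argument("-m", "--mem", default="50GB")
    parser.add_argument("-t", "--time", type=int, default=4,
                        help="reservation length in hours")
    # parse_known_args keeps unrecognized flags instead of erroring out
    args, extra = parser.parse_known_args(argv)
    return ["srun",
            f"--gpus={args.gpus}",
            f"--cpus-per-task={args.cpus}",
            f"--mem={args.mem}",
            f"--time={args.time * 60}",  # a bare integer for srun -t means minutes
            *extra,                      # any other srun/sbatch flags, verbatim
            "--pty", "bash"]

print(" ".join(build_srun_command(["-g", "1", "--partition=gpu"])))
```

The user types a handful of memorable options; the wrapper fills in sensible defaults and the boilerplate (`--pty bash`) that people otherwise forget.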
Features
`--help` lets people see the most important options:
- number of GPUs
- GPU type
- number of CPUs
- Memory
- Time
- Other Slurm options are fully supported
- Jupyter Notebook with port forwarding instructions
- Launch a singularity container
- Show VS Code port forwarding instructions
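For the Jupyter feature, the "port forwarding instructions" amount to telling the user which ssh tunnel to open from their local machine. A hypothetical helper (names and message wording are assumptions, not taken from the workbench source) could generate them like this:

```python
# Hypothetical helper: print the ssh tunnel command a user needs to reach
# a Jupyter Notebook running on a compute node behind a login node.
def forwarding_instructions(user, login_host, compute_node, port=8888):
    tunnel = f"ssh -L {port}:{compute_node}:{port} {user}@{login_host}"
    return (f"On your local machine, run:\n"
            f"  {tunnel}\n"
            f"then open http://localhost:{port} in your browser.")

print(forwarding_instructions("user", "login01.cluster", "gpunode5"))
```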
```bash
[user@login01 ~]$ workbench --help
usage: workbench [-h] [-g {0,1,2,3,4}] [-k {k80,m40,p100,v100,a100,a40}]
                 [-c CPU_COUNT] [-t WALLCLOCK] [-m MEMORY] [-p PARTITION]
                 [-A ACCOUNT] [--vs_debug] [--jupyter]
                 [--jupyter_args JUPYTER_ARGS] [--container CONTAINER_NAME]
                 [--bind BIND]

Launch an interactive job via Slurm on the HPC cluster. In addition to the
commands listed, any sbatch arguments will also work on the command line.
Workbench will only launch jobs on a single node. For more information on
sbatch commands: https://slurm.schedmd.com/sbatch.html

optional arguments:
  -h, --help            show this help message and exit
  -g {0,1,2,3,4}, --gpus {0,1,2,3,4}
                        Number of GPUs requested.
  -k {k80,m40,p100,v100,a100,a40}, --kind {k80,m40,p100,v100,a100,a40}
                        Type of GPU, use scontrol show nodes to review which
                        gpu types are available.
  -c CPU_COUNT, --cpus CPU_COUNT
                        Number of CPUs.
  -t WALLCLOCK, --time WALLCLOCK
                        Expected reservation duration in hours.
  -m MEMORY, --mem MEMORY
                        Expected memory required for the job.
  -p PARTITION, --partition PARTITION
                        Partition to run the job on.
  -A ACCOUNT, --account ACCOUNT
                        Associated project for accounting.
  --vs_debug            Launch an interactive VS Code Debug Session.
  --jupyter             Launch a Jupyter Notebook instance.
  --jupyter_args JUPYTER_ARGS
                        Additional arguments for Jupyter. Provide in quotes.
                        Do NOT include port or --no-browser.
  --container CONTAINER_NAME
                        Launch a specific singularity container. Provide the
                        .simg file name.
  --bind BIND           Input for singularity bind command.
```
Requirements
Slurm, Python, and Bash must be already configured on the local system.
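A quick way to sanity-check those prerequisites before deploying the role is to look for the required commands on `PATH`. This check is not part of workbench; it is a small illustrative sketch of the requirement:

```python
# Hypothetical pre-flight check: verify that the commands workbench relies
# on (Slurm's srun, Python, Bash) are available on this system.
import shutil

def check_requirements(commands=("srun", "python3", "bash")):
    """Return a dict mapping each command to its path, or None if missing."""
    return {cmd: shutil.which(cmd) for cmd in commands}

missing = [cmd for cmd, path in check_requirements().items() if path is None]
if missing:
    print("Missing required commands:", ", ".join(missing))
else:
    print("All requirements found.")
```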
Role Variables
N/A
Dependencies
N/A
Example Playbook
```yaml
- hosts: my_hosts
  gather_facts: true
  roles:
    - workbench
```
License
GNU General Public License
Author Information
Christine Harvey, ceharvey@mitre.org
Owner
- Name: Christine Harvey
- Login: ceharvs
- Kind: user
- Website: itsharveytime.com
- Repositories: 19
- Profile: https://github.com/ceharvs
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Harvey"
    given-names: "Christine"
    orcid: "https://orcid.org/0000-0002-3941-3895"
title: "Workbench"
version: 1.0
date-released: 2024-04-01
url: "https://github.com/ceharvs/workbench"
```
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1