Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: CoBrALab
  • License: unlicense
  • Language: Python
  • Default Branch: master
  • Size: 209 KB
Statistics
  • Stars: 33
  • Watchers: 3
  • Forks: 13
  • Open Issues: 30
  • Releases: 17
Created over 10 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

qbatch

Execute shell command lines in parallel on Slurm, Son of/Sun Grid Engine (SGE), and PBS/Torque clusters


qbatch is a tool for executing commands in parallel across a compute cluster. It takes as input a list of commands (shell command lines or executable scripts), either in a file or piped to qbatch. The list of commands is divided into arbitrarily sized chunks, which are submitted to the cluster either as individual jobs or as an array job. Each job runs the commands in its chunk in parallel, up to the requested number of cores. Commands can also be run locally on systems with no cluster capability via GNU Parallel.
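
As an illustration of the input format, the hypothetical Python sketch below builds such a command file with one shell command per line; the `process.sh` script and the `data/` directory are placeholder names, and the `-c`/`-j` values match the chunking examples later in this README:

```python
from pathlib import Path

# Hypothetical example: build one shell command line per input file.
# "process.sh" and the "data/" directory are placeholders.
commands = [f"./process.sh {dat}" for dat in sorted(Path("data").glob("*.dat"))]

# qbatch reads one command per line, from a file or from stdin.
Path("commands.txt").write_text("\n".join(commands) + "\n")

# The file could then be submitted so that each job packs 24 commands and
# runs 12 of them in parallel, e.g.:
#   $ qbatch -c24 -j12 commands.txt
```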

qbatch can also be used from within Python via the qbatch.qbatchParser and qbatch.qbatchDriver functions. qbatchParser accepts a list of command-line options identical to the shell interface, parses them, and submits the jobs. qbatchDriver accepts key-value pairs corresponding to the outputs of the argument parser, plus a task_list option providing the list of command strings to run.
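
A minimal sketch of the qbatchParser interface, assuming it accepts the same argument strings as the shell command line; the exact call signature is an assumption inferred from the description above, and a qbatchDriver example appears later in this README:

```python
import qbatch

# Assumed usage (signature inferred from the description above): pass the
# same argument strings you would use on the shell command line.
# --dryrun generates job scripts in .qbatch/ without submitting anything.
qbatch.qbatchParser(["--dryrun", "-c", "24", "-j", "12", "commands.txt"])
```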

Installation

```sh
$ pip install qbatch
```

Dependencies

qbatch requires Python (>2.7) and GNU Parallel. For Torque/PBS and Grid Engine clusters, qbatch requires the qsub and qstat commands. For the Slurm workload manager, qbatch requires the sbatch and squeue commands.

Environment variable defaults

qbatch supports several environment variables to customize defaults for your local system.

```sh
$ export QBATCH_PPJ=12                   # requested processors per job
$ export QBATCH_CHUNKSIZE=$QBATCH_PPJ    # commands to run per job
$ export QBATCH_CORES=$QBATCH_PPJ        # commands to run in parallel per job
$ export QBATCH_NODES=1                  # (PBS and SLURM only) number of compute nodes to request per job, typically for MPI jobs
$ export QBATCH_MEM="0"                  # requested memory per job
$ export QBATCH_MEMVARS="mem"            # memory request variable to set
$ export QBATCH_SYSTEM="pbs"             # queuing system to use ("pbs", "sge", "slurm", or "local")
$ export QBATCH_SGE_PE="smp"             # (SGE-only) parallel environment name
$ export QBATCH_QUEUE="1day"             # Name of submission queue
$ export QBATCH_OPTIONS=""               # Arbitrary cluster options to embed in all jobs
$ export QBATCH_SCRIPT_FOLDER=".qbatch/" # Location to generate jobfiles for submission
$ export QBATCH_SHELL="/bin/sh"          # Shell to use to evaluate jobfile
```

Command line help

```
usage: qbatch [-h] [-w WALLTIME] [-c CHUNKSIZE] [-j CORES] [--ppj PPJ]
              [-N JOBNAME] [--mem MEM] [-q QUEUE] [-n] [-v] [--version]
              [--depend DEPEND] [-d WORKDIR] [--logdir LOGDIR] [-o OPTIONS]
              [--header HEADER] [--footer FOOTER] [--nodes NODES]
              [--sge-pe SGE_PE] [--memvars MEMVARS]
              [--pbs-nodes-spec PBSNODESSPEC] [-i]
              [-b {pbs,sge,slurm,local,container}]
              [--env {copied,batch,none}] [--shell SHELL]
              ...

Submits a list of commands to a queueing system. The list of commands can be
broken up into 'chunks' when submitted, so that the commands in each chunk
run in parallel (using GNU parallel). The job script(s) generated by qbatch
are stored in the folder .qbatch/

positional arguments:
  command_file          An input file containing a list of shell commands to be
                        submitted, - to read the command list from stdin, or --
                        followed by a single command

optional arguments:
  -h, --help            show this help message and exit
  -w WALLTIME, --walltime WALLTIME
                        Maximum walltime for an array job element or individual
                        job (default: None)
  -c CHUNKSIZE, --chunksize CHUNKSIZE
                        Number of commands from the command list that are
                        wrapped into each job (default: 1)
  -j CORES, --cores CORES
                        Number of commands each job runs in parallel. If the
                        chunk size (-c) is smaller than -j then only chunk size
                        commands will run in parallel. This option can also be
                        expressed as a percentage (e.g. 100%) of the total
                        available cores (default: 1)
  --ppj PPJ             Requested number of processors per job (aka ppn on PBS,
                        slots on SGE, cpus per task on SLURM). Cores can be
                        over subscribed if -j is larger than --ppj (useful to
                        make use of hyper-threading on some systems)
                        (default: 1)
  -N JOBNAME, --jobname JOBNAME
                        Set job name (defaults to name of command file, or
                        STDIN) (default: None)
  --mem MEM             Memory required for each job (e.g. --mem 1G). This
                        value will be set on each variable specified in
                        --memvars. To not set any memory requirement, set this
                        to 0 (default: 0)
  -q QUEUE, --queue QUEUE
                        Name of queue to submit jobs to (defaults to no queue)
                        (default: None)
  -n, --dryrun          Dry run; Create jobfiles but do not submit or run any
                        commands (default: False)
  -v, --verbose         Verbose output (default: False)
  --version             show program's version number and exit

advanced options:
  --depend DEPEND       Wait for successful completion of job(s) with name
                        matching given glob pattern or job id matching given
                        job id(s) before starting (default: None)
  -d WORKDIR, --workdir WORKDIR
                        Job working directory (default: current working
                        directory)
  --logdir LOGDIR       Directory to store log files (default: {workdir}/logs)
  -o OPTIONS, --options OPTIONS
                        Custom options passed directly to the queuing system
                        (e.g. --options "-l vf=8G"). This option can be given
                        multiple times (default: [])
  --header HEADER       A line to insert verbatim at the start of the script,
                        and will be run once per job. This option can be given
                        multiple times (default: None)
  --footer FOOTER       A line to insert verbatim at the end of the script, and
                        will be run once per job. This option can be given
                        multiple times (default: None)
  --nodes NODES         (PBS and SLURM only) Nodes to request per job
                        (default: 1)
  --sge-pe SGE_PE       (SGE-only) The parallel environment to use if more than
                        one processor per job is requested (default: smp)
  --memvars MEMVARS     A comma-separated list of variables to set with the
                        memory limit given by the --mem option (e.g.
                        --memvars=h_vmem,vf) (default: mem)
  --pbs-nodes-spec PBSNODESSPEC
                        (PBS-only) String to be inserted into nodes= line of
                        job (default: None)
  -i, --individual      Submit individual jobs instead of an array job
                        (default: False)
  -b {pbs,sge,slurm,local,container}, --system {pbs,sge,slurm,local,container}
                        The type of queueing system to use. 'pbs' and 'sge'
                        both make calls to qsub to submit jobs. 'slurm' calls
                        sbatch. 'local' runs the entire command list (without
                        chunking) locally. 'container' creates a joblist and
                        metadata file, to pass commands out of a container to a
                        monitoring process for submission to a batch system.
                        (default: local)
  --env {copied,batch,none}
                        Determines how your environment is propagated when your
                        job runs. "copied" records your environment settings in
                        the job submission script, "batch" uses the cluster's
                        mechanism for propagating your environment, and "none"
                        does not propagate any environment variables.
                        (default: copied)
  --shell SHELL         Shell to use for spawning jobs and launching single
                        commands (default: /bin/sh)
```

Some examples:

```sh
# Submit an array job from a list of commands (one per line)
# Generates a job script in ./.qbatch/ and job logs appear in ./logs/
# All defaults are inherited from QBATCH_* environment variables
$ qbatch commands.txt

# Submit a single command to the cluster
$ qbatch -- echo hello

# Set the walltime for each job
$ qbatch -w 3:00:00 commands.txt

# Run 24 commands per job
$ qbatch -c24 commands.txt

# Pack 24 commands per job, run 12 in parallel at a time
$ qbatch -c24 -j12 commands.txt

# Start jobs after successful completion of existing jobs with names starting with "stage1_"
$ qbatch --afterok 'stage1_*' commands.txt

# Pipe a list of commands to qbatch
$ parallel echo process.sh {} ::: *.dat | qbatch -

# Run jobs locally with GNU Parallel, 12 commands in parallel
$ qbatch -b local -j12 commands.txt

# Many options don't make sense locally: chunking, individual vs array, nodes,
# ppj, highmem, and afterok are ignored
```

A python script example:

```python
# Submit jobs to a cluster using the QBATCH_* environment defaults
import qbatch

task_list = ['echo hello', 'echo hello2']
qbatch.qbatchDriver(task_list=task_list)
```

Owner

  • Name: Computational Brain Anatomy Laboratory
  • Login: CoBrALab
  • Kind: organization
  • Email: contact@cobralab.ca
  • Location: Montreal, QC

Computational Brain Anatomy Laboratory, located in the CIC at the Douglas Institute, McGill University

Citation (CITATION.cff)

# YAML 1.2
---
cff-version: "1.1.0"
abstract: |
    "Execute shell command lines in parallel on Slurm, S(un|on of) Grid Engine (SGE) and PBS/Torque clusters"
message: "If you use this software, please cite it as below."
authors:
  - 
    family-names: Devenyi
    given-names: "Gabriel Allan"
    orcid: "https://orcid.org/0000-0002-7766-1187"
  -
    family-names: Pipitone
    given-names: Jon
    orcid: "https://orcid.org/0000-0001-6313-5701"
title: qbatch
repository-code: "https://github.com/pipitone/qbatch"
version: "2.2"
date-released: 2020-03-12

GitHub Events

Total
  • Create event: 1
  • Release event: 1
  • Issues event: 1
  • Watch event: 5
  • Issue comment event: 4
  • Push event: 4
  • Pull request event: 1
Last Year
  • Create event: 1
  • Release event: 1
  • Issues event: 1
  • Watch event: 5
  • Issue comment event: 4
  • Push event: 4
  • Pull request event: 1

Dependencies

requirements-testing.txt pypi
  • future * test
  • nose >=1.0 test
  • ushlex * test
requirements.txt pypi
  • future *
setup.py pypi
  • future *