hotsub

hotsub: A batch job engine for cloud services with ETL framework - Published in JOSS (2018)

https://github.com/otiai10/hotsub

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

aws batch-job bioinformatics cwl cwl-workflow docker docker-machine etl-framework gcp wdl wdl-workflow workflow workflow-engine
Last synced: 6 months ago

Repository

Command line tool to run batch jobs concurrently with ETL framework on AWS or other cloud computing resources

Basic Info
  • Host: GitHub
  • Owner: otiai10
  • License: gpl-3.0
  • Language: Go
  • Default Branch: master
  • Homepage: https://hotsub.github.io/
  • Size: 283 KB
Statistics
  • Stars: 30
  • Watchers: 6
  • Forks: 5
  • Open Issues: 11
  • Releases: 19
Created about 8 years ago · Last pushed over 7 years ago
Metadata Files
Readme Contributing License

README.md

hotsub

A simple batch job driver for AWS and GCP. (Azure and OpenStack support is coming soon.)

```sh
hotsub run \
  --script ./star-alignment.sh \
  --tasks ./star-alignment-tasks.csv \
  --image friend1ws/star-alignment \
  --aws-ec2-instance-type t2.2xlarge \
  --verbose
```

It will

  • execute the workflow described in star-alignment.sh
  • for each sample specified in star-alignment-tasks.csv
  • in friend1ws/star-alignment Docker containers
  • on EC2 instances of type t2.2xlarge

and then automatically upload the output files to S3 and clean up the EC2 instances once everything is done.
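
To make the "for each sample" step concrete, here is a minimal Go sketch of how a tasks CSV could be turned into per-task parameters. It is not hotsub's actual implementation. It assumes each header cell has the form `<kind> <NAME>` (for example `--env SAMPLE`), using the column kinds named in the `--tasks` option description. The `Task` type, the `ParseTasks` function, and the bucket paths are hypothetical names chosen for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// Task holds the parameters parsed from one CSV row (hypothetical type).
type Task struct {
	Env     map[string]string // from "--env NAME" columns
	Inputs  map[string]string // from "--input NAME" and "--input-recursive NAME" columns
	Outputs map[string]string // from "--output-recursive NAME" columns
}

// ParseTasks reads CSV text whose header cells are assumed to look like
// "--env SAMPLE" and returns one Task per data row.
func ParseTasks(content string) ([]Task, error) {
	rows, err := csv.NewReader(strings.NewReader(content)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) < 2 {
		return nil, fmt.Errorf("tasks CSV needs a header row and at least one task row")
	}
	header := rows[0]
	tasks := make([]Task, 0, len(rows)-1)
	for _, row := range rows[1:] {
		t := Task{
			Env:     map[string]string{},
			Inputs:  map[string]string{},
			Outputs: map[string]string{},
		}
		for i, cell := range row {
			fields := strings.Fields(header[i])
			if len(fields) != 2 {
				return nil, fmt.Errorf("bad header cell %q", header[i])
			}
			kind, name := fields[0], fields[1]
			switch kind {
			case "--env":
				t.Env[name] = cell
			case "--input", "--input-recursive":
				t.Inputs[name] = cell
			case "--output-recursive":
				t.Outputs[name] = cell
			default:
				return nil, fmt.Errorf("unknown column kind %q", kind)
			}
		}
		tasks = append(tasks, t)
	}
	return tasks, nil
}

func main() {
	// Hypothetical sample sheet; the bucket and file names are placeholders.
	const example = `--env SAMPLE,--input FASTQ,--output-recursive OUTDIR
s1,s3://example-bucket/s1.fastq,s3://example-bucket/out/s1
s2,s3://example-bucket/s2.fastq,s3://example-bucket/out/s2
`
	tasks, err := ParseTasks(example)
	if err != nil {
		panic(err)
	}
	for _, t := range tasks {
		fmt.Println(t.Env["SAMPLE"], t.Inputs["FASTQ"], t.Outputs["OUTDIR"])
	}
}
```

Each parsed row corresponds to one job: the script would run once per `Task`, with the environment variables set and the input and output locations mapped into the container.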

See Documentation for more details.

Why use hotsub

There are three reasons hotsub was built and why you might use it:

  1. No need to set up your cloud on web consoles:
    • Since hotsub uses plain EC2 or GCE instances, you don't have to configure AWS Batch or Dataflow on messy web consoles.
  2. Multiple platforms behind the same command-line interface:
    • You can switch between AWS and GCP with just the --provider option of the run command (you need credentials for the chosen provider on your local machine).
  3. ExTL framework available:
    • In some bioinformatics workloads, the problem is how to handle a huge reference genome shared by all tasks. hotsub proposes and implements the ExTL framework for this.

Installation

Check Getting Started on GitHub Pages

Commands

```sh
NAME:
   hotsub - command line to run batch computing on AWS and GCP with the same interface

USAGE:
   hotsub [global options] command [command options] [arguments...]

VERSION:
   0.10.0

DESCRIPTION:
   Open-source command-line tool to run batch computing tasks and workflows on backend services such as Amazon Web Services.

COMMANDS:
     run       Run your jobs on cloud with specified input files and any parameters
     init      Initialize CLI environment on which hotsub runs
     template  Create a template project of hotsub
     help, h   Shows a list of commands or help for one command

GLOBAL OPTIONS:
   --help, -h     show help
   --version, -V  print the version
```

Available options for run command

```sh
% hotsub run -h
NAME:
   hotsub run - Run your jobs on cloud with specified input files and any parameters

USAGE:
   hotsub run [command options] [arguments...]

DESCRIPTION:
   Run your jobs on cloud with specified input files and any parameters

OPTIONS:
   --verbose, -v                     Print verbose log for operation.
   --log-dir value                   Path to log directory where stdout/stderr log files will be placed (default: "${cwd}/logs/${time}")
   --concurrency value, -C value     Throttle concurrency number for running jobs (default: 8)
   --provider value, -p value        Job service provider, either of aws, gcp, vbox, hyperv
   --tasks value                     Path to CSV of task parameters, expected to specify --env, --input, --input-recursive and --output-recursive. (required)
   --image value                     Image name from Docker Hub or other Docker image service. (default: "ubuntu:14.04")
   --script value                    Local path to a script to run inside the workflow Docker container. (required)
   --shared value, -S value          Shared data URL on cloud storage bucket. (e.g. s3://~)
   --keep                            Keep instances created for computing event after everything gets done
   --env value, -E value             Environment variables to pass to all the workflow containers
   --disk-size value                 Size of data disk to attach for each job in GB. (default: 64)
   --shareddata-disksize value       Disk size of shared data instance (in GB) (default: 64)
   --aws-region value                AWS region name in which AmazonEC2 instances would be launched (default: "ap-northeast-1")
   --aws-ec2-instance-type value     AWS EC2 instance type. If specified, all --min-cores and --min-ram would be ignored. (default: "t2.micro")
   --aws-shared-instance-type value  Shared Instance Type on AWS (default: "m4.4xlarge")
   --aws-vpc-id value                VPC ID on which computing VMs are launched
   --aws-subnet-id value             Subnet ID in which computing VMs are launched
   --google-project value            Project ID for GCP
   --google-zone value               GCP service zone name (default: "asia-northeast1-a")
   --cwl value                       CWL file to run your workflow
   --cwl-job value                   Parameter files for CWL
   --wdl value                       WDL file to run your workflow
   --wdl-job value                   Parameter files for WDL
   --include value                   Local files to be included onto workflow container
```
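
The `--concurrency` option caps how many jobs run at once (default: 8). The Go sketch below shows one common way to implement such a throttle, using a buffered channel as a counting semaphore. It is an illustration under that assumption, not hotsub's own code, and `RunAll` and `countErrors` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// RunAll calls job(i) for every i in [0, n), keeping at most limit calls
// in flight at any moment. It returns one error slot per task.
func RunAll(n, limit int, job func(i int) error) []error {
	if limit < 1 {
		limit = 1
	}
	errs := make([]error, n)
	sem := make(chan struct{}, limit) // counting semaphore
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		sem <- struct{}{} // blocks while limit jobs are already running
		go func(i int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when the job ends
			errs[i] = job(i)
		}(i)
	}
	wg.Wait()
	return errs
}

// countErrors reports how many tasks failed.
func countErrors(errs []error) int {
	failed := 0
	for _, err := range errs {
		if err != nil {
			failed++
		}
	}
	return failed
}

func main() {
	// 20 placeholder tasks with the documented default limit of 8.
	errs := RunAll(20, 8, func(i int) error {
		fmt.Printf("task %d done\n", i)
		return nil
	})
	fmt.Println("failures:", countErrors(errs))
}
```

Acquiring the slot before starting the goroutine keeps the number of live goroutines bounded as well, which matters when each job represents a cloud instance rather than a cheap function call.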

Contact

To keep discussion transparent, please ask any questions through GitHub Issues:

https://github.com/otiai10/hotsub/issues

Owner

  • Name: Hiromu OCHIAI
  • Login: otiai10
  • Kind: user
  • Location: Tokyo, Japan
  • Company: @ayanel, @triax

🙋 ❤️ 🍣

JOSS Publication

hotsub: A batch job engine for cloud services with ETL framework
Published
November 13, 2018
Volume 3, Issue 31, Page 1069
Authors
Hiromu Ochiai
National Cancer Center Research Institute, Tokyo, Japan
Kenichi Chiba
National Cancer Center Research Institute, Tokyo, Japan
Ai Okada
National Cancer Center Research Institute, Tokyo, Japan
Yuichi Shiraishi
National Cancer Center Research Institute, Tokyo, Japan
Editor
Roman Valls Guimera
Tags
cloud Docker AWS GCP ETL

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 270
  • Total Committers: 3
  • Avg Commits per committer: 90.0
  • Development Distribution Score (DDS): 0.007
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers

| Name | Email | Commits |
|------|-------|---------|
| Hiromu OCHIAI | o****0@g****m | 268 |
| Yuichi Shiraishi | f****s@g****m | 1 |
| Ai Okada | a****a@n****p | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 75
  • Total pull requests: 25
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 1.27
  • Average comments per pull request: 0.32
  • Merged pull requests: 25
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • otiai10 (60)
  • friend1ws (7)
  • ken0-1n (6)
  • aokad (1)
  • kozo2 (1)
Pull Request Authors
  • otiai10 (23)
  • aokad (1)
  • friend1ws (1)
Top Labels
Issue Labels
enhancement (47) feature request! (12) bug (9) discussion needed (6) suggestion (5) wontfix (5) question (3)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 21
proxy.golang.org: github.com/otiai10/hotsub
  • Versions: 21
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 7.0%
Average: 8.2%
Dependent repos count: 9.3%
Last synced: 6 months ago