https://github.com/bayer-group/bayerclaw

BayerCLAW workflow orchestration system for AWS

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.8%) to scientific vocabulary

Keywords

aws bayer-not-classified bayer-reg-none beat-not-applicable pipeline workflow
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Bayer-Group
  • License: bsd-3-clause
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.59 MB
Statistics
  • Stars: 22
  • Watchers: 3
  • Forks: 9
  • Open Issues: 1
  • Releases: 21
Created almost 5 years ago · Last pushed 6 months ago
Metadata Files
  • Readme
  • Changelog
  • Contributing
  • License

README.md

Bayer CLoud Automated Workflows (BayerCLAW)

BayerCLAW is a workflow orchestration system targeted at bioinformatics pipelines. A workflow consists of a sequence of computational steps, each of which is captured in a Docker container. Some steps may parallelize work across many executions of the same container (scatter/gather pattern).

A workflow is described in a YAML file. The BayerCLAW compiler uses AWS CloudFormation to transform the workflow description into the AWS resources the workflow needs. These include an AWS Step Functions state machine that represents the sequence of steps in the workflow.

A workflow typically takes several parameters, such as sample IDs or paths to input files. Once the workflow definition has been deployed, the workflow can be executed by copying a JSON file with the execution parameters to a "launcher" S3 bucket, which BayerCLAW creates. The workflow state machine uses AWS Batch to run the Docker containers in the proper order.

Documentation

The doc/ directory of this repo contains the documentation pages.

Key components of BayerCLAW

The workflow definition

The BayerCLAW workflow template is a JSON- or YAML-formatted file describing the processing steps of the pipeline. Here is an example of a very simple, one-step workflow:

```YAML
Transform: BC2_Compiler

Repository: s3://example-bucket/hello-world/${job.SAMPLE_ID}

Steps:
  - hello:
      image: docker.io/library/ubuntu
      commands:
        - echo "Hello world! This is job ${job.SAMPLE_ID}!"
```

The repository

The repository is a path within an S3 bucket where a given workflow stores its output files, such as s3://generic-workflow-bucket/my-workflow-repo/. The repo is typically parameterized with some job-specific unique ID, so that each execution of the workflow is kept separate. For example, s3://generic-workflow-bucket/my-workflow-repo/job12345/
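
To make the parameterization concrete, here is a minimal Python sketch of how a repository template such as the one in the example workflow could resolve to a separate path for each job. The helper `resolve_repository` and its regular expression are hypothetical illustrations of the `${job.KEY}` placeholder style; they are not BayerCLAW's own substitution code, which may behave differently.

```python
import re

# Hypothetical pattern for the ${job.KEY} placeholders seen in the example
# workflow. This is an illustration, not BayerCLAW's implementation.
_JOB_FIELD = re.compile(r"\$\{job\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_repository(template: str, job_data: dict) -> str:
    """Replace each ${job.KEY} placeholder with the matching job data value."""

    def lookup(match: re.Match) -> str:
        key = match.group(1)
        if key not in job_data:
            raise KeyError(f"job data has no field {key!r}")
        return job_data[key]

    return _JOB_FIELD.sub(lookup, template)


if __name__ == "__main__":
    template = "s3://example-bucket/hello-world/${job.SAMPLE_ID}"
    # Two jobs with different sample IDs resolve to two separate repositories.
    print(resolve_repository(template, {"SAMPLE_ID": "ABC123"}))
    print(resolve_repository(template, {"SAMPLE_ID": "XYZ789"}))
```

Because the job-specific ID becomes part of the path, outputs from different executions do not overwrite each other.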

Job data file

The job data file contains data needed for a single pipeline execution. This data must be encoded as a flat JSON object with string keys and string values. Even integer or float values should be quoted as strings.
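
The flat, strings-only requirement is easy to violate when job parameters come from other Python code. The sketch below shows one way to produce a conforming file; the function `to_job_data` and the `THREADS` and `CUTOFF` fields are hypothetical and exist only for illustration.

```python
import json


def to_job_data(params: dict) -> str:
    """Serialize params as a flat JSON object with string keys and string values."""
    flat = {}
    for key, value in params.items():
        if not isinstance(key, str):
            raise TypeError(f"key {key!r} is not a string")
        # Accept only scalars that have an obvious string form; reject nested
        # structures, None, and booleans rather than guess at an encoding.
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise TypeError(f"value for {key!r} must be a string or a number")
        flat[key] = str(value)
    return json.dumps(flat, indent=2)


if __name__ == "__main__":
    # THREADS and CUTOFF are made-up fields showing numbers quoted as strings.
    print(to_job_data({"SAMPLE_ID": "ABC123", "THREADS": 8, "CUTOFF": 0.05}))
```

Numbers are converted with `str()`, so a step that needs a numeric value has to parse it back from the string.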

Copying the job data file into the launcher bucket will trigger an execution of the pipeline. Overwriting the job data file, even with the same contents, will trigger another execution.
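
As a rough sketch of the launch step, the function below writes a job data object to S3 through any client that exposes boto3's `put_object(Bucket=..., Key=..., Body=...)` call. The bucket name and object key are hypothetical placeholders: the launcher bucket is the one BayerCLAW created for your installation, and the key layout it expects is not described in this section. `RecordingClient` is a stand-in so the sketch runs without AWS credentials.

```python
import json


def submit_job(s3_client, launcher_bucket: str, key: str, job_data: dict) -> None:
    """Write job_data as a JSON file to the launcher bucket."""
    body = json.dumps(job_data).encode("utf-8")
    s3_client.put_object(Bucket=launcher_bucket, Key=key, Body=body)


class RecordingClient:
    """Stand-in for an S3 client; records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)


if __name__ == "__main__":
    client = RecordingClient()
    submit_job(
        client,
        launcher_bucket="my-launcher-bucket",  # hypothetical bucket name
        key="hello-world/ABC123.json",         # hypothetical key layout
        job_data={"SAMPLE_ID": "ABC123"},
    )
    print(client.calls[0]["Bucket"], client.calls[0]["Key"])
```

In real use you would pass `boto3.client("s3")` in place of `RecordingClient()`. Since overwriting the file triggers another execution, calling `submit_job` twice with the same key would be expected to start two runs.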

Sample job data file

```json5
{
  "SAMPLE_ID": "ABC123",
  "READS1": "s3://workflow-bucket/inputs/reads1.fq",
  "READS2": "s3://workflow-bucket/inputs/reads2.fq"
}
```

Owner

  • Name: Bayer Open Source
  • Login: Bayer-Group
  • Kind: organization

Science for a better life

GitHub Events

Total
  • Release event: 1
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 51
  • Pull request event: 5
  • Fork event: 1
  • Create event: 2
Last Year
  • Release event: 1
  • Watch event: 1
  • Delete event: 1
  • Issue comment event: 1
  • Push event: 51
  • Pull request event: 5
  • Fork event: 1
  • Create event: 2

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 698
  • Total Committers: 3
  • Avg Commits per committer: 232.667
  • Development Distribution Score (DDS): 0.368
Past Year
  • Commits: 111
  • Committers: 1
  • Avg Commits per committer: 111.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| jetaba | j****a@m****m | 441 |
| jetaba | j****a@b****m | 253 |
| Clifford Wollam | c****m@b****m | 4 |

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 3
  • Total pull requests: 18
  • Average time to close issues: 3 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 0.67
  • Average comments per pull request: 0.06
  • Merged pull requests: 17
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ByrCWollam (3)
Pull Request Authors
  • jack-e-tabaska (20)
  • ByrCWollam (3)
  • ivanmilevtues (1)

Dependencies

bclaw_runner/requirements.txt pypi
  • backoff *
  • boto3 *
  • docker *
  • docopt *
  • jmespath *
  • more_itertools *
  • pytest *
  • requests *
lambda/src/chooser/requirements.txt pypi
  • dotted *
lambda/src/compiler/requirements.txt pypi
  • humanfriendly *
  • pyyaml *
  • voluptuous *
lambda/src/notifications/requirements.txt pypi
  • pyyaml *
lambda/src/scatter/requirements.txt pypi
  • jsonpath *
  • pyyaml *
lambda/tests/requirements.txt pypi
  • boto3 ==1.21.18
  • dotted *
  • humanfriendly *
  • jmespath *
  • jsonpath *
  • moto ==3.1.3
  • pytest *
  • pyyaml *
  • voluptuous *
bclaw_runner/Dockerfile docker
  • base latest build
  • public.ecr.aws/docker/library/python 3.9.9-slim-bullseye build
lambda/src/job_launcher/Dockerfile docker
  • base latest build
  • public.ecr.aws/lambda/python 3.9 build
lambda/src/gather/requirements.txt pypi
lambda/src/initializer/requirements.txt pypi
  • jmespath *
lambda/src/job_launcher/requirements.txt pypi
  • boto3 ==1.21.18
lambda/src/qc_checker/requirements.txt pypi
lambda/src/subpipes/requirements.txt pypi