https://github.com/pegasus-isi/pegasus

Pegasus Workflow Management System - Automate, recover, and debug scientific computations.

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    36 of 50 committers (72.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

bioinformatics distributed-systems hpc workflow workflow-management-system

Keywords from Contributors

batch-job distributed-computing scheduling-simulator simulation-framework simulation-modeling workflow-simulator
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: pegasus-isi
  • License: apache-2.0
  • Language: Java
  • Default Branch: master
  • Homepage: https://pegasus.isi.edu
  • Size: 319 MB
Statistics
  • Stars: 200
  • Watchers: 5
  • Forks: 79
  • Open Issues: 44
  • Releases: 1
Topics
bioinformatics distributed-systems hpc workflow workflow-management-system
Created about 13 years ago · Last pushed 6 months ago

README.md

Pegasus Workflow Management System

Pegasus WMS is a configurable system for mapping and executing scientific workflows over a wide range of computational infrastructures including laptops, campus clusters, supercomputers, grids, and commercial and academic clouds. Pegasus has been used to run workflows with up to 1 million tasks that process tens of terabytes of data at a time.

Pegasus WMS bridges the scientific domain and the execution environment by automatically mapping high-level workflow descriptions onto distributed resources. It automatically locates the input data and computational resources a workflow needs, and plans all of the data transfer and job submission operations required to execute it. Pegasus enables scientists to construct workflows in abstract terms without worrying about the details of the underlying execution environment or the particulars of the low-level specifications required by the middleware (Condor, Globus, Amazon EC2, etc.). In the process, Pegasus can plan and optimize the workflow to enable efficient, high-performance execution of large workflows on complex, distributed infrastructures.

Pegasus has a number of features that contribute to its usability and effectiveness:

  • Portability / Reuse – User-created workflows can easily be run in different environments without alteration. Pegasus currently runs workflows on top of Condor pools, Grid infrastructures such as Open Science Grid and XSEDE, Amazon EC2, Google Cloud, and HPC clusters. The same workflow can run on a single system or across a heterogeneous set of resources.
  • Performance – The Pegasus mapper can reorder, group, and prioritize tasks in order to increase overall workflow performance.
  • Scalability – Pegasus can easily scale both the size of the workflow and the resources that the workflow is distributed over. Pegasus runs workflows ranging from just a few computational tasks up to 1 million tasks. The number of resources involved in executing a workflow can scale as needed without any impediments to performance.
  • Provenance – By default, all jobs in Pegasus are launched using the Kickstart wrapper that captures runtime provenance of the job and helps in debugging. Provenance data is collected in a database, and the data can be queried with tools such as pegasus-statistics, pegasus-plots, or directly using SQL.
  • Data Management – Pegasus handles replica selection, data transfers, and output registration in data catalogs. These tasks are added to a workflow as auxiliary jobs by the Pegasus planner.
  • Reliability – Jobs and data transfers are automatically retried in case of failures. Debugging tools such as pegasus-analyzer help the user to debug the workflow in case of non-recoverable failures.
  • Error Recovery – When errors occur, Pegasus tries to recover when possible by retrying tasks, by retrying the entire workflow, by providing workflow-level checkpointing, by re-mapping portions of the workflow, by trying alternative data sources for staging data, and, when all else fails, by providing a rescue workflow containing a description of only the work that remains to be done. It cleans up storage as the workflow is executed so that data-intensive workflows have enough space to execute on storage-constrained resources. Pegasus keeps track of what has been done (provenance) including the locations of data used and produced, and which software was used with which parameters.
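
The retry and rescue behaviour described in the Reliability and Error Recovery items can be illustrated with a small, self-contained Python sketch. It is a toy model and not Pegasus code: the names `run_with_retries` and `execute`, and the dictionary-based DAG representation, are hypothetical and chosen only for illustration. Each task is retried a fixed number of times, and whatever could not be completed is returned as a "rescue" DAG describing only the remaining work.

```python
def run_with_retries(action, max_attempts=3):
    """Call action() up to max_attempts times; return True on first success."""
    for _ in range(max_attempts):
        try:
            action()
            return True
        except Exception:
            continue  # treat every exception as a retryable failure (toy assumption)
    return False


def execute(dag, actions, max_attempts=3):
    """Run a workflow and return (done, rescue).

    dag     -- maps every task name to the set of tasks it depends on
    actions -- maps every task name to a zero-argument callable
    rescue  -- same shape as dag, restricted to the tasks that did not finish,
               with already-satisfied dependencies removed
    """
    done, failed = set(), set()
    progress = True
    while progress:
        progress = False
        for task, parents in dag.items():
            if task in done or task in failed:
                continue
            # A task is ready only when all of its parents have succeeded.
            if all(parent in done for parent in parents):
                if run_with_retries(actions[task], max_attempts):
                    done.add(task)
                else:
                    failed.add(task)
                progress = True
    rescue = {
        task: {parent for parent in parents if parent not in done}
        for task, parents in dag.items()
        if task not in done
    }
    return done, rescue


if __name__ == "__main__":
    calls = {"analyze": 0}

    def flaky():
        # Fails twice, then succeeds: recovered by retrying.
        calls["analyze"] += 1
        if calls["analyze"] < 3:
            raise RuntimeError("transient failure")

    def broken():
        raise RuntimeError("permanent failure")

    dag = {
        "stage_in": set(),
        "analyze": {"stage_in"},
        "plot": {"analyze"},
        "stage_out": {"plot"},
    }
    actions = {
        "stage_in": lambda: None,
        "analyze": flaky,
        "plot": broken,
        "stage_out": lambda: None,
    }
    done, rescue = execute(dag, actions)
    print(sorted(done))
    print(rescue)
```

In this sketch, `analyze` is recovered by a retry, `plot` fails permanently, and `stage_out` is never attempted, so the rescue DAG contains only `plot` and `stage_out`. A real workflow system would also persist this state between runs, which the sketch leaves out.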

Getting Started

You can find more information about Pegasus on the Pegasus Website.

Pegasus has an extensive User Guide that documents how to create, plan, and monitor workflows.

We recommend you start by completing the Pegasus Tutorial from Chapter 3 of the Pegasus User Guide.

The easiest way to install Pegasus is to use one of the binary packages available on the Pegasus downloads page. Consult Chapter 2 of the Pegasus User Guide for more information about installing Pegasus from binary packages.

Release notes are also incorporated in the user guide and can be accessed from the Table of Contents below the Reference Guide. Their sources can be found in the ./doc/sphinx/release-notes directory.

There is documentation on the Pegasus website for the Python, Java and R Abstract Workflow Generator APIs. We strongly recommend using the Python API, which is feature complete and also allows you to invoke all the Pegasus command-line tools.

You can use the pegasus-init command-line tool to run several examples on your local machine. Consult Chapter 4 of the Pegasus User Guide for more information.

There are also examples of how to Configure Pegasus for Different Execution Environments in the Pegasus User Guide.

If you need help using Pegasus, please contact us. See the contact page on the Pegasus website for more information.

Building from Source

Pegasus can be compiled on any recent Linux or Mac OS X system.

Source Dependencies

In order to build Pegasus from source, make sure you have the following installed:

  • Git
  • Java 8 or higher
  • Python 3.6 or higher
  • Ant
  • gcc
  • g++
  • make
  • tox 3.14.5 or higher
  • mysql (optional, required to access MySQL databases)
  • postgresql (optional, required to access PostgreSQL databases)
  • Python pyyaml
  • Python GitPython

Other packages may be required to run unit tests and build MPI tools.
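
Before starting a build, it can help to confirm that the tools listed above are on the PATH. The following is a minimal sketch using only the Python standard library. The helper names (`missing_tools`, `missing_modules`, `python_ok`) are hypothetical, and the executable names are assumptions based on the list above. It only checks for presence, so the minimum versions of Java and tox are not verified.

```python
import importlib.util
import shutil
import sys

# Executable names assumed from the dependency list above.
REQUIRED_TOOLS = ["git", "java", "ant", "gcc", "g++", "make", "tox"]
# PyYAML is imported as "yaml" and GitPython as "git".
REQUIRED_MODULES = ["yaml", "git"]


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the Python modules that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]


def python_ok(minimum=(3, 6), version=None):
    """Check that the interpreter is at least the given (major, minor) version."""
    version = version or sys.version_info
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    print("Python 3.6+:", python_ok())
    print("Missing tools:", missing_tools())
    print("Missing Python modules:", missing_modules())
```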

Compiling

Ant is used to compile Pegasus.

To get a list of build targets run:

$ ant -p

The targets that begin with "dist" are what you want to use.

To build a basic binary tarball (excluding documentation), run:

$ ant dist

To build the release tarball (including documentation), run:

$ ant dist-release

The resulting packages will be created in the dist subdirectory.

Owner

  • Name: Pegasus Project
  • Login: pegasus-isi
  • Kind: organization
  • Email: pegasus-support@isi.edu
  • Location: Marina del Rey, CA

Workflow Management System -- Automate, recover, and debug scientific computations.

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 13,122
  • Total Committers: 50
  • Avg Commits per committer: 262.44
  • Development Distribution Score (DDS): 0.584
Past Year
  • Commits: 308
  • Committers: 7
  • Avg Commits per committer: 44.0
  • Development Distribution Score (DDS): 0.659
Top Committers
Name | Email | Commits
Karan Vahi | v****i@i****u | 5,462
Rajiv Mayani | m****i@i****u | 1,587
Gideon Juve | g****n@i****u | 1,290
Mats Rynge | m****s@r****t | 978
Mats Rynge | r****e@i****u | 687
Gaurang Mehta | g****a@i****u | 529
Ryan Tanaka | t****a@i****u | 414
Ryan Tanaka | r****t@h****u | 413
Prasanth Thomas | p****h@i****u | 381
Jens Vöckler | v****r@i****u | 316
Rafael Ferreira da Silva | r****a@i****u | 308
Fabio Silva | f****o@i****u | 205
Bill Mullins | b****s@l****m | 147
George Papadimitriou | g****p@i****u | 80
zaiyan-alam | 1****m | 74
Rafael Ferreira da Silva | r****a | 38
Dariusz Krol | d****l@a****l | 24
Duncan Brown | d****n@s****u | 20
mukundmurrali9@gmail.com | m****9@g****m | 17
dcbriggs | d****s@g****m | 16
Rafael Ferreira da Silva | s****a@r****u | 16
Loïc Pottier | l****r@i****u | 15
zaiyan-alam | m****m@i****u | 12
Atul Kumar | a****r@i****u | 11
Zaiyan Alam | a****m@u****u | 11
Dariusz Krol | d****k@i****u | 7
Duncan Brown | d****n@l****g | 6
Duncan Macleod | d****d@l****g | 6
Karan Vahi | v****i@b****u | 6
Weiwei Chen | w****n@i****u | 6
and 20 more...
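
The all-time figures above can be reproduced from the listed counts. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is not stated on this page, but it matches the reported all-time value when applied to the largest entry in the table (5,462 of 13,122 commits). The function names are hypothetical.

```python
def average_commits(total_commits, committers):
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (top committer's commits / total commits)."""
    return 1 - top_committer_commits / total_commits


# All time: 13,122 commits by 50 committers, top entry 5,462 commits.
print(round(average_commits(13122, 50), 2))                   # 262.44
print(round(development_distribution_score(5462, 13122), 3))  # 0.584

# Past year: 308 commits by 7 committers.
print(round(average_commits(308, 7), 1))                      # 44.0
```

The past-year score of 0.659 cannot be checked the same way, because the per-committer counts for the past year are not listed.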

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1,897
  • Total pull requests: 216
  • Average time to close issues: about 13 hours
  • Average time to close pull requests: 12 days
  • Total issue authors: 8
  • Total pull request authors: 12
  • Average comments per issue: 1.73
  • Average comments per pull request: 0.94
  • Merged pull requests: 81
  • Bot issues: 0
  • Bot pull requests: 12
Past Year
  • Issues: 1,897
  • Pull requests: 98
  • Average time to close issues: about 13 hours
  • Average time to close pull requests: about 11 hours
  • Issue authors: 8
  • Pull request authors: 1
  • Average comments per issue: 1.73
  • Average comments per pull request: 1.42
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mayani (1,873)
  • vahi (16)
  • rynge (3)
  • pablogoitia (1)
  • ryzogg (1)
  • spxiwh (1)
  • titodalcanton (1)
  • rstory (1)
Pull Request Authors
  • mayani (98)
  • zaiyan-alam (77)
  • dependabot[bot] (12)
  • duncan-brown (10)
  • duncanmmacleod (7)
  • edquist (3)
  • ehabsoa (2)
  • hariharan-devarajan (2)
  • spxiwh (2)
  • steffen-AEI (1)
  • ahnitz (1)
  • dkrol (1)
Top Labels
Issue Labels
sync-from-jira (1,588); major (1,371); fix-master (1,121); affects-master (1,092); Pegasus Planner (377); fix-5.0.0 (330); minor (155); fix-3.0 (140); Documentation (134); Monitord (134); fix-4.6.0 (123); fix-3.1 (122); affects-3.1 (117); affects-5.0.0 (113); wontfix (110); fix-4.5.0 (98); Workflow API Libraries (87); fix-4.0 (87); fix-4.9.0 (80); fix-4.7.0 (69); affects-4.0 (67); fix-4.8.0 (63); statistics visualization and debugging tools (62); Planner: Transfer Module (54); fix-5.1.0 (50); Pegasus Dashboard (49); fix-4.1 (48); Catalog: Replica Catalog (42); affects-3.0 (42); fix-4.3 (42)
Pull Request Labels
sync-from-jira (82); major (65); fix-master (61); affects-master (58); Pegasus Planner (18); fix-5.0.0 (15); dependencies (12); minor (12); fix-3.0 (12); fix-4.5.0 (9); Monitord (8); affects-3.0 (6); affects-5.0.0 (6); affects-3.1 (6); fix-4.1 (5); fix-3.1 (5); wontfix (4); fix-2.4.3 (4); fix-4.9.0 (4); fix-4.8.2 (4); fix-4.6.0 (4); affects-2.0 (4); fix-4.0 (3); critical (3); Pegasus Dashboard (3); Documentation (3); Workflow API Libraries (3); fix-4.6.2 (3); fix-4.7.0 (3); affects-4.8.1 (3)

Packages

  • Total packages: 11
  • Total downloads:
    • pypi 16,962 last-month
  • Total docker downloads: 403
  • Total dependent packages: 11
    (may contain duplicates)
  • Total dependent repositories: 37
    (may contain duplicates)
  • Total versions: 80
  • Total maintainers: 3
pypi.org: pegasus-wms.api

Pegasus Workflow Management System Python API

  • Versions: 11
  • Dependent Packages: 1
  • Dependent Repositories: 14
  • Downloads: 7,992 Last month
  • Docker Downloads: 122
Rankings
Downloads: 2.6%
Dependent packages count: 3.3%
Dependent repos count: 3.9%
Average: 4.1%
Forks count: 5.2%
Stargazers count: 5.6%
Maintainers (2)
Last synced: 7 months ago
pypi.org: pegasus-wms.common

Pegasus Workflow Management System Python Commons

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 6
  • Downloads: 6,885 Last month
  • Docker Downloads: 122
Rankings
Downloads: 2.6%
Forks count: 5.2%
Average: 5.4%
Stargazers count: 5.6%
Dependent repos count: 6.2%
Dependent packages count: 7.4%
Maintainers (2)
Last synced: 6 months ago
pypi.org: pegasus-wms

Pegasus Workflow Management System Python Codebase

  • Versions: 11
  • Dependent Packages: 1
  • Dependent Repositories: 11
  • Downloads: 1,843 Last month
  • Docker Downloads: 53
Rankings
Dependent packages count: 3.3%
Dependent repos count: 4.5%
Forks count: 5.2%
Stargazers count: 5.6%
Average: 9.0%
Downloads: 26.6%
Maintainers (3)
Last synced: 6 months ago
pypi.org: pegasus-wms.dax

Pegasus Workflow Management System Python API

  • Versions: 10
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 43 Last month
  • Docker Downloads: 53
Rankings
Forks count: 5.2%
Downloads: 5.5%
Stargazers count: 5.5%
Dependent packages count: 6.6%
Average: 10.7%
Dependent repos count: 30.6%
Maintainers (2)
Last synced: 6 months ago
pypi.org: pegasus-wms.worker

Pegasus Workflow Management System Worker Package Tools

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 199 Last month
  • Docker Downloads: 53
Rankings
Forks count: 5.2%
Stargazers count: 5.6%
Dependent packages count: 7.4%
Dependent repos count: 11.9%
Average: 13.3%
Downloads: 36.5%
Maintainers (2)
Last synced: 6 months ago
conda-forge.org: python-pegasus-wms

This package contains the Python APIs for Pegasus WMS, including:
  • The DAX API (Versions 2 and 3)
  • The PDAX API (Version 2)
  • The monitoring API
  • The Stampede database API
  • The Pegasus statistics API
  • The Pegasus plots API
  • Misc. Pegasus utilities

  • Versions: 6
  • Dependent Packages: 2
  • Dependent Repositories: 4
Rankings
Dependent repos count: 16.2%
Dependent packages count: 19.6%
Average: 21.8%
Forks count: 22.2%
Stargazers count: 29.2%
Last synced: 6 months ago
conda-forge.org: pegasus-wms.common
  • Versions: 4
  • Dependent Packages: 2
  • Dependent Repositories: 0
Rankings
Dependent packages count: 19.5%
Forks count: 20.1%
Average: 25.1%
Stargazers count: 26.9%
Dependent repos count: 34.0%
Last synced: 6 months ago
conda-forge.org: pegasus-wms.api
  • Versions: 4
  • Dependent Packages: 2
  • Dependent Repositories: 0
Rankings
Dependent packages count: 19.5%
Forks count: 20.1%
Average: 25.1%
Stargazers count: 26.9%
Dependent repos count: 34.0%
Last synced: 6 months ago
conda-forge.org: pegasus-wms.dax

The `Pegasus.DAX3` API has been deprecated and will be removed in v5.1.0. Please use the new API released in v5.0.0.

  • Versions: 4
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Forks count: 20.1%
Stargazers count: 26.9%
Average: 27.5%
Dependent packages count: 28.8%
Dependent repos count: 34.0%
Last synced: 6 months ago
conda-forge.org: pegasus-wms

Pegasus Workflow Management System Python API

This package contains the Python APIs for Pegasus WMS, including:
  1. The DAX API (Versions 2 and 3)
  2. The PDAX API (Version 2)
  3. The monitoring API
  4. The Stampede database API
  5. The Pegasus statistics API
  6. The Pegasus plots API
  7. Misc. Pegasus utilities
  8. The pegasus service, including the ensemble manager and dashboard

  • Versions: 4
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Forks count: 20.1%
Stargazers count: 26.9%
Average: 27.5%
Dependent packages count: 28.8%
Dependent repos count: 34.0%
Last synced: 6 months ago
conda-forge.org: pegasus-wms.worker
  • Versions: 4
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Forks count: 20.1%
Stargazers count: 26.9%
Average: 27.5%
Dependent packages count: 28.8%
Dependent repos count: 34.0%
Last synced: 6 months ago

Dependencies

release-tools/jars/pom.xml maven
  • com.google.googlejavaformat:google-java-format 1.7 test
  • org.junit.jupiter:junit-jupiter-engine 5.6.0 test
  • org.junit.vintage:junit-vintage-engine 5.6.0 test
  • org.mockito:mockito-core 3.2.4 test
packages/pegasus-api/pyproject.toml pypi
packages/pegasus-api/setup.py pypi
  • pegasus-wms.common *
packages/pegasus-common/pyproject.toml pypi
packages/pegasus-common/setup.py pypi
  • PyYAML >5.3
  • dataclasses *
packages/pegasus-python/pyproject.toml pypi
packages/pegasus-python/setup.py pypi
  • DAX *
  • Flask >1.1,<2.3
  • Flask-Caching >1.8
  • GitPython >1.0
  • PyYAML >5.3
  • Utils *
  • dataclasses *
  • pamela >=1.0,<1.1.0
  • pegasus-init *
  • pegasus-wms.api *
  • pegasus-wms.common *
  • pegasus-wms.worker *
  • pika >=1.1.0
  • requests >2.23
  • sqlalchemy >1.3,<1.4
packages/pegasus-worker/pyproject.toml pypi
packages/pegasus-worker/setup.py pypi
  • boto3 >1.12
  • globus-sdk >=3.5.0
  • six >=1.9.0
test/core/046-aws-batch-black/Dockerfile docker
  • amazonlinux latest build
tutorial/docker/Dockerfile docker
  • rockylinux 8 build
src/requirements.txt pypi
  • Flask ==2.0.3
  • Flask-Caching ==1.10.1
  • Jinja2 ==3.0.3
  • MarkupSafe ==2.0.1
  • PyJWT ==2.8.0
  • SQLAlchemy ==1.3.24
  • Werkzeug ==2.0.3
  • boto3 ==1.23.10
  • botocore ==1.26.10
  • certifi ==2024.7.4
  • charset-normalizer ==2.0.12
  • charset-normalizer ==3.3.2
  • click ==8.0.4
  • dataclasses ==0.8
  • globus-sdk ==3.41.0
  • idna ==3.7
  • importlib-metadata ==4.8.3
  • importlib-resources ==5.12.0
  • itsdangerous ==2.0.1
  • jmespath ==0.10.0
  • pamela ==1.0.0
  • pika ==1.3.1
  • pycparser ==2.21
  • python-dateutil ==2.9.0.post0
  • requests ==2.31.0
  • requests ==2.27.1
  • s3transfer ==0.5.2
  • six ==1.16.0
  • smmap ==5.0.0
  • typing_extensions ==4.1.1
  • typing_extensions ==4.7.1
  • urllib3 ==1.26.19
  • urllib3 ==2.0.7
  • zipp ==3.6.0
  • zipp ==3.15.0
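
The listing above mixes version ranges (from the setup.py files) with exact pins (from src/requirements.txt). As a rough consistency check, the sketch below tests a few of the listed pins against the listed ranges. It is a simplified comparison of purely numeric versions written for illustration. It is not a full implementation of Python's version specifier rules, and the helper names (`parse_version`, `satisfies`) are hypothetical.

```python
import operator
import re

OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn a purely numeric version such as '1.3.24' into (1, 3, 24)."""
    return tuple(int(part) for part in text.split("."))


def _pad(version, width):
    """Right-pad with zeros so that 1.3 and 1.3.0 compare as equal."""
    return version + (0,) * (width - len(version))


def satisfies(version, spec):
    """Check a numeric version against a spec such as '>1.3,<1.4' or '*'."""
    parsed = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        if clause in ("", "*"):
            continue  # no constraint
        match = re.fullmatch(r"(>=|<=|==|>|<)\s*([\d.]+)", clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        bound = parse_version(match.group(2))
        width = max(len(parsed), len(bound))
        if not OPS[match.group(1)](_pad(parsed, width), _pad(bound, width)):
            return False
    return True


if __name__ == "__main__":
    # (package, range from setup.py, pin from src/requirements.txt), as listed above.
    checks = [
        ("Flask", ">1.1,<2.3", "2.0.3"),
        ("Flask-Caching", ">1.8", "1.10.1"),
        ("pamela", ">=1.0,<1.1.0", "1.0.0"),
        ("pika", ">=1.1.0", "1.3.1"),
        ("sqlalchemy", ">1.3,<1.4", "1.3.24"),
        ("boto3", ">1.12", "1.23.10"),
    ]
    violations = [name for name, spec, pin in checks if not satisfies(pin, spec)]
    print(violations)  # []
```

Pins with non-numeric parts, such as `2.9.0.post0`, are outside what this sketch handles.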