https://github.com/larribas/dagger

Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.6%) to scientific vocabulary

Keywords

argo-workflows data-engineering data-pipelines data-science distributed-systems pipelines-as-code workflows
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: larribas
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 9.97 MB
Statistics
  • Stars: 17
  • Watchers: 2
  • Forks: 7
  • Open Issues: 0
  • Releases: 8
Created almost 5 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Roadmap

README.md

Dagger

Define sophisticated data pipelines and run them on different distributed systems (such as Argo Workflows).

Features

  • Define tasks and DAGs, and compose them together seamlessly.
  • Create dynamic for loops and map-reduce operations.
  • Run your DAGs locally or using a distributed workflow orchestrator (such as Argo Workflows).
  • Take advantage of advanced runtime features such as retry strategies and Kubernetes scheduling directives.
  • ... All with a simple Pythonic DSL that feels just like coding regular Python functions.

Dagger also has zero dependencies and 100% test coverage, and it comes with documentation and plenty of examples to get you started.
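To make the "dynamic for loops and map-reduce operations" feature more concrete, here is a minimal sketch of the shape of such a pipeline in plain Python. It does not use Dagger's DSL. The function names (`generate_partitions`, `square`, `total`, `map_reduce`) are hypothetical and chosen only for illustration. The point is that the number of parallel branches is only known once the first step has run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def generate_partitions(n: int) -> List[int]:
    """First step: its output decides how many branches the fan-out has."""
    return list(range(1, n + 1))


def square(x: int) -> int:
    """Map step: runs once per partition, independently of the others."""
    return x * x


def total(values: List[int]) -> int:
    """Reduce step: combines the results of every branch."""
    return sum(values)


def map_reduce(n: int) -> int:
    partitions = generate_partitions(n)  # fan-out size is decided at runtime
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(square, partitions))  # results keep input order
    return total(mapped)


result = map_reduce(4)
print(result)
```

In a workflow orchestrator, each call to the map step would typically become its own task in the DAG, while the thread pool here simply stands in for that parallelism on a single machine.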

Installation

Dagger is published to the Python Package Index (PyPI) under the name py-dagger. To install it, run:

pip install py-dagger
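After installing, one way to confirm that the distribution is visible to the current interpreter is to query its metadata. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is illustrative and not part of Dagger.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "py-dagger") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"py-dagger {found}" if found else "py-dagger is not installed")
```

Note that the lookup uses the distribution name published on PyPI (py-dagger), which may differ from the name of the module you import.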

Looking for Tutorials and Examples?

Check our Documentation Portal!

Architecture Overview

Dagger is built around 3 components:

  • A set of core data structures that represent the intended behavior of a DAG.
  • A domain-specific language (DSL) that uses metaprogramming to capture how a DAG should behave, and represents it using the core data structures.
  • Multiple runtimes that inspect the core data structures to run the corresponding DAG, or prepare the DAG to run in a specific pipeline executor.
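The separation between the three components can be sketched with a toy example. This is not Dagger's actual code or API. Every name below (`NodeRef`, `Node`, `Graph`, `Builder`, `run_locally`) is hypothetical and assumed only for illustration. The sketch shows a DSL that records calls into plain data structures, and a runtime that later inspects those structures to execute the graph.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


# 1. Core data structures: a plain description of the intended behavior.
@dataclass(frozen=True)
class NodeRef:
    """Placeholder for the future output of a node."""
    name: str


@dataclass
class Node:
    func: Callable[..., Any]
    args: Tuple[Any, ...]  # literal values or NodeRef placeholders


@dataclass
class Graph:
    nodes: Dict[str, Node] = field(default_factory=dict)


# 2. DSL: calling a decorated function records a node instead of running it.
class Builder:
    def __init__(self) -> None:
        self.graph = Graph()

    def task(self, func: Callable[..., Any]) -> Callable[..., NodeRef]:
        def record(*args: Any) -> NodeRef:
            name = f"{func.__name__}-{len(self.graph.nodes)}"
            self.graph.nodes[name] = Node(func, args)
            return NodeRef(name)

        return record


# 3. Runtime: inspects the data structures and executes them in-process.
def run_locally(graph: Graph, target: NodeRef) -> Any:
    results: Dict[str, Any] = {}

    def resolve(value: Any) -> Any:
        if not isinstance(value, NodeRef):
            return value  # literal argument, nothing to compute
        if value.name not in results:
            node = graph.nodes[value.name]
            results[value.name] = node.func(*[resolve(a) for a in node.args])
        return results[value.name]

    return resolve(target)


dsl = Builder()


@dsl.task
def double(x: int) -> int:
    return x * 2


@dsl.task
def add(a: int, b: int) -> int:
    return a + b


final = add(double(3), double(4))  # records three nodes, runs nothing yet
result = run_locally(dsl.graph, final)
print(result)
```

Because the graph is ordinary data, a different runtime could walk the same `Graph` and emit a manifest for an external executor instead of calling the functions directly, which is the role the text assigns to the "multiple runtimes" component.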

How to contribute

Do you have some feedback about the library? Have you implemented a Serializer or a Runtime that may be useful for the community? Do you think a tutorial or example could be improved?

Every contribution to Dagger is greatly appreciated.

Please read our Contribution Guidelines for more details.

Local development

We use Poetry to manage the dependencies of this library. In the codebase, you will find a Makefile with some useful commands to run and test your contributions. Namely:

  • make install - Install the project's dependencies.
  • make test - Run tests and report test coverage. It will fail if coverage is too low.
  • make ci - Run all the quality checks we run for each commit/PR. This includes type hint checking, linting, formatting and documentation.
  • make build - Build the project.
  • make docker-build - Package the project in a Docker image.
  • make docs-build - Build the documentation portal.
  • make docs-serve - Serve the documentation portal.
  • make k3d-set-up - Create a k3d cluster and image registry for the project.
  • make k3d-docker-push - Build and push the project's Docker image to the local k3d registry.
  • make k3d-install-argo - Install Argo on k3d, for local testing of Argo Workflows.
  • make k3d-tear-down - Destroy the k3d cluster and registry.
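When testing against a local Argo installation, the k3d targets are presumably run in sequence. The sketch below chains them from Python with a dry-run default. The ordering (set up, push the image, install Argo) is an assumption based on the target descriptions above, and the helper names `make_commands` and `run_targets` are hypothetical.

```python
import subprocess
from typing import List, Sequence

# Assumed order for preparing a local cluster; tear-down is left out on purpose.
K3D_TARGETS = ("k3d-set-up", "k3d-docker-push", "k3d-install-argo")


def make_commands(targets: Sequence[str]) -> List[List[str]]:
    """Turn Makefile target names into argument lists for subprocess."""
    return [["make", target] for target in targets]


def run_targets(targets: Sequence[str], dry_run: bool = True) -> List[List[str]]:
    """Print each command, or execute it and stop at the first failure."""
    commands = make_commands(targets)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return commands


planned = run_targets(K3D_TARGETS)  # dry run: only prints the commands
```

Calling `run_targets(K3D_TARGETS, dry_run=False)` from the repository root would execute the targets for real, and `make k3d-tear-down` can be run afterwards to clean up.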

Owner

  • Name: Lorenzo Arribas
  • Login: larribas
  • Kind: user
  • Location: Barcelona
  • Company: @Glovo

Staff Software Engineer at Glovo. I specialize in distributed, event-driven architectures and Machine Learning operations.

GitHub Events

Total
  • Watch event: 3
  • Fork event: 1
Last Year
  • Watch event: 3
  • Fork event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 61
  • Total Committers: 3
  • Avg Commits per committer: 20.333
  • Development Distribution Score (DDS): 0.049
Top Committers
Name Email Commits
Lorenzo Arribas l****s@g****m 58
Pablo Barbero p****o@g****m 2
Razvan Tudorica r****a@g****m 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 10
  • Total pull requests: 49
  • Average time to close issues: 16 days
  • Average time to close pull requests: 16 days
  • Total issue authors: 2
  • Total pull request authors: 4
  • Average comments per issue: 0.3
  • Average comments per pull request: 0.98
  • Merged pull requests: 40
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • larribas (8)
  • pablobd (2)
Pull Request Authors
  • larribas (38)
  • dependabot[bot] (6)
  • pablobd (3)
  • raztud (2)
Top Labels
Issue Labels
bug (6) enhancement (4) stale (2)
Pull Request Labels
stale (8) dependencies (6) enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 60 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 8
  • Total maintainers: 1
pypi.org: py-dagger

  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 60 Last month
Rankings
Dependent packages count: 10.1%
Forks count: 14.2%
Stargazers count: 16.0%
Average: 20.5%
Dependent repos count: 21.6%
Downloads: 40.7%
Maintainers (1)
Last synced: 6 months ago

Dependencies

poetry.lock pypi
  • appdirs 1.4.4 develop
  • atomicwrites 1.4.0 develop
  • attrs 21.2.0 develop
  • beautifulsoup4 4.10.0 develop
  • black 20.8b1 develop
  • bracex 2.1.1 develop
  • click 8.0.1 develop
  • colorama 0.4.4 develop
  • coverage 5.5 develop
  • deepdiff 5.5.0 develop
  • flake8 3.9.2 develop
  • ghp-import 2.0.1 develop
  • html5lib 1.1 develop
  • importlib-metadata 4.8.1 develop
  • iniconfig 1.1.1 develop
  • isort 5.9.3 develop
  • jinja2 3.0.1 develop
  • lxml 4.6.3 develop
  • markdown 3.3.4 develop
  • markupsafe 2.0.1 develop
  • mccabe 0.6.1 develop
  • mergedeep 1.3.4 develop
  • mkapi 1.0.14 develop
  • mkdocs 1.2.2 develop
  • mkdocs-material 7.3.0 develop
  • mkdocs-material-extensions 1.0.3 develop
  • mypy 0.812 develop
  • mypy-extensions 0.4.3 develop
  • ordered-set 4.0.2 develop
  • packaging 21.0 develop
  • pathspec 0.9.0 develop
  • pluggy 1.0.0 develop
  • py 1.10.0 develop
  • pycodestyle 2.7.0 develop
  • pydocstyle 6.1.1 develop
  • pyflakes 2.3.1 develop
  • pygments 2.10.0 develop
  • pymdown-extensions 8.2 develop
  • pyparsing 2.4.7 develop
  • pyspelling 2.7.3 develop
  • pytest 6.2.5 develop
  • pytest-cov 2.12.1 develop
  • python-dateutil 2.8.2 develop
  • pyyaml 5.4.1 develop
  • pyyaml-env-tag 0.1 develop
  • regex 2021.8.28 develop
  • six 1.16.0 develop
  • snowballstemmer 2.1.0 develop
  • soupsieve 2.2.1 develop
  • toml 0.10.2 develop
  • typed-ast 1.4.3 develop
  • typing-extensions 3.10.0.2 develop
  • watchdog 2.1.5 develop
  • wcmatch 8.2 develop
  • webencodings 0.5.1 develop
  • zipp 3.5.0 develop
pyproject.toml pypi
  • PyYAML ^5.4.1 develop
  • black ^20.8b1 develop
  • deepdiff ^5.2.3 develop
  • flake8 ^3.9.2 develop
  • isort ^5.7.0 develop
  • mkapi ^1.0.14 develop
  • mkdocs ^1.2.2 develop
  • mkdocs-material ^7.2.6 develop
  • mypy ^0.812 develop
  • pydocstyle ^6.1.1 develop
  • pyspelling ^2.7.3 develop
  • pytest ^6.2 develop
  • pytest-cov ^2.12.0 develop
  • python >=3.8,<4.0
.github/workflows/continuous-integration.yaml actions
  • Gr1N/setup-poetry v7 composite
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v2 composite
  • github/codeql-action/analyze v1 composite
  • github/codeql-action/init v1 composite
.github/workflows/publish-gh-pages.yaml actions
  • Gr1N/setup-poetry v7 composite
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • peaceiris/actions-gh-pages v3 composite
.github/workflows/publish-library.yaml actions
  • Gr1N/setup-poetry v7 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
Dockerfile docker
  • python 3.9.6-slim build