funsies
funsies: A minimalist, distributed and dynamic workflow engine - Published in JOSS (2021)
Science Score: 98.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in JOSS metadata
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 2 committers (100.0%) from academic institutions
- ✓ Institutional organization owner: organization aspuru-guzik-group has institutional domain (aspuru.chem.harvard.edu)
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
funsies is a lightweight workflow engine 🔧
Statistics
- Stars: 41
- Watchers: 7
- Forks: 4
- Open Issues: 1
- Releases: 1
README.md
funsies is a python library and execution engine to build reproducible, fault-tolerant, distributed and composable computational workflows.
- 🐍 Workflows are specified in pure python.
- 🐦 Lightweight with few dependencies.
- 🚀 Easy to deploy to compute clusters and distributed systems.
- 🔧 Can be embedded in your own apps.
- 📏 First-class support for static analysis. Use mypy to check your workflows!
Workflows are encoded in a redis server and executed using the distributed job queue library RQ. A hash tree data structure enables automatic and transparent caching and incremental computing.
Source documentation can be found at https://funsies.readthedocs.io/. Some example funsies scripts can be found in the recipes folder.
Installation
Using pip,
```bash
pip install funsies
```
This will enable the funsies CLI tool as well as the funsies python
module. Python 3.7, 3.8 and 3.9 are supported. To run workflows, you'll need a
Redis server, version 4.x or higher. On Linux, Redis can be installed using conda,
```bash
conda install redis
```
pip,
```bash
pip install redis-server
```
or your system package manager. On macOS, Redis can be downloaded using Homebrew,
```bash
brew install redis
```
(Windows is not supported by Redis, but a third-party package can be obtained from this repository. This has not been tested, however.)
Hello, funsies!
To run workflows, three components need to be connected:
- 📜 a python script describing the workflow
- 💻 a redis server that holds workflows and data
- 👷 worker processes that execute the workflow
funsies is distributed: all three components can be on different computers or
even be connected at different times. Redis is started using redis-server,
workers are started using funsies worker and the workflow is run using
python.
For running on a single machine, the start-funsies script takes care of starting the database and workers,
```bash
start-funsies \
    --no-pw \
    --workers 2
```
Here is an example workflow script,
```python
from funsies import Fun, reduce, shell

with Fun():
    # you can run shell commands
    cmd = shell('sleep 2; echo 👋 🪐')
    # and python ones
    python = reduce(sum, [3, 2])
    # outputs are saved at hash addresses
    print(f"my outputs are saved to {cmd.stdout.hash[:5]} and {python.hash[:5]}")
```
The workflow is just python, and is run using the python interpreter,
```bash
$ python hello-world.py
my outputs are saved to 4138b and 80aa3
```
The Fun() context manager takes care of connecting to the database. The
script should execute immediately; no work is done just yet because workflows
are lazily executed.
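Lazy execution can be illustrated with a toy, in-memory model. This is a sketch only, not how funsies is implemented: calling a step merely records a node in a graph, and nothing runs until execution is explicitly triggered. The `Node` class and the `execute()` function are hypothetical names used for illustration.

```python
from typing import Any, Callable, Dict, List

calls: List[str] = []  # records which functions actually ran


class Node:
    """A hypothetical lazy step: a function plus its (possibly lazy) arguments."""

    def __init__(self, fun: Callable[..., Any], *args: Any) -> None:
        self.fun = fun
        self.args = args


def execute(node: Node, results: Dict[int, Any]) -> Any:
    """Evaluate a node, first evaluating any Node arguments it depends on."""
    key = id(node)
    if key not in results:
        values = [execute(a, results) if isinstance(a, Node) else a for a in node.args]
        calls.append(node.fun.__name__)
        results[key] = node.fun(*values)
    return results[key]


def add(a: int, b: int) -> int:
    return a + b


def double(x: int) -> int:
    return 2 * x


# Building the workflow runs nothing...
total = Node(add, 3, 2)
final = Node(double, total)
assert calls == []

# ...until execution is explicitly triggered.
print(execute(final, {}))  # 10
print(calls)  # ['add', 'double']
```

In funsies, the trigger is the `funsies execute` command shown next; here it is simply a recursive function call.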
To execute the workflow, we trigger it with the hashes above using the CLI,
```bash
$ funsies execute 4138b 80aa3
```
Once the workers are finished, results can be printed directly to stdout using their hashes,
```bash
$ funsies cat 4138b
👋 🪐
$ funsies cat 80aa3
5
```
They can also be accessed from within python, from other steps in the workflow, etc. Shutting down the database and workers can also be performed using the CLI,
```bash
$ funsies shutdown --all
```
How does it work?
The design of funsies is inspired by git and ccache. All files and variable values are abstracted into a provenance-tracking DAG structure. Basically, "files" are identified entirely based on what operations lead to their creation. This (somewhat opinionated) design produces interesting properties that are not common in workflow engines:
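The idea that a "file" is identified by the operations that created it can be sketched with Python's standard `hashlib`. In this toy model (an assumption for illustration, not funsies' actual hashing scheme), a step's address is the SHA-256 of its command and of the addresses of its inputs, and each output's address is derived from the step address and the output name. The helper names are hypothetical.

```python
import hashlib
from typing import Sequence


def constant_address(value: str) -> str:
    """Address of a literal input: hash of its contents."""
    return hashlib.sha256(b"const:" + value.encode()).hexdigest()


def step_address(command: str, input_addresses: Sequence[str]) -> str:
    """Address of an operation: hash of what it does and what it consumes."""
    h = hashlib.sha256(b"step:" + command.encode())
    for addr in input_addresses:
        h.update(addr.encode())  # fixed-length hex digests, so no separator needed
    return h.hexdigest()


def output_address(step: str, name: str) -> str:
    """Address of a named output of a step."""
    return hashlib.sha256(f"out:{step}:{name}".encode()).hexdigest()


inp = constant_address("some input")
sim = step_address("simulate data.inp", [inp])
csv = output_address(sim, "results.csv")

# The same history always yields the same address...
assert csv == output_address(step_address("simulate data.inp", [inp]), "results.csv")
# ...while changing any upstream input changes every downstream address.
other = constant_address("other input")
assert csv != output_address(step_address("simulate data.inp", [other]), "results.csv")
print(csv[:5])
```

Note that the output's address is known before the command ever runs, which is what makes the lazy, hash-based triggering above possible.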
Incremental computation
funsies automatically and transparently saves all input and output "files". This produces automatic and transparent checkpointing and incremental computing. Re-running the same funsies script, even on a different machine, will not perform any computations (beyond database lookups). Modifying the script and re-running it will only recompute changed results.
In contrast with e.g. Make, this is not based on modification date but directly on the data history, which is more robust to changes in the workflow.
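A minimal sketch of incremental computation keyed on provenance rather than timestamps, assuming a dict in place of the redis database and identifying steps by a name string (a real engine would also hash the command or function). The `run` and `workflow` functions are hypothetical.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

store: Dict[str, object] = {}  # stands in for the redis database
log: List[str] = []  # names of steps that actually ran


def address(name: str, deps: Sequence[str]) -> str:
    """Provenance address: hash of the step name and its dependencies' addresses."""
    return hashlib.sha256("|".join([name, *deps]).encode()).hexdigest()


def run(name: str, fun: Callable[..., object], deps: Sequence[str]) -> str:
    """Run a step only if its address is not already in the store."""
    key = address(name, deps)
    if key not in store:
        log.append(name)
        store[key] = fun(*(store[d] for d in deps))
    return key


def workflow(n: int) -> str:
    a = run(f"const {n}", lambda: n, [])
    b = run("const 10", lambda: 10, [])
    s = run("square", lambda x: x * x, [a])
    return run("add", lambda x, y: x + y, [s, b])


result = workflow(3)
print(store[result])  # 19

log.clear()
workflow(3)  # identical script: only lookups, nothing runs
print(log)  # []

workflow(4)  # only steps downstream of the change are recomputed
print(log)  # ['const 4', 'square', 'add']
```

Because keys depend only on history, touching a file or moving to another machine does not invalidate anything; only a changed input or operation produces new addresses.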
Decentralized workflows
Workflows and their elements are not identified based on any global indexing scheme. This makes it possible to generate workflows fully dynamically from any connected computer node, to merge or compose DAGs from different databases and to dynamically re-parametrize them, etc.
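Because addresses are derived from provenance instead of a global index, workflows built independently agree on names. A toy sketch, assuming dicts in place of two separate redis databases; the `address` helper and the `T=300`/`T=400` parameters are hypothetical.

```python
import hashlib
from typing import Dict


def address(command: str, input_value: str) -> str:
    """Provenance address computed locally, with no global counter or registry."""
    return hashlib.sha256(f"{command}|{input_value}".encode()).hexdigest()


# Two machines with separate databases, represented as dicts.
db_a: Dict[str, str] = {}
db_b: Dict[str, str] = {}

# Machine A runs one parameter; machine B runs an overlapping set.
db_a[address("simulate", "T=300")] = "result for T=300"
db_b[address("simulate", "T=300")] = "result for T=300"
db_b[address("simulate", "T=400")] = "result for T=400"

# Shared steps land on the same key, so merging is a plain union.
merged = {**db_a, **db_b}
print(len(merged))  # 2
```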
No local file operations
All "files" are encoded in a redis instance or in a data directory, with no local filesystem management required. funsies workers can even operate without any permanent data storage, as is often the case in file-driven workflows using only a container's tmpfs.
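A minimal sketch of how a worker could run a shell-style step without managing local files, assuming file contents are kept as bytes in a key-value store and the filesystem is only touched through a throwaway scratch directory. `run_in_scratch` is a hypothetical helper, not a funsies API, and the command is a stand-in for a real program such as `simulate`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def run_in_scratch(cmd: List[str], inp: Dict[str, bytes], out: List[str]) -> Dict[str, bytes]:
    """Run cmd in a throwaway directory: write inputs, run, read outputs back."""
    with tempfile.TemporaryDirectory() as scratch:
        for name, data in inp.items():
            (Path(scratch) / name).write_bytes(data)
        subprocess.run(cmd, cwd=scratch, check=True)
        results: Dict[str, bytes] = {}
        for name in out:
            path = Path(scratch) / name
            if path.exists():  # a missing output is simply absent from results
                results[name] = path.read_bytes()
        return results  # the scratch directory is deleted on exit


# Stand-in command: copy data.inp to results.csv in upper case.
script = "import pathlib; p = pathlib.Path; p('results.csv').write_text(p('data.inp').read_text().upper())"
files = run_in_scratch([sys.executable, "-c", script], {"data.inp": b"some input"}, ["results.csv"])
print(files)  # {'results.csv': b'SOME INPUT'}
```

The returned bytes are what a worker would write back to the database; nothing survives on the worker's disk once the step completes.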
Recovering from failures
Raised exceptions in python codes, worker failures, missing output files and
other error conditions are automatically caught by funsies workers, providing
fault tolerance to workflows. Errors are logged on stderr with full
traceback and can be recovered from the database.
Steps that depend on failed ones propagate those errors and their provenance. Errors can then be dealt with wherever it is most appropriate to do so using techniques from functional programming.
As an example, consider a workflow that first runs a CLI program simulate
that ought to produce a results.csv file, which is subsequently analyzed
using a python function analyze_data(),
```python
import funsies as f

with f.Fun():
    # analyze_data() is a user-defined python function (not shown)
    sim = f.shell("simulate data.inp", inp={"data.inp": "some input"}, out=["results.csv"])
    final = f.reduce(analyze_data, sim.out["results.csv"])
```
In a normal python program, analyze_data() would need to guard against the
possibility that results.csv is absent, or risk a fatal exception. In the
above funsies script, if results.csv is not produced, then it is replaced by
an instance of Error which tracks the failing step. The workflow engine
automatically short-circuits the execution of analyze_data and instead
forwards the Error to final. In this way, the value of final provides
direct error tracing to the failed step. Furthermore, it means that
analyze_data does not need its own error handling code if its output is
optional or if the error is better dealt with in a later step.
This error-handling approach is heavily influenced by the Result<T,E> type
from the Rust programming language.
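The capture-and-forward behaviour can be sketched in plain Python. The `Error` dataclass and `step` function below are hypothetical stand-ins, not funsies' own classes: a failing step becomes an `Error` value that records the failing step's name and traceback, and any step that receives an `Error` forwards it without running, much like `?` on a Rust `Result`.

```python
import traceback
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Error:
    """A hypothetical error value that remembers which step failed and why."""

    source: str
    details: str


def step(name: str, fun: Callable[..., Any], *args: Any) -> Any:
    """Run fun(*args), forwarding upstream Errors and capturing new exceptions."""
    for arg in args:
        if isinstance(arg, Error):
            return arg  # short-circuit: fun is never called
    try:
        return fun(*args)
    except Exception:
        return Error(source=name, details=traceback.format_exc())


def simulate(inp: str) -> str:
    raise FileNotFoundError("results.csv was not produced")


def analyze_data(csv: str) -> int:
    return len(csv.splitlines())  # no error handling needed here


sim = step("simulate", simulate, "some input")
final = step("analyze_data", analyze_data, sim)
print(final.source)  # simulate
```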
Is it production-ready?
🧪 warning: funsies is research-grade code! 🧪
At this time, the funsies API is fairly stable. However, users should know that database dumps are not yet fully forward- or backward-compatible, and breaking changes are likely to be introduced on new releases.
Related projects
funsies is intended as a lightweight alternative to industrial workflow engines, such as Apache Airflow or Luigi. We rely heavily on awesome python libraries: RQ, loguru, Click and chevron. We are inspired by git, ccache, snakemake targets, rain and others. A comprehensive list of other workflow engines can be found here.
License
funsies is provided under the MIT license.
Contributing
All contributions are welcome! Consult the CONTRIBUTING file for help. Please file issues for any bugs and documentation problems.
Owner
- Name: Aspuru-Guzik group repo
- Login: aspuru-guzik-group
- Kind: organization
- Website: http://aspuru.chem.harvard.edu/
- Repositories: 30
- Profile: https://github.com/aspuru-guzik-group
JOSS Publication
funsies: A minimalist, distributed and dynamic workflow engine
Authors
Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada
Department of Computer Science, University of Toronto, 40 St. George St, Toronto, Ontario M5S 2E4, Canada, Chemical Physics Theory Group, Department of Chemistry, University of Toronto, 80 St. George St, Toronto, Ontario M5S 3H6, Canada, Vector Institute for Artificial Intelligence, 661 University Ave Suite 710, Toronto, Ontario M5G 1M1, Canada, Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), 661 University Ave, Toronto, Ontario M5G, Canada
Tags
workflow redis decentralized computational chemistry
GitHub Events
Total
- Fork event: 1
Last Year
- Fork event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Cyrille Lavigne | c****e@c****a | 389 |
| Daniel S. Katz | d****z@i****g | 3 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 4
- Total pull requests: 3
- Average time to close issues: 18 days
- Average time to close pull requests: 1 day
- Total issue authors: 2
- Total pull request authors: 2
- Average comments per issue: 1.75
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- gflofst (3)
- Leticia-maria (1)
Pull Request Authors
- danielskatz (2)
- clavigne (1)
Packages
- Total packages: 1
- Total downloads: pypi 21 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 4
- Total maintainers: 1
pypi.org: funsies
Funsies is a library to build and execution engine for reproducible, composable and data-persistent computational workflows.
- Homepage: https://github.com/aspuru-guzik-group/funsies
- Documentation: https://funsies.readthedocs.io/
- License: MIT License
Latest release: 0.8.1
published over 4 years ago
Maintainers (1)
Dependencies
- black * development
- flake8 * development
- flake8-annotations * development
- flake8-black * development
- flake8-bugbear * development
- flake8-docstrings * development
- flake8-import-order * development
- isort * development
- mypy * development
- mypy_extensions * development
- nox * development
- pytest * development
- pytest-cov * development
- cloudpickle *
- fakeredis *
- loguru *
- lupa *
- redis *
- redis-server *
- rq >=1.7
- chevron *
- cloudpickle *
- importlib-metadata *
- loguru *
- mypy_extensions *
- redis *
- rq >=1.7
- typing_extensions *