https://github.com/converged-computing/state-machine-operator

State machine workflow orchestration for Kubernetes (under development)

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: converged-computing
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 581 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 1
Created over 1 year ago · Last pushed 11 months ago
Metadata Files
Readme License

README.md

state-machine-operator

State machines in Kubernetes (and coming soon, Flux)! 🐦‍🔥

img/state-machine-operator.png

The Kubernetes operator provided here works with the Python library of the same name, which can also be used on bare metal (without the operator) to orchestrate jobs in Flux. Both are state machines and event-driven.

Flux Usage

To use it alongside Flux, the easiest approach is to work in the .devcontainer.

1. Start Flux Instance

If you aren't running in a system instance:

```bash
flux start --test-size=4
```

2. Install Development Library

```bash
sudo pip install -e ./python
```

3. Run a state machine workflow

We can use the state-machine-manager executable directly to run a state machine. Note that with the Kubernetes operator, the manager is deployed as a Deployment, and it creates a tracker that knows how to create Kubernetes jobs.

```bash
state-machine-manager start ./examples/local/state-machine-workflow.yaml --config-dir=./examples/local --scheduler flux --filesystem
```
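The split between the manager and a scheduler-specific tracker can be illustrated with a small dispatch sketch. Everything in it is hypothetical: the `Tracker` protocol, `RecordingTracker`, `get_tracker`, and the `"kubernetes"` scheduler name are illustrative and are not the library's actual API. Only the `flux` scheduler name and the idea of a tracker that creates jobs come from the text above.

```python
from typing import List, Protocol


class Tracker(Protocol):
    """Hypothetical interface: a tracker knows how to submit one job step."""

    def submit(self, step: str, script: str) -> str:
        ...


class RecordingTracker:
    """Hypothetical backend that records submissions instead of calling
    Flux or the Kubernetes API, so the dispatch logic can be exercised."""

    def __init__(self, scheduler: str) -> None:
        self.scheduler = scheduler
        self.submitted: List[str] = []

    def submit(self, step: str, script: str) -> str:
        self.submitted.append(step)
        # Return a fake job id built from the backend name, step, and count.
        return f"{self.scheduler}-{step}-{len(self.submitted)}"


# Assumed scheduler names: "flux" matches the --scheduler flag above;
# "kubernetes" is a placeholder for the operator's in-cluster tracker.
SUPPORTED_SCHEDULERS = ("flux", "kubernetes")


def get_tracker(scheduler: str) -> Tracker:
    """Select a tracker backend by scheduler name."""
    if scheduler not in SUPPORTED_SCHEDULERS:
        raise ValueError(f"unknown scheduler: {scheduler}")
    return RecordingTracker(scheduler)
```

In a real backend, `submit` would talk to Flux or create a Kubernetes Job; keeping that behind one interface is one way the same manager logic could run in both places.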

Kubernetes Usage

Prerequisites

These are required for the Kubernetes deployment:

  • go version v1.22.0+
  • docker version 17.03+
  • kubectl version v1.11.3+
  • Access to a Kubernetes v1.11.3+ cluster

1. Create Cluster

You can create a cluster locally (if your computer is chonky and can handle it) or use AWS. To create one locally:

```bash
kind create cluster --config ./examples/kind-config.yaml
```

2. State Machine Workflows

We provide two examples: one using the operator, and one manual example for those who want to create the various objects themselves and understand how the state machine operator (and the corresponding Python library) works. For the manual example, see the README in examples. We will continue here with the operator example.

3. Install the Operator

The operator is built via its manifest in dist. For development:

```bash
# Install and load into general cluster
make test-deploy-recreate

# Install and load into kind
make test-deploy-kind
```

For non-development:

```bash
kubectl apply -f examples/dist/state-machine-operator.yaml
```

Then apply the custom resource to create the state machine. For interactive work, remember to set spec->workflow->interactive (or the same field on any job under jobs) to true.

```bash
kubectl apply -f examples/state-machine.yaml
```
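Toggling the interactive flag can be scripted before applying the manifest. This is a minimal sketch that assumes the parsed resource has a `spec.workflow` mapping and a `spec.jobs` list of job mappings (inferred from "spec->workflow->interactive" and "any job under jobs"); the helper `set_interactive` is hypothetical.

```python
import copy
from typing import Any, Dict


def set_interactive(resource: Dict[str, Any], include_jobs: bool = True) -> Dict[str, Any]:
    """Return a copy of a parsed state machine resource with interactive enabled.

    Assumed layout: resource["spec"]["workflow"] is a mapping and
    resource["spec"]["jobs"] is a list of job mappings.
    """
    updated = copy.deepcopy(resource)  # leave the caller's dict untouched
    spec = updated.setdefault("spec", {})
    spec.setdefault("workflow", {})["interactive"] = True
    if include_jobs:
        for job in spec.get("jobs", []):
            job["interactive"] = True
    return updated
```

With PyYAML (already a dependency), you could load examples/state-machine.yaml with `yaml.safe_load`, pass the result through `set_interactive`, and write it back with `yaml.safe_dump` before running `kubectl apply`.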

For the Mummi example (all code is private), see examples/mummi.

Job Variables

For each job script section, the following environment variables are provided for your application:

  • jobid: the job identifier, which is prefixed with job_ by default; the prefix can be set under the state machine's workflow->prefix.
  • outpath: the working directory, where output is expected to be written (defaults to /tmp/out).
  • registry: the registry where your artifact will be pushed
    • pull_tag: the pull tag to use (if the workflow is pulling)
    • push_tag: the push tag to use (if the workflow is pushing)
  • properties:
    • node-selector: a key-value pair to be added as a node selector for the job (Kubernetes only). E.g., node.kubernetes.io/instance-type: c7a.4xlarge
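From the application's side, a job could pick these values up from its environment. This sketch assumes the variables are exported under exactly the names listed above (jobid, outpath, registry, pull_tag, push_tag); check a generated job script to confirm the real names. The helpers `read_job_context` and `write_result` are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_job_context(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Collect the documented job variables from the environment.

    Assumes the variables use exactly the names listed above;
    outpath falls back to the documented default of /tmp/out.
    """
    return {
        "jobid": env.get("jobid"),
        "outpath": env.get("outpath", "/tmp/out"),
        "registry": env.get("registry"),
        "pull_tag": env.get("pull_tag"),
        "push_tag": env.get("push_tag"),
    }


def write_result(context: Dict[str, Optional[str]], text: str) -> Path:
    """Write a result file into outpath, where output is expected to land."""
    outdir = Path(context["outpath"] or "/tmp/out")
    outdir.mkdir(parents=True, exist_ok=True)
    target = outdir / f"{context['jobid'] or 'unknown'}.txt"
    target.write_text(text)
    return target
```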

Take a look at the simple example in examples/state-machine.yaml to see how push/pull is defined between steps. When these are defined (with a tag), your artifact is named <registry>:<jobid>:<tag> as it moves between steps.
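The naming rule can be written as a small function, together with how a push in one step lines up with a pull in the next. Only the <registry>:<jobid>:<tag> pattern comes from the text; the step dictionaries with optional "push" and "pull" keys and the function names are hypothetical and do not reflect how examples/state-machine.yaml is actually structured.

```python
from typing import Dict, List, Optional


def artifact_name(registry: str, jobid: str, tag: str) -> str:
    """Build the artifact reference using the documented <registry>:<jobid>:<tag> pattern."""
    return f"{registry}:{jobid}:{tag}"


def plan_transfers(
    registry: str, jobid: str, steps: List[Dict[str, str]]
) -> List[Dict[str, Optional[str]]]:
    """For each step, compute the artifact it pulls and/or pushes.

    Each step is a hypothetical mapping with a "name" and optional
    "pull" / "push" tags.
    """
    plan: List[Dict[str, Optional[str]]] = []
    for step in steps:
        pull = step.get("pull")
        push = step.get("push")
        plan.append({
            "name": step["name"],
            "pulls": artifact_name(registry, jobid, pull) if pull else None,
            "pushes": artifact_name(registry, jobid, push) if push else None,
        })
    return plan
```

Because every step of one job sequence shares the same jobid, a later step that pulls a tag resolves to exactly the reference an earlier step pushed.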

Design

These are some design decisions I've made (of course open to discussion):

Initial Design

  • The workflow model is a state machine: state is always derived from Kubernetes
  • The state machine manager manages units of job sequences (each a state machine) and each state machine orchestrates the logic of the jobs within it.
  • No application code (the jobs) is tangled with the state machine or manager
  • We assume jobs don't need to be paused / resumed / reclaimed like on HPC
  • Jobs are modular units with a config that the manager knows how to parse; everything else is provided to them.
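The design above (a manager driving one state machine per job sequence, with state derived from the cluster rather than stored separately) can be modeled in plain Python. This is an illustrative sketch only: the library itself depends on python-statemachine, and the `JobSequence` class, its `observe` method, and the status strings are hypothetical. The `statuses` mapping stands in for job status read back from Kubernetes (or Flux).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class JobSequence:
    """One unit of work: an ordered list of job steps driven as a state machine."""

    steps: List[str]
    index: int = 0
    state: str = "pending"  # pending -> running -> (next step | complete | failed)
    history: List[str] = field(default_factory=list)

    @property
    def current(self) -> str:
        return self.steps[self.index]

    def observe(self, statuses: Dict[str, str]) -> str:
        """Advance using job statuses observed from the scheduler."""
        if self.state in ("complete", "failed"):
            return self.state  # terminal states do not change
        status = statuses.get(self.current)
        if status is None:
            # Current step not submitted or not visible yet: nothing to do.
            return self.state
        if status == "failed":
            self.state = "failed"
        elif status == "succeeded":
            self.history.append(self.current)
            if self.index + 1 < len(self.steps):
                self.index += 1
                self.state = "pending"  # next step waits to be observed
            else:
                self.state = "complete"
        else:
            self.state = "running"  # any other status counts as in progress
        return self.state
```

Because each call recomputes the transition from observed status, a restarted manager can rebuild its position by replaying what the cluster reports, which fits the "state is always derived from Kubernetes" decision.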

TODO

  • Make sure there are labels for each of Kubernetes and Flux to distinguish jobs in the workflow from those that are not.
  • We likely want to test with a real registry OR allow a volume bind (existing data) to the registry.
    • Otherwise, artifacts are deleted on cleanup. We could also add an option to keep the ephemeral registry.

Questions

  • Under what conditions do we cancel / cleanup jobs?
  • I haven't tested a failure yet (or a case where jobs need to be cleaned up / deleted)
  • We might want to do other cleanup (e.g., config maps)

License

HPCIC DevTools is distributed under the terms of the MIT license. All new contributions must be made under this license.

See LICENSE, COPYRIGHT, and NOTICE for details.

SPDX-License-Identifier: (MIT)

LLNL-CODE-842614

Owner

  • Name: Converged Computing
  • Login: converged-computing
  • Kind: organization

The best of cloud and high performance computing: technology and community combined.

GitHub Events

Total
  • Create event: 29
  • Release event: 1
  • Issues event: 1
  • Watch event: 1
  • Delete event: 13
  • Member event: 1
  • Issue comment event: 2
  • Push event: 87
  • Pull request event: 50
Last Year
  • Create event: 29
  • Release event: 1
  • Issues event: 1
  • Watch event: 1
  • Delete event: 13
  • Member event: 1
  • Issue comment event: 2
  • Push event: 87
  • Pull request event: 50

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 27
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 20
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 27
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 20
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Pull Request Authors
  • vsoch (27)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 10 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
pypi.org: state-machine-operator

State Machine orchestrator intended for Kubernetes

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 10 Last month
Rankings
Dependent packages count: 9.7%
Average: 32.2%
Dependent repos count: 54.7%
Maintainers (1)
Last synced: 10 months ago

Dependencies

docker/manager/Dockerfile docker
  • rockylinux 9 build
python/pyproject.toml pypi
python/requirements.txt pypi
  • jsonschema *
  • pika *
  • pyyaml *
python/setup.py pypi
  • Jinja2 *
  • jsonschema *
  • kubernetes *
  • python-statemachine *
  • pyyaml *