dev-platform-dask-gateway

Bootstraps a dask gateway deployment on minikube using skaffold

https://github.com/fabricebrito/dev-platform-dask-gateway

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: fabricebrito
  • Language: Python
  • Default Branch: main
  • Size: 94.7 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Codemeta

README.md

Dask Gateway

Bootstraps a Dask Gateway deployment on minikube using Skaffold.

The Dask worker pods run the image defined by Dockerfile.worker.

The client pod runs the image defined by Dockerfile.client.

In Dask, the client, the scheduler and the workers must run the same versions of the Python libraries, so all three are installed from the same requirements.txt:

  • the local development environment is created with pip install -r requirements.txt
  • the client pod, which you can use through a shell in Kubernetes, runs pip install -r requirements.txt in Dockerfile.client
  • the workers run pip install -r requirements.txt in Dockerfile.worker
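Because all three environments install from the same requirements.txt, one way to catch drift in the local environment is to compare its installed packages with the pinned versions. The sketch below is illustrative and not part of the repository: parse_pins and find_mismatches are hypothetical helpers, and only exact == pins are checked (unpinned entries such as bokeh are ignored).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text):
    """Return {package: pinned_version} for lines of the form 'name==x.y.z'."""
    pins = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if "==" in line:
            name, pinned = line.split("==", 1)
            pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins, lookup=version):
    """Compare pins with installed versions; return {name: (pinned, installed)}.

    `lookup` defaults to importlib.metadata.version; installed is None when the
    package is missing from the current environment.
    """
    mismatches = {}
    for name, pinned in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


# Example against the local environment:
#   with open("requirements.txt") as f:
#       print(find_mismatches(parse_pins(f.read())))
```

On a running cluster, Client.get_versions(check=True) from dask.distributed performs a comparable check across the client, scheduler and workers.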

A change to requirements.txt or to either Dockerfile.* file triggers a rebuild of the images and an update of the Dask Gateway deployment while skaffold dev is running.

Requirements

Setup

Start your minikube cluster:

```
minikube start
```

Install the Dask Gateway development platform with:

```
skaffold dev
```

This builds and pushes two container images to the minikube node image cache:

  • the worker image worker, built from Dockerfile.worker
  • the client image daskclient, built from Dockerfile.client

It then installs two Helm releases:

  • dask-gateway, using the chart https://helm.dask.org/dask-gateway-2024.1.0.tgz. Its values set the worker image to the one built from Dockerfile.worker, and they extend the Dask Gateway configuration so that clients can set image, worker_cores, worker_cores_limit and worker_memory (see dask-gateway/values.yaml).
  • dask-session, a local chart that creates a deployment whose pod runs the image built from Dockerfile.client.

Wait for the deployments to stabilize. The logs show the tags of the built images, e.g.:

```
Tags used in deployment:
 - worker -> worker:3853d0ad064e3f6b76696a81c99148113b44ac297759843d4c302017d4abaf45
 - daskclient -> daskclient:fc88475cf797d64f0069d3bb4119a86d092b8d9761c77dfa882aa845f2e53be5
```

```
Checking cache...
 - worker: Found. Tagging
 - daskclient: Found. Tagging
Tags used in deployment:
 - worker -> worker:3853d0ad064e3f6b76696a81c99148113b44ac297759843d4c302017d4abaf45
 - daskclient -> daskclient:fc88475cf797d64f0069d3bb4119a86d092b8d9761c77dfa882aa845f2e53be5
Starting deploy...
Helm release dask-gateway not installed. Installing...
NAME: dask-gateway
LAST DEPLOYED: Fri Jul 5 16:43:16 2024
NAMESPACE: dask-gateway
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
You've installed Dask-Gateway version 2024.1.0, from chart version 2024.1.0!

Your release is named "dask-gateway" and installed into the namespace "dask-gateway".

You can find the public address(es) at:

  $ kubectl --namespace=dask-gateway get service traefik-dask-gateway
Helm release dask-session not installed. Installing...
NAME: dask-session
LAST DEPLOYED: Fri Jul 5 16:43:17 2024
NAMESPACE: dask-gateway
STATUS: deployed
REVISION: 1
TEST SUITE: None
WARN[0012] image [worker:3853d0ad064e3f6b76696a81c99148113b44ac297759843d4c302017d4abaf45] is not used.  subtask=-1 task=DevLoop
WARN[0012] See helm documentation on how to replace image names with their actual tags: https://skaffold.dev/docs/pipeline-stages/deployers/helm/#image-configuration  subtask=-1 task=DevLoop
Waiting for deployments to stabilize...
 - dask-gateway:deployment/controller-dask-gateway is ready. [3/4 deployment(s) still pending]
 - dask-gateway:deployment/dask-session is ready. [2/4 deployment(s) still pending]
 - dask-gateway:deployment/api-dask-gateway: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - dask-gateway:deployment/traefik-dask-gateway: waiting for rollout to finish: 0 of 1 updated replicas are available...
 - dask-gateway:deployment/traefik-dask-gateway is ready. [1/4 deployment(s) still pending]
 - dask-gateway:deployment/api-dask-gateway is ready.
Deployments stabilized in 10.081 seconds
Starting post-deploy hooks...
Deployment replicas: 1
Deployment with label app.kubernetes.io/name=dask-gateway is running
Completed post-deploy hooks
Port forwarding service/traefik-dask-gateway in namespace dask-gateway, remote port 80 -> http://127.0.0.1:8001
Listing files to watch...
 - worker
 - daskclient
Press Ctrl+C to exit
Watching for changes...
```
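When you later need the exact worker tag (for instance, to pass as the image cluster option described under Dask Cluster configuration), it can be extracted from the skaffold output. A minimal sketch, assuming the " - name -> image:tag" layout shown in the logs above; parse_skaffold_tags is a hypothetical helper, not a skaffold feature.

```python
import re

# Matches "- <artifact> -> <image:tag>" entries from skaffold's
# "Tags used in deployment:" section (layout assumed from the log above).
TAG_ENTRY = re.compile(r"-\s+(\S+)\s+->\s+(\S+)")


def parse_skaffold_tags(log_text):
    """Return {artifact_name: image_reference} found in skaffold output."""
    return {name: image for name, image in TAG_ENTRY.findall(log_text)}


log = (
    "Tags used in deployment:\n"
    " - worker -> worker:3853d0ad064e3f6b76696a81c99148113b44ac297759843d4c302017d4abaf45\n"
    " - daskclient -> daskclient:fc88475cf797d64f0069d3bb4119a86d092b8d9761c77dfa882aa845f2e53be5\n"
)
tags = parse_skaffold_tags(log)
print(tags["worker"])
```

Lines such as " - worker: Found. Tagging" or the port-forward line do not match, because the pattern requires a token immediately followed by "->".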

Open http://127.0.0.1:8001 in a browser: it returns 404: Not Found. This is expected; the address is the Dask Gateway port forward, which you use from your local development environment.
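To check the port forward from a script instead of a browser, a small probe can treat the expected 404 as success. This is a sketch using only the standard library; gateway_port_forward_ok is a hypothetical name, and the default URL assumes the http://127.0.0.1:8001 forward shown in the skaffold log.

```python
import urllib.error
import urllib.request


def gateway_port_forward_ok(url="http://127.0.0.1:8001", timeout=5.0):
    """Return True if the Dask Gateway port forward answers at `url`.

    The gateway root path answers 404 Not Found, so a 404 counts as success.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True  # a successful response also means something is listening
    except urllib.error.HTTPError as err:
        # HTTPError is caught before OSError because it is a subclass of it
        return err.code == 404
    except OSError:
        # Connection refused or timed out: skaffold dev is probably not running
        return False
```

Call gateway_port_forward_ok() while skaffold dev is running; it returns False once the port forward is gone.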

Getting started

Local client running on your machine

Create a Python environment with:

```
python3 -m venv env_test_dask
source env_test_dask/bin/activate
pip install -r requirements.txt
```

Use the Python code below to get started:

```python
from time import sleep

from dask_gateway import Gateway

# Connect to the Dask Gateway through the skaffold port forward
gateway = Gateway("http://localhost:8001")

# Create a new Dask cluster managed by the gateway
cluster = gateway.new_cluster()

print("Scaling cluster to 4 workers")
cluster.scale(4)
client = cluster.get_client()

print(f"Cluster dashboard: {cluster.dashboard_link}")

sleep(60)

# Release the client and the cluster resources
client.close()
cluster.shutdown()
```
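The snippet above reaches cluster.shutdown() only if every earlier step succeeds. A variant that always releases the scheduler and worker pods wraps the work in try/finally. This is a sketch, not code from the repository: run_on_gateway_cluster is a hypothetical helper that accepts any factory returning an object with scale, get_client and shutdown (as GatewayCluster has), which also lets it be exercised without a live gateway.

```python
def run_on_gateway_cluster(new_cluster, n_workers, task):
    """Create a cluster, scale it, run task(client), and always shut it down.

    new_cluster: zero-argument callable returning a cluster, for example
    Gateway("http://localhost:8001").new_cluster
    """
    cluster = new_cluster()
    try:
        cluster.scale(n_workers)
        client = cluster.get_client()
        try:
            return task(client)
        finally:
            client.close()  # disconnect the client before tearing the cluster down
    finally:
        # Runs even if scaling or the task raised, so no scheduler pods are left behind
        cluster.shutdown()


# Example (requires the running deployment):
#   from dask_gateway import Gateway
#   gateway = Gateway("http://localhost:8001")
#   total = run_on_gateway_cluster(
#       gateway.new_cluster, 4, lambda client: client.submit(sum, [1, 2, 3]).result()
#   )
```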

Access the Dask client pod

Open a shell in the dask-session deployment pod (for example, kubectl --namespace=dask-gateway exec -it deployment/dask-session -- sh). Two environment variables are set in the pod:

  • DASK_GATEWAY_URL: the Dask Gateway endpoint
  • DASK_WORKER_IMAGE: the container image for the Dask scheduler and workers
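Inside the pod, a script can read these two variables to connect and request the matching image. A minimal sketch: gateway_settings and connect are hypothetical helpers, the http://localhost:8001 fallback (so the same script also runs locally) is an assumption, and setting "image" requires the image cluster option that this repository's chart values define.

```python
import os


def gateway_settings(env=None):
    """Return (gateway_url, worker_image) from the dask-session pod environment.

    Falls back to the local port forward when DASK_GATEWAY_URL is unset
    (an assumption for running the same script outside the cluster).
    """
    env = os.environ if env is None else env
    url = env.get("DASK_GATEWAY_URL", "http://localhost:8001")
    image = env.get("DASK_WORKER_IMAGE")  # None keeps the gateway's default image
    return url, image


def connect(env=None):
    """Create a gateway cluster using the pod settings (requires dask-gateway)."""
    from dask_gateway import Gateway  # imported lazily so gateway_settings works alone

    url, image = gateway_settings(env)
    gateway = Gateway(url)
    options = gateway.cluster_options()
    if image:
        options["image"] = image  # relies on the "image" option in the chart values
    return gateway.new_cluster(options)
```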

Dask Cluster configuration

This section is informative.

Set the worker container image at runtime

If the dask-gateway Helm chart values include:

```yaml
gateway:
  extraConfig:
    dask_gateway_config.py: |
      c = get_config()
      from dask_gateway_server.options import Options, String
      c.Backend.cluster_options = Options(
          String("image", default="daskgateway/dask-worker:latest", label="Worker Image")
      )
```

Then clients can set the container image used by the Dask scheduler and workers:

```python
from time import sleep

from dask.distributed import Client
from dask_gateway import Gateway

# Connect to the Dask Gateway
gateway = Gateway("http://localhost:8001")

# Define cluster options with a custom worker container image
cluster_options = gateway.cluster_options()

print(cluster_options)
cluster_options["image"] = "docker.io/library/worker:5ee153c-dirty"

# Create a new cluster with the specified options
cluster = gateway.new_cluster(cluster_options)

# Scale the cluster as needed
cluster.scale(5)

# Use the cluster
client = Client(cluster)

sleep(30)

# Release the client and the cluster resources
client.close()
cluster.shutdown()
```

Set the worker cores and memory

If the Helm chart values include:

```yaml
gateway:
  extraConfig:
    dask_gateway_config.py: |
      c = get_config()
      from dask_gateway_server.options import Options, String, Integer
      c.Backend.cluster_options = Options(
          Integer("worker_cores_limit", default=1, label="Worker Cores Limit"),
          Integer("worker_cores", default=1, label="Worker Cores"),
          String("worker_memory", default="1 G", label="Worker Memory"),
      )
```

Then clients can request worker cores and memory:

```python
from time import sleep

from dask.distributed import Client
from dask_gateway import Gateway

# Connect to the Dask Gateway
gateway = Gateway("http://localhost:8001")

# Define cluster options
cluster_options = gateway.cluster_options()

cluster_options["worker_cores"] = 1
cluster_options["worker_cores_limit"] = 2
cluster_options["worker_memory"] = "2 G"

# Create a new cluster with the specified options
cluster = gateway.new_cluster(cluster_options)

# Scale the cluster as needed
cluster.scale(5)

# Use the cluster
client = Client(cluster)

sleep(30)
client.close()

cluster.shutdown()
```
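Assuming, as in the dask-gateway Kubernetes backend, that worker_cores becomes each worker pod's CPU request and worker_cores_limit its CPU limit, it helps to keep the request at or below the limit before creating the cluster, since Kubernetes rejects a container whose request exceeds its limit. A sketch with a hypothetical helper, set_worker_resources; it works with any object that supports item assignment, such as the result of gateway.cluster_options().

```python
def set_worker_resources(options, cores, cores_limit, memory):
    """Fill the worker resource options, refusing a CPU request above its limit.

    `options` is anything supporting item assignment, for example the object
    returned by gateway.cluster_options().
    """
    if cores <= 0:
        raise ValueError("cores must be positive")
    if cores_limit < cores:
        # Kubernetes rejects a container whose CPU request exceeds its limit
        raise ValueError(f"cores_limit ({cores_limit}) is below cores ({cores})")
    options["worker_cores"] = cores
    options["worker_cores_limit"] = cores_limit
    options["worker_memory"] = memory  # a string such as "2 G", as in the example above
    return options
```

For example, set_worker_resources(gateway.cluster_options(), 1, 2, "2 G") reproduces the settings used above and fails early if the two cores values are swapped.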

The dask-gateway Helm chart values in this repository define:

```python
c = get_config()
from dask_gateway_server.options import Options, String, Integer, Float
c.Backend.cluster_options = Options(
    Float("worker_cores_limit", default=1, label="Worker Cores Limit"),
    Float("worker_cores", default=1, label="Worker Cores"),
    String("worker_memory", default="1 G", label="Worker Memory"),
    String("image", default="daskgateway/dask-worker:latest", label="Worker Image")
)
```

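so clients can set worker_cores_limit, worker_cores, worker_memory and image. Because the two cores options are Float fields here (rather than Integer, as in the previous example), fractional values such as 0.5 are accepted.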
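The dask-gateway server documentation describes an optional handler argument to Options that turns the validated user options into cluster configuration overrides; without one, the options are passed through under the same names, which is why the fields above are named after cluster settings such as worker_cores and image. The handler below is a hypothetical extension, not part of this repository's values: it raises worker_cores_limit to worker_cores instead of letting pod creation fail. The wiring is shown as comments because get_config() exists only inside the gateway's config file.

```python
def worker_options_handler(options):
    """Translate user-facing cluster options into cluster config overrides.

    Hypothetical handler: if a user asks for a CPU limit below the request,
    the limit is raised to match the request.
    """
    cores = options.worker_cores
    return {
        "worker_cores": cores,
        "worker_cores_limit": max(options.worker_cores_limit, cores),
        "worker_memory": options.worker_memory,
        "image": options.image,
    }


# Wiring inside dask_gateway_config.py (only valid where get_config() exists):
#   c = get_config()
#   from dask_gateway_server.options import Options, String, Float
#   c.Backend.cluster_options = Options(
#       Float("worker_cores_limit", default=1, label="Worker Cores Limit"),
#       Float("worker_cores", default=1, label="Worker Cores"),
#       String("worker_memory", default="1 G", label="Worker Memory"),
#       String("image", default="daskgateway/dask-worker:latest", label="Worker Image"),
#       handler=worker_options_handler,
#   )
```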

Owner

  • Name: Fabrice Brito
  • Login: fabricebrito
  • Kind: user
  • Location: Rome, Italy
  • Company: Terradue

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "version": "1.0.0"
}

GitHub Events

Total
  • Member event: 1
  • Push event: 38
  • Pull request review event: 1
  • Pull request event: 2
  • Fork event: 1
Last Year
  • Member event: 1
  • Push event: 38
  • Pull request review event: 1
  • Pull request event: 2
  • Fork event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 hour
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: about 1 hour
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • fabricebrito (2)
Pull Request Authors
  • mr-c (2)

Dependencies

example/k8s/bai/Dockerfile docker
  • docker.io/python 3.10-slim-bullseye build
example/option-b/bai/Dockerfile docker
  • docker.io/python 3.10-slim-bullseye build
example/k8s/bai/requirements.txt pypi
  • bokeh *
  • dask ==2024.1.0
  • dask-gateway ==2024.1.0
  • distributed ==2024.1.0
  • loguru *
  • numpy ==1.26.3
  • pystac *
  • rioxarray *
  • stackstac *
example/option-b/bai/requirements.txt pypi
  • bokeh *
  • dask ==2024.1.0
  • dask-gateway ==2024.1.0
  • distributed ==2024.1.0
  • loguru *
  • numpy ==1.26.3
  • pystac *
  • rioxarray *
  • stackstac *
requirements.txt pypi
  • bokeh *
  • dask ==2024.1.0
  • dask-gateway ==2024.1.0
  • distributed ==2024.1.0
  • loguru *
  • numpy ==1.26.3
  • pystac *
  • rioxarray *