carbon-aware-nextflow-on-k8s
Run carbon-aware Nextflow pipelines on Kubernetes using real-time ElectricityMap data to reduce emissions from compute-heavy tasks.
Repository
Basic Info
- Host: GitHub
- Owner: YagmurKati
- License: mit
- Language: Shell
- Default Branch: main
- Size: 24.4 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Carbon-Aware Nextflow Pipeline on Kubernetes
This project runs a carbon-aware Nextflow pipeline on any Kubernetes cluster using real-time carbon intensity data from ElectricityMap. The pipeline delays compute-heavy tasks until the grid's carbon intensity falls below a configurable threshold (or a maximum wait elapses), which lowers the emissions of your workflows.
What’s Inside?
- Live carbon-intensity gating – the `highenergy_std_task` automatically delays execution when carbon intensity exceeds 250 gCO₂/kWh (configurable) and retries every hour for up to 24 h. Once it starts, it runs to completion.
- Works on any Kubernetes cluster – nothing vendor-specific, you only need:
  - `kubectl` access
  - A ReadWriteMany (RWX) storage class (e.g. NFS, CephFS)
  - Outbound internet to query the ElectricityMap API
- Self-contained bootstrap scripts – spin up a PVC, an “uploader” pod, and the Nextflow job with one command.
- Clean teardown – `reset.sh` removes every Job/Pod and wipes the shared volume.
📑 Table of Contents
- Prerequisites
- Quick-Start (any cluster)
- Optional: FONDA Cluster Setup (HU Berlin)
- Carbon-Aware Logic
- Debugging & Monitoring
- Repository Layout
- Output Artifacts
- Clean-Up
- Customising the Workflow
- Why Carbon-Aware Scheduling on Kubernetes?
- References
✅ Prerequisites
| Requirement | Notes |
|-------------|-------|
| Kubernetes cluster | Must expose an RWX StorageClass |
| CLI tools | `kubectl`, `envsubst`, `bash` |
| ElectricityMap API token | Get one free at https://electricitymap.org/map |
| Nextflow | `bash <(wget -qO- https://get.nextflow.io)` or `module load nextflow` |
Insert your token into green_k8s_workflow.nf:
```bash
export ELECTRICITYMAP_TOKEN="YOUR_API_TOKEN_HERE"
```
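Before bootstrapping, it can help to confirm that the CLI tools from the table above are actually on your `PATH`. The following is a minimal sketch, not part of the repository; `missing_tools` is a hypothetical helper name.

```bash
# Sketch only: missing_tools is a hypothetical helper, not a script in this repo.
# Print each named tool that cannot be found on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}
```

For example, `missing_tools kubectl envsubst bash` prints nothing when all three are installed, and otherwise lists the ones you still need.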
🚀 Quick-Start (any cluster)
```bash
# 1 Choose a namespace (create it if needed)
export NAMESPACE=my-namespace
kubectl create namespace $NAMESPACE 2>/dev/null || true

# 2 Bootstrap shared storage + uploader pod
chmod +x bootstrap-nextflow-carbon.sh
./bootstrap-nextflow-carbon.sh

# 3 Wait for the uploader to be ready
kubectl wait pod/pvc-uploader -n $NAMESPACE --for=condition=Ready

# 4 Upload workflow files into the PVC
envsubst < nextflow.config > processed.config
kubectl cp green_k8s_workflow.nf $NAMESPACE/pvc-uploader:/workspace/
kubectl cp processed.config $NAMESPACE/pvc-uploader:/workspace/nextflow.config
kubectl cp nextflow-job.yaml $NAMESPACE/pvc-uploader:/workspace/

# 5 Launch the carbon-aware Nextflow job
export UNIQUE_ID=$(date +%s)
envsubst < nextflow-job.yaml | kubectl apply -n $NAMESPACE -f -

# 6 Follow progress
kubectl get jobs -n $NAMESPACE
kubectl get pods -n $NAMESPACE
kubectl logs -f job/nextflow-run-$UNIQUE_ID -n $NAMESPACE
```
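If you would rather block until the run finishes than follow the logs, `kubectl wait` can watch the Job's `complete` condition. The sketch below is an assumption-laden convenience wrapper, not a script from this repository: `wait_for_run` and the `KUBECTL` override variable are hypothetical names, and the 26 h default is chosen only because the carbon gate itself may wait up to 24 h before the task starts.

```bash
# Sketch only: wait_for_run and KUBECTL are hypothetical, not part of this repo.
# Block until the Nextflow Job reports condition=complete, or the timeout expires.
#   $1 = namespace, $2 = UNIQUE_ID used at launch, $3 = timeout (default 26h)
wait_for_run() {
  local ns="$1" id="$2" timeout="${3:-26h}"
  # KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command.
  ${KUBECTL:-kubectl} wait "job/nextflow-run-$id" -n "$ns" \
    --for=condition=complete --timeout="$timeout"
}
```

Usage would be `wait_for_run "$NAMESPACE" "$UNIQUE_ID"`. Note that a Job that fails never reaches `complete`, so in that case the call only returns once the timeout expires.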
Optional: FONDA Cluster Setup (HU Berlin)
```bash
# VPN & kubectl context
sudo wg-quick up $(pwd)/wg0.conf   # connects to FONDA VPN
export KUBECONFIG=config.yml       # points kubectl at the cluster

kubectl get nodes -L usedby        # confirm access
```
Then continue with the Quick-Start section above.
Carbon-Aware Logic (inside green_k8s_workflow.nf)
```bash
# Query live CO₂ intensity (default zone = DE)
carbon=$(curl -s "https://api.electricitymap.org/v3/carbon-intensity/latest?zone=${ZONE:-DE}" \
  -H "auth-token: $ELECTRICITYMAP_TOKEN" | jq -r '.carbonIntensity')

# Retry while the grid is “too dirty”
retries=0
while [ "$carbon" -gt "${THRESHOLD:=250}" ] && [ "$retries" -lt 24 ]; do
  echo "⚠️ $carbon gCO₂/kWh > $THRESHOLD - waiting 1 h (attempt $((retries+1))/24)"
  sleep 3600
  carbon=$(curl -s "https://api.electricitymap.org/v3/carbon-intensity/latest?zone=${ZONE:-DE}" \
    -H "auth-token: $ELECTRICITYMAP_TOKEN" | jq -r '.carbonIntensity')
  retries=$((retries+1))
done
```
| 🔧 **Behavior** | 💡 **Default** | 🛠️ **How to Change** |
|------------------------|--------------------|--------------------------------------|
| Carbon threshold | `250 gCO₂/kWh` | Set `THRESHOLD` environment variable |
| Max wait time | `24 h (24 × 1 h)` | Edit the `while` loop condition |
| Grid zone | `DE` | Set the `ZONE` environment variable |
- ✅ Only the `highenergy_std_task` is carbon-aware; `standard_task` and `longrun_task` start immediately.
- 🔁 You can copy and paste the logic into any other process block to make it carbon-aware.
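One detail of the loop as written: if the API call fails or returns `null`, the `-gt` comparison errors out, the `while` condition evaluates false, and the task starts immediately. If you copy the gate into other processes, a variant that validates the reading first may be preferable. The sketch below is one way to do that under stated assumptions, not the repository's implementation: `fetch_carbon`, `is_clean`, `wait_for_clean_grid`, and the `MAX_RETRIES` variable are hypothetical names. Unlike the original loop, it returns a non-zero status when the grid is still above the threshold after the last retry, so the caller decides whether to proceed.

```bash
# Sketch only: these helper names and MAX_RETRIES are hypothetical, not from the repo.

# Query the latest carbon intensity; prints the value, or nothing if it is missing.
fetch_carbon() {
  curl -s "https://api.electricitymap.org/v3/carbon-intensity/latest?zone=${ZONE:-DE}" \
    -H "auth-token: $ELECTRICITYMAP_TOKEN" | jq -r '.carbonIntensity // empty'
}

# Succeed only when $1 is a whole number that does not exceed the threshold $2.
is_clean() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;  # empty, "null", or non-integer: treat as not clean
  esac
  [ "$1" -le "$2" ]
}

# Poll until the grid is clean or the retry budget is spent.
#   $1 = command that prints the intensity (default: fetch_carbon)
#   $2 = seconds between polls (default: 3600)
# Returns 0 if the last reading was clean, 1 otherwise.
wait_for_clean_grid() {
  local fetch="${1:-fetch_carbon}" interval="${2:-3600}"
  local threshold="${THRESHOLD:-250}" max="${MAX_RETRIES:-24}"
  local retries=0 carbon
  carbon=$("$fetch")
  while ! is_clean "$carbon" "$threshold" && [ "$retries" -lt "$max" ]; do
    echo "carbon=${carbon:-unknown} gCO2/kWh, threshold=$threshold: waiting (attempt $((retries+1))/$max)" >&2
    sleep "$interval"
    carbon=$("$fetch")
    retries=$((retries+1))
  done
  is_clean "$carbon" "$threshold"
}
```

Inside a process script this could be called as `wait_for_clean_grid || echo "grid still above threshold, starting anyway"` to keep the original run-after-24-h behavior, or as `wait_for_clean_grid || exit 1` to fail the task instead. Passing the fetch command as an argument also lets you dry-run the gate with a stub that prints a fixed number.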
🔍 Debugging & Monitoring
A single run creates five pods:
| Pod prefix / name | Purpose |
|------------------------------------|---------------------------------------------------------------------|
| nextflow-run-<uid> | Orchestrates the whole workflow |
| nf-… (×3) | One pod per pipeline process (standard_task, longrun_task, highenergy_std_task) |
| pvc-uploader | Keeps the shared PVC mounted for uploads/downloads |
```bash
# List all workflow-related pods
kubectl get pods -n $NAMESPACE

# Sample output (fresh run)
NAME                                        READY   STATUS    AGE
nextflow-run-1748339780-9rccl               1/1     Running   32s
nf-0744354e0171577a3a1e1fba2da530a3-4673b   1/1     Running   17s
nf-5e082c2999d7e9efe4d8086f6f8d887b-bd005   1/1     Running   17s
nf-a9c9becd4f599b5bc086b9179294c7f0-23796   1/1     Running   17s
pvc-uploader                                1/1     Running   7m
```
After the workflow finishes, only two pods remain:
```bash
NAME                            READY   STATUS      AGE
nextflow-run-1748339780-9rccl   0/1     Completed   5m
pvc-uploader                    1/1     Running     12m
```
Focus on the carbon-aware task
```bash
# Show only process pods
kubectl get pods -n $NAMESPACE | grep nf-

# Inspect the carbon-aware pod
kubectl exec -it
```
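Because the `nf-…` pod names are hashes, it is not obvious which of the three belongs to `highenergy_std_task`. One option, sketched below, is to look for the gate's "waiting 1 h" message in each pod's log. This assumes the task's `echo` output reaches the pod log; `process_pods` and `find_waiting_pods` are hypothetical helper names, not scripts shipped in this repository.

```bash
# Sketch only: process_pods and find_waiting_pods are hypothetical helpers.

# Read `kubectl get pods` output on stdin and print the names of process pods
# (those whose name starts with "nf-").
process_pods() {
  awk '$1 ~ /^nf-/ { print $1 }'
}

# Print every process pod in namespace $1 whose log contains the gate's
# "waiting 1 h" message, i.e. pods currently deferring on carbon intensity.
find_waiting_pods() {
  local ns="$1" pod
  kubectl get pods -n "$ns" --no-headers | process_pods | while read -r pod; do
    if kubectl logs "$pod" -n "$ns" 2>/dev/null | grep -q 'waiting 1 h'; then
      echo "$pod"
    fi
  done
}
```

Running `find_waiting_pods "$NAMESPACE"` should then list the pod that is currently deferring, which you can pass to `kubectl logs -f` to watch the retries.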
📁 Repository Layout
```bash
.
├── bootstrap-nextflow-carbon.sh   # Deploy PVC + uploader
├── reset.sh                       # Clean everything
├── green_k8s_workflow.nf          # Nextflow DSL2 pipeline
├── nextflow.config                # K8s executor config + reports
├── nextflow-job.yaml              # Template Job for Nextflow run
├── nextflow-pvc.yaml              # RWX PVC definition
├── pvc-uploader.yaml              # Sleep-infinity pod for file uploads
├── fetch-results.sh               # Convenience downloader
├── .gitignore                     # Ignore work/ etc.
└── README.md
```
📤 Output Artifacts
After a successful run the shared PVC (/workspace) contains:
```bash
/workspace/
├── report.html     # Execution summary
├── timeline.html   # Gantt chart of processes
├── trace.txt       # CSV-style trace
└── work/           # Intermediate directories
```
Download them:
```bash
chmod +x fetch-results.sh && ./fetch-results.sh

# or manually:
kubectl cp $NAMESPACE/pvc-uploader:/workspace/report.html ./report.html
kubectl cp $NAMESPACE/pvc-uploader:/workspace/timeline.html ./timeline.html
kubectl cp $NAMESPACE/pvc-uploader:/workspace/trace.txt ./trace.txt
```
🧼 Clean-Up
```bash
chmod +x reset.sh && ./reset.sh
# Deletes: all Jobs/Pods, uploader pod, ConfigMaps, and PVC content
```
Customising the Workflow
- Make every task carbon-aware: replicate the retry/sleep logic from `highenergy_std_task` in other process definitions.
- Adjust simulated runtimes: in `green_k8s_workflow.nf` change `sleep 10`, `sleep 3600`, etc. to match real job durations.
- Tune resource requests: edit labels and `nextflow.config` to schedule GPU, high-RAM, or node-affinity workloads.

Happy (low-carbon) computing! 🌍
🌍 Why Carbon-Aware Scheduling on Kubernetes?
Electricity’s carbon intensity can swing 5- to 10-fold within a day as fossil plants ramp up and renewables ebb. By deferring batch jobs on Kubernetes until the grid is cleaner, rather than running them immediately, we can shift compute into low-carbon periods and cut the emissions of those jobs.
| Key takeaway | Supporting evidence |
|--------------|---------------------|
| Real-time CO₂ signals are accurate and actionable. ElectricityMap provides 5-minute carbon intensity data already used for operational decisions. | Tranberg et al. [9], Gorka et al. [8] |
| Delaying non-urgent jobs typically cuts emissions by 30–60 %. | Piontek et al. [1]; Beena et al. [3] |
| Carbon checks can be built into workflows, not just schedulers. | West et al. [5], Lechowicz et al. [6], James & Schien [2] |
🔧 What This Repo Adds
- Code-integrated carbon gating – `green_k8s_workflow.nf` delays `highenergy_std_task` until live CO₂ ≤ `THRESHOLD` (or the 24 h retry limit is reached), then lets the task run to completion without further checks.
- Cluster-agnostic design – no custom controllers, CRDs, or admission webhooks; plain Jobs + PVCs work anywhere.
- Minimal footprint – a 4-line `curl` → `jq` → `sleep` loop; no extra containers or binaries.
🧩 This approach introduces workflow-level adaptation, not merely a scheduling adjustment — giving users full control over which tasks defer and under what conditions.
📚 References
Carbon-aware scheduling & orchestration
[1] T. Piontek et al. “Carbon Emission-Aware Job Scheduling for Kubernetes Deployments.” J. Supercomput., 2024. https://doi.org/10.1007/s11227-023-05506-7
[2] A. James and D. Schien. “A Low-Carbon Kubernetes Scheduler.” ICT4S, 2019. https://ceur-ws.org/Vol-2382/ICT4S2019_paper_28.pdf
[3] B. M. Beena et al. “A Green Cloud-Based Framework for Energy-Efficient Task Scheduling Using Carbon-Intensity Data …” IEEE Access, 13, 2025. https://doi.org/10.1109/ACCESS.2025.3562882
[4] W. A. Hanafy et al. “Going Green for Less Green: Optimizing the Cost of Reducing Cloud Carbon Emissions.” ASPLOS ’24, 2024. https://doi.org/10.1145/3620666.3651374
Workflow-level & scientific computing
[5] K. West et al. “Exploring the Potential of Carbon-Aware Execution for Scientific Workflows.” arXiv:2503.13705, 2025. https://arxiv.org/abs/2503.13705
[6] A. Lechowicz et al. “Carbon- and Precedence-Aware Scheduling for Data-Processing Clusters.” arXiv:2502.09717, 2025. https://arxiv.org/abs/2502.09717
HPC & data-center decarbonization surveys
[7] C. A. Silva et al. “A Review on the Decarbonization of High-Performance Computing Centers.” Renew. Sustain. Energy Rev. 189 (2024): 114019. https://doi.org/10.1016/j.rser.2023.114019
Carbon-intensity data & accounting methods
[8] J. Gorka et al. “ElectricityEmissions.jl: A Framework for the Comparison of Carbon-Intensity Signals.” arXiv:2411.06560, 2024. https://arxiv.org/abs/2411.06560
[9] B. Tranberg et al. “Real-Time Carbon Accounting Method for the European Electricity Markets.” Energy Strategy Rev. 26 (2019): 100367. https://doi.org/10.1016/j.esr.2019.100367
Owner
- Login: YagmurKati
- Kind: user
- Repositories: 1
- Profile: https://github.com/YagmurKati
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this carbon-aware Nextflow-Kubernetes workflow in your work, please cite it as below."
title: "Carbon-Aware Nextflow Pipeline on Kubernetes"
authors:
  - family-names: Kati
    given-names: Yagmur
    affiliation: Humboldt University of Berlin
date-released: 2025-05-28
version: 1.0.0
license: MIT
repository-code: https://github.com/YagmurKati/carbon-aware-nextflow-on-k8s
keywords:
- nextflow
- kubernetes
- carbon-aware
- sustainable-computing
- workflow
- electricitymap