tuning-green-ai-pipelines
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: SMART-Dal
- Language: Python
- Default Branch: main
- Size: 26.5 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
♻️ Tu(r)ning AI Green
Quantifying how energy-efficient techniques applied at different stages of an AI pipeline interact, stack, and occasionally collide.
Table of Contents
- Project Goals
- Supported Tasks & Models
- Repository Layout
- Running an Experiment
- Energy & Carbon Instrumentation
- Result Files
Project Goals
Large-scale AI systems burn substantial energy across five canonical stages:
- Data preparation
- Model architecture design / selection
- Training or fine-tuning
- System-level deployment
- Inference
This study builds multiple variants of the same end-to-end pipeline, each enabling energy-saving techniques in one or more of those stages. By comparing energy usage, carbon footprint, latency, and task accuracy across variants, we ask:
- Do savings add up linearly, cancel one another out, or compound super-linearly?
- Does "optimise everywhere" always beat "optimise the bottleneck"?
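To make the first question concrete, here is a minimal sketch (plain Python, hypothetical numbers, not project code) of how per-technique savings could be compared against the measured combined saving:

```python
def interaction(baseline_kwh, individual_kwh, combined_kwh, tol=0.05):
    """Classify how per-stage savings interact when techniques are stacked.

    baseline_kwh:   energy of the unmodified pipeline
    individual_kwh: list of energies, one per single-technique variant
    combined_kwh:   measured energy with all techniques enabled at once
    """
    # Additive prediction: subtract each technique's individual saving
    # from the baseline.
    predicted = baseline_kwh - sum(baseline_kwh - e for e in individual_kwh)
    if combined_kwh < predicted * (1 - tol):
        return "super-additive"   # savings compound
    if combined_kwh > predicted * (1 + tol):
        return "sub-additive"     # savings partially cancel
    return "additive"

# Hypothetical: two techniques save 0.5 and 0.3 kWh individually,
# so the additive prediction is 2.5 - 0.8 = 1.7 kWh.
print(interaction(2.5, [2.0, 2.2], 1.7))  # additive
```

The tolerance band is a judgment call; in practice it would come from run-to-run measurement variance.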
Supported Tasks & Models
| Task | Dataset | Model(s) | Notes |
|------|---------|----------|-------|
| Vulnerability Detection | BigVul | ModernBERT | Binary vulnerability classification |
Each pipeline variant reuses the same datasets and model checkpoints, so that only the energy-efficiency knobs differ.
Repository Layout
```
green-pipeline-study/
├── variants/                        # One folder per experimental pipeline
│   ├── v0_baseline/                 # Baseline implementation
│   ├── v1_gradient_checkpointing/
│   ├── v2_lora_peft/                # Parameter-efficient fine-tuning
│   ├── v3_quantization/             # Model quantization
│   ├── v4_tokenizer/                # Tokenizer optimizations
│   ├── v5_power_limit_100W/         # Power limiting
│   ├── v6_optimizer/                # Optimizer configurations
│   ├── v7_f16/                      # FP16 precision
│   ├── v8_sequence_length_trimming/
│   ├── v9_inference_engine/
│   ├── v10_dataloader_pin_memory/
│   ├── v11_torch_compile/
│   ├── v12_attention/
│   ├── v13_layer_pruning_4_top/
│   └── ...                          # Additional variants
│
├── common/                          # Shared components
│   ├── layer_drop.py                # Layer pruning utilities
│   └── generate_configs.py          # Configuration generation
│
├── analysis_results/                # Analysis outputs
├── energy_modelling.py              # Energy modeling utilities
├── analysis.py                      # Analysis scripts
├── requirements.txt                 # Project dependencies
└── README.md                        # This file
```
Each variant in variants/ is self-contained with:
* config.yaml – structured hyper-parameters
* train_all_variants.sh – pipeline execution script
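Conceptually, per-variant configs like these can be produced by merging baseline defaults with variant-specific overrides, in the spirit of common/generate_configs.py. The sketch below uses illustrative keys and values, not the project's actual schema:

```python
# Baseline hyper-parameters; every variant starts from these.
# Keys and values are illustrative, not the project's real config.yaml schema.
BASELINE = {
    "model": "ModernBERT",
    "precision": "fp32",
    "lora": False,
    "max_seq_length": 512,
}

# Each variant flips only its own energy-efficiency knob(s).
VARIANT_OVERRIDES = {
    "v0_baseline": {},
    "v2_lora_peft": {"lora": True},
    "v7_f16": {"precision": "fp16"},
    "v8_sequence_length_trimming": {"max_seq_length": 256},
}

def build_config(variant):
    """Return the full config for a variant: defaults plus its overrides."""
    cfg = dict(BASELINE)              # copy so the baseline stays untouched
    cfg.update(VARIANT_OVERRIDES[variant])
    return cfg

print(build_config("v7_f16")["precision"])  # fp16
```

Keeping overrides minimal is what guarantees the study's controlled-comparison property: everything not explicitly overridden is identical across variants.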
Running an Experiment
```bash
# 1. Set up environment
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Run a pipeline variant
cd variants/v0_baseline   # or any other variant
python3 train.py
```
All energy, carbon and performance numbers are saved under analysis_results/.
Energy & Carbon Instrumentation
| Layer | Tool | What it Measures |
|-------|------|------------------|
| CPU / GPU / RAM (system) | CodeCarbon | Process-level energy + g CO₂/kWh |
Each stage starts an energy session; deltas are aggregated into the final analysis.
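The aggregation step can be sketched as follows: each stage records a cumulative energy reading at its start and end, and the deltas are summed into the per-variant total. Numbers here are hypothetical, chosen to match the example result file shown later:

```python
# Aggregate per-stage energy deltas into the totals reported per variant.
# readings maps each stage to (meter_at_start, meter_at_end) in kWh;
# the numbers are hypothetical.
def aggregate(readings):
    per_stage = {s: round(end - start, 4) for s, (start, end) in readings.items()}
    per_stage["total"] = round(sum(per_stage.values()), 4)
    return per_stage

readings = {
    "data":         (0.00, 0.14),
    "architecture": (0.14, 0.14),
    "training":     (0.14, 2.06),
    "system":       (2.06, 2.14),
    "inference":    (2.14, 2.51),
}
print(aggregate(readings)["total"])  # 2.51
```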
Result Files
A run produces:
<VARIANT>/results/*
You can also find the analyzed and combined results, along with plots, in analysis_results/.
```jsonc
{
  "variant": "v2_lora_peft",
  "energy_kwh": {
    "data": 0.14,
    "architecture": 0.00,
    "training": 1.92,
    "system": 0.08,
    "inference": 0.37,
    "total": 2.51
  },
  "co2_kg": 0.98,
  "accuracy": 0.842,
  "latency_ms": 23.5
}
```
* analysis_results/aggregated_metrics.json – comprehensive analysis across all variants
* Training progress logs under training_progress.log
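The energy and carbon fields in the result file are linked by the grid's carbon intensity (g CO₂ per kWh). A small sketch, using the example numbers above and a hypothetical intensity of 390 g CO₂/kWh:

```python
def co2_kg(energy_kwh, grid_g_per_kwh):
    """Convert energy (kWh) to carbon (kg CO2) for a given grid intensity."""
    return energy_kwh * grid_g_per_kwh / 1000.0

# With the 2.51 kWh total from the example result file and a hypothetical
# grid intensity of 390 g CO2/kWh, the footprint comes out near the
# 0.98 kg reported.
print(round(co2_kg(2.51, 390), 2))  # 0.98
```

The actual intensity CodeCarbon applies depends on the region where the run executes, so the same kWh total can yield very different CO₂ numbers.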
Owner
- Name: SMART-Dal
- Login: SMART-Dal
- Kind: organization
- Repositories: 2
- Profile: https://github.com/SMART-Dal
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "How to Tune Your AI Green?"
version: "0.1.0"
url: "https://github.com/SMART-Dal/tuning-green-ai-pipelines"
date-released: 2025-06-12
authors:
  - family-names: Rajput
    given-names: Saurabhsingh
    email: saurabh@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0000-0002-4630-2288"
  - family-names: Saad
    given-names: Mootez
    email: mootez@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0009-0008-8159-3632"
  - family-names: Sharma
    given-names: Tushar
    email: tushar@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0000-0002-0538-052X"
```
GitHub Events
Total
- Release event: 1
- Public event: 1
- Push event: 2
- Create event: 1
Last Year
- Release event: 1
- Public event: 1
- Push event: 2
- Create event: 1
Dependencies
- accelerate >=0.20.0
- bitsandbytes >=0.41.0
- black >=22.0.0
- codecarbon >=2.2.0
- datasets >=2.12.0
- flake8 >=4.0.0
- gputil >=1.4.0
- matplotlib >=3.5.0
- numpy >=1.24.0
- omegaconf >=2.3.0
- pandas >=1.5.0
- peft >=0.4.0
- protobuf >=3.20.0
- psutil >=5.9.0
- pynvml >=11.5.0
- pytest >=7.0.0
- ruamel.yaml *
- scikit-learn >=1.0.0
- seaborn >=0.12.0
- sentencepiece >=0.1.99
- torch >=2.0.0
- tqdm >=4.65.0
- transformers >=4.30.0
- vllm >=0.2.0
- wandb >=0.15.0