tuning-green-ai-pipelines
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references: not found
- ○ Academic publication links: not found
- ○ Academic email domains: not found
- ○ Institutional organization owner: not found
- ○ JOSS paper metadata: not found
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: SMART-Dal
- Language: Python
- Default Branch: main
- Size: 26.5 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
♻️ Tu(r)ning AI Green
Quantifying how energy-efficient techniques applied at different stages of an AI pipeline interact, stack, and occasionally collide.
Table of Contents
- Project Goals
- Supported Tasks & Models
- Repository Layout
- Running an Experiment
- Energy & Carbon Instrumentation
- Result Files
Project Goals
Large-scale AI systems burn substantial energy across five canonical stages:
- Data preparation
- Model architecture design / selection
- Training or fine-tuning
- System-level deployment
- Inference
This study builds multiple variants of the same end-to-end pipeline, each enabling energy-saving techniques in one or more of those stages. By comparing energy usage, carbon footprint, latency, and task accuracy across variants, we ask:
- Do savings add up linearly, cancel one another out, or compound super-linearly?
- Does "optimise everywhere" always beat "optimise the bottleneck"?
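To make the first question concrete, here is a minimal sketch (plain Python, hypothetical numbers, not project code) of how per-technique savings could be compared against the measured combined saving:

```python
def interaction(baseline_kwh, individual_kwh, combined_kwh, tol=0.05):
    """Classify how per-stage savings interact when techniques are stacked.

    baseline_kwh:   energy of the unmodified pipeline
    individual_kwh: list of energies, one per single-technique variant
    combined_kwh:   measured energy with all techniques enabled at once
    """
    # Additive prediction: subtract each technique's individual saving
    # from the baseline.
    predicted = baseline_kwh - sum(baseline_kwh - e for e in individual_kwh)
    if combined_kwh < predicted * (1 - tol):
        return "super-additive"   # savings compound
    if combined_kwh > predicted * (1 + tol):
        return "sub-additive"     # savings partially cancel
    return "additive"

# Hypothetical: two techniques save 0.5 and 0.3 kWh individually,
# so the additive prediction is 2.5 - 0.8 = 1.7 kWh.
print(interaction(2.5, [2.0, 2.2], 1.7))  # additive
```

The tolerance band is a judgment call; in practice it would come from run-to-run measurement variance.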
Supported Tasks & Models
| Task | Dataset | Model(s) | Notes |
|------|---------|----------|-------|
| Vulnerability Detection | BigVul | ModernBERT | Binary vulnerability classification |
Each pipeline variant reuses the same datasets and model checkpoints, so that only the energy-efficiency knobs differ.
Repository Layout
```
green-pipeline-study/
├── variants/                        # One folder per experimental pipeline
│   ├── v0_baseline/                 # Baseline implementation
│   ├── v1_gradient_checkpointing/
│   ├── v2_lora_peft/                # Parameter-efficient fine-tuning
│   ├── v3_quantization/             # Model quantization
│   ├── v4_tokenizer/                # Tokenizer optimizations
│   ├── v5_power_limit_100W/         # Power limiting
│   ├── v6_optimizer/                # Optimizer configurations
│   ├── v7_f16/                      # FP16 precision
│   ├── v8_sequence_length_trimming/
│   ├── v9_inference_engine/
│   ├── v10_dataloader_pin_memory/
│   ├── v11_torch_compile/
│   ├── v12_attention/
│   ├── v13_layer_pruning_4_top/
│   └── ...                          # Additional variants
│
├── common/                          # Shared components
│   ├── layer_drop.py                # Layer pruning utilities
│   └── generate_configs.py          # Configuration generation
│
├── analysis_results/                # Analysis outputs
├── energy_modelling.py              # Energy modeling utilities
├── analysis.py                      # Analysis scripts
├── requirements.txt                 # Project dependencies
└── README.md                        # This file
```
Each variant in variants/ is self-contained with:
* config.yaml – structured hyper-parameters
* train_all_variants.sh – pipeline execution script
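Conceptually, per-variant configs like these can be produced by merging baseline defaults with variant-specific overrides, in the spirit of common/generate_configs.py. The sketch below uses illustrative keys and values, not the project's actual schema:

```python
# Baseline hyper-parameters; every variant starts from these.
# Keys and values are illustrative, not the project's real config.yaml schema.
BASELINE = {
    "model": "ModernBERT",
    "precision": "fp32",
    "lora": False,
    "max_seq_length": 512,
}

# Each variant flips only its own energy-efficiency knob(s).
VARIANT_OVERRIDES = {
    "v0_baseline": {},
    "v2_lora_peft": {"lora": True},
    "v7_f16": {"precision": "fp16"},
    "v8_sequence_length_trimming": {"max_seq_length": 256},
}

def build_config(variant):
    """Return the full config for a variant: defaults plus its overrides."""
    cfg = dict(BASELINE)              # copy so the baseline stays untouched
    cfg.update(VARIANT_OVERRIDES[variant])
    return cfg

print(build_config("v7_f16")["precision"])  # fp16
```

Keeping overrides minimal is what guarantees the study's controlled-comparison property: everything not explicitly overridden is identical across variants.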
Running an Experiment
```bash
# 1. Set up environment
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Run a pipeline variant
cd variants/v0_baseline   # or any other variant
python3 train.py
```
All energy, carbon and performance numbers are saved under analysis_results/.
Energy & Carbon Instrumentation
| Layer | Tool | What it Measures |
|-------|------|------------------|
| CPU / GPU / RAM (system) | CodeCarbon | Process-level energy + g CO₂/kWh |
Each stage starts an energy session; deltas are aggregated into the final analysis.
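The aggregation step can be sketched as follows: each stage records a cumulative energy reading at its start and end, and the deltas are summed into the per-variant total. Numbers here are hypothetical, chosen to match the example result file shown later:

```python
# Aggregate per-stage energy deltas into the totals reported per variant.
# readings maps each stage to (meter_at_start, meter_at_end) in kWh;
# the numbers are hypothetical.
def aggregate(readings):
    per_stage = {s: round(end - start, 4) for s, (start, end) in readings.items()}
    per_stage["total"] = round(sum(per_stage.values()), 4)
    return per_stage

readings = {
    "data":         (0.00, 0.14),
    "architecture": (0.14, 0.14),
    "training":     (0.14, 2.06),
    "system":       (2.06, 2.14),
    "inference":    (2.14, 2.51),
}
print(aggregate(readings)["total"])  # 2.51
```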
Result Files
A run produces:
<VARIANT>/results/*
You can also find the analyzed and combined results, along with plots, in analysis_results/.
```jsonc
{
  "variant": "v2_lora_peft",
  "energy_kwh": {
    "data": 0.14,
    "architecture": 0.00,
    "training": 1.92,
    "system": 0.08,
    "inference": 0.37,
    "total": 2.51
  },
  "co2_kg": 0.98,
  "accuracy": 0.842,
  "latency_ms": 23.5
}
```
* analysis_results/aggregated_metrics.json – comprehensive analysis across all variants
* Training progress logs under training_progress.log
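The energy and carbon fields in the result file are linked by the grid's carbon intensity (g CO₂ per kWh). A small sketch, using the example numbers above and a hypothetical intensity of 390 g CO₂/kWh:

```python
def co2_kg(energy_kwh, grid_g_per_kwh):
    """Convert energy (kWh) to carbon (kg CO2) for a given grid intensity."""
    return energy_kwh * grid_g_per_kwh / 1000.0

# With the 2.51 kWh total from the example result file and a hypothetical
# grid intensity of 390 g CO2/kWh, the footprint comes out near the
# 0.98 kg reported.
print(round(co2_kg(2.51, 390), 2))  # 0.98
```

The actual intensity CodeCarbon applies depends on the region where the run executes, so the same kWh total can yield very different CO₂ numbers.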
Owner
- Name: SMART-Dal
- Login: SMART-Dal
- Kind: organization
- Repositories: 2
- Profile: https://github.com/SMART-Dal
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "How to Tune Your AI Green?"
version: "0.1.0"
url: "https://github.com/SMART-Dal/tuning-green-ai-pipelines"
date-released: 2025-06-12
authors:
  - family-names: Rajput
    given-names: Saurabhsingh
    email: saurabh@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0000-0002-4630-2288"
  - family-names: Saad
    given-names: Mootez
    email: mootez@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0009-0008-8159-3632"
  - family-names: Sharma
    given-names: Tushar
    email: tushar@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0000-0002-0538-052X"
```
GitHub Events
Total
- Release event: 1
- Public event: 1
- Push event: 2
- Create event: 1
Last Year
- Release event: 1
- Public event: 1
- Push event: 2
- Create event: 1
Dependencies
- accelerate >=0.20.0
- bitsandbytes >=0.41.0
- black >=22.0.0
- codecarbon >=2.2.0
- datasets >=2.12.0
- flake8 >=4.0.0
- gputil >=1.4.0
- matplotlib >=3.5.0
- numpy >=1.24.0
- omegaconf >=2.3.0
- pandas >=1.5.0
- peft >=0.4.0
- protobuf >=3.20.0
- psutil >=5.9.0
- pynvml >=11.5.0
- pytest >=7.0.0
- ruamel.yaml *
- scikit-learn >=1.0.0
- seaborn >=0.12.0
- sentencepiece >=0.1.99
- torch >=2.0.0
- tqdm >=4.65.0
- transformers >=4.30.0
- vllm >=0.2.0
- wandb >=0.15.0