Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: SMART-Dal
  • Language: Python
  • Default Branch: main
  • Size: 26.5 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 12 months ago · Last pushed 9 months ago
Metadata Files
Readme Citation

README.md

♻️ Tu(r)ning AI Green

Quantifying how energy-efficient techniques applied at different stages of an AI pipeline interact, stack, and occasionally collide.


Table of Contents

  1. Project Goals
  2. Supported Tasks & Models
  3. Repository Layout
  4. Running an Experiment
  5. Energy & Carbon Instrumentation
  6. Result Files

Project Goals

Large-scale AI systems burn substantial energy across five canonical stages:

  1. Data preparation
  2. Model architecture design / selection
  3. Training or fine-tuning
  4. System-level deployment
  5. Inference

This study builds multiple variants of the same end-to-end pipeline, each enabling energy-saving techniques in one or more of those stages. By comparing energy usage, carbon footprint, latency and task accuracy across variants, we ask:

> Do savings add up linearly, cancel one another out, or compound super-linearly?
> Does "optimise everywhere" always beat "optimise the bottleneck"?
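As a toy illustration of why stacked savings need not be additive (this is a simplifying model, not a result from the study): if each technique independently scales the *remaining* energy, the combined saving is 1 − (1 − a)(1 − b), which is smaller than a + b.

```python
# Toy model of how per-stage savings might combine (illustrative only;
# the study measures interactions empirically rather than assuming this).

def combined_saving(savings):
    """Combined fractional saving if each technique scales the
    remaining energy multiplicatively and independently."""
    remaining = 1.0
    for s in savings:
        remaining *= (1.0 - s)
    return 1.0 - remaining

additive = sum([0.20, 0.30])                     # naive linear sum: 0.50
multiplicative = combined_saving([0.20, 0.30])   # 1 - 0.8 * 0.7 = 0.44
print(f"additive: {additive:.2f}, multiplicative: {multiplicative:.2f}")
```

Real techniques can also interfere (e.g. quantization changing how much power limiting helps), which is exactly the interaction the pipeline variants are designed to expose.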


Supported Tasks & Models

| Task | Dataset | Model(s) | Notes |
|------|---------|----------|-------|
| Vulnerability Detection | BigVul | ModernBERT | Binary vulnerability classification |

Each pipeline variant re-uses the same datasets & model checkpoints so that only the energy-efficiency knobs differ.


Repository Layout

```
green-pipeline-study/
├── variants/                        # One folder per experimental pipeline
│   ├── v0_baseline/                 # Baseline implementation
│   ├── v1_gradient_checkpointing/
│   ├── v2_lora_peft/                # Parameter-efficient fine-tuning
│   ├── v3_quantization/             # Model quantization
│   ├── v4_tokenizer/                # Tokenizer optimizations
│   ├── v5_power_limit_100W/         # Power limiting
│   ├── v6_optimizer/                # Optimizer configurations
│   ├── v7_f16/                      # FP16 precision
│   ├── v8_sequence_length_trimming/
│   ├── v9_inference_engine/
│   ├── v10_dataloader_pin_memory/
│   ├── v11_torch_compile/
│   ├── v12_attention/
│   ├── v13_layer_pruning_4_top/
│   └── ...                          # Additional variants
├── common/                          # Shared components
│   ├── layer_drop.py                # Layer pruning utilities
│   └── generate_configs.py          # Configuration generation
├── analysis_results/                # Analysis outputs
├── energy_modelling.py              # Energy modeling utilities
├── analysis.py                      # Analysis scripts
├── requirements.txt                 # Project dependencies
└── README.md                        # This file
```

Each variant in `variants/` is self-contained with:

* `config.yaml` – structured hyper-parameters
* `train_all_variants.sh` – pipeline execution script
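The exact schema of `config.yaml` is not reproduced here; purely as a hypothetical sketch, a structured hyper-parameter file for a variant might look like this (all keys and values illustrative, not taken from the repository):

```yaml
# Hypothetical config.yaml sketch — field names are illustrative only.
variant: v2_lora_peft
model:
  name: answerdotai/ModernBERT-base
  max_seq_length: 512
training:
  batch_size: 16
  learning_rate: 2.0e-5
  epochs: 3
efficiency:
  lora: true          # the knob this variant enables
  fp16: false
  power_limit_w: null
```

Keeping every knob in one structured file is what lets the variants share datasets and checkpoints while differing only in the energy-efficiency settings.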


Running an Experiment

```bash
# 1. Set up environment
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Run a pipeline variant
cd variants/v0_baseline   # or any other variant
python3 train.py
```

All energy, carbon and performance numbers are saved under analysis_results/.


Energy & Carbon Instrumentation

| Layer | Tool | What it Measures |
|-------|------|------------------|
| CPU / GPU / RAM (system) | CodeCarbon | Process-level energy + g CO₂/kWh |

Each stage starts an energy session; deltas are aggregated into the final analysis.
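The per-stage bookkeeping can be sketched as follows — a minimal pure-Python mock, not the repository's actual instrumentation; in the real pipeline, CodeCarbon's tracker plays the role of the `read_energy_kwh` stand-in:

```python
# Minimal sketch of per-stage energy sessions: record the meter reading
# at the start and end of each stage, keep the deltas, sum them at the end.
# `read_energy_kwh` is a stand-in for a real cumulative meter (e.g. CodeCarbon).

class EnergySessions:
    def __init__(self, read_energy_kwh):
        self.read = read_energy_kwh   # callable returning cumulative kWh
        self.deltas = {}

    def run_stage(self, name, fn):
        start = self.read()
        result = fn()
        self.deltas[name] = self.read() - start
        return result

    def total(self):
        return sum(self.deltas.values())

# Fake cumulative meter readings for demonstration.
readings = iter([0.0, 0.14, 0.14, 2.06])
sessions = EnergySessions(lambda: next(readings))
sessions.run_stage("data", lambda: None)      # delta: 0.14 kWh
sessions.run_stage("training", lambda: None)  # delta: 1.92 kWh
print({k: round(v, 2) for k, v in sessions.deltas.items()})
```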


Result Files

A run produces:

  • <VARIANT>/results/*

You can also find the analyzed and combined results, along with plots, in analysis_results/.

```jsonc
{
  "variant": "v2_lora_peft",
  "energy_kwh": {
    "data": 0.14,
    "architecture": 0.00,
    "training": 1.92,
    "system": 0.08,
    "inference": 0.37,
    "total": 2.51
  },
  "co2_kg": 0.98,
  "accuracy": 0.842,
  "latency_ms": 23.5
}
```

* `analysis_results/aggregated_metrics.json` – comprehensive analysis across all variants
* Training progress logs under `training_progress.log`
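Result files in this shape can be consumed with nothing beyond the standard library; a minimal sketch, with field names taken from the example above:

```python
import json

# Parse the example per-variant summary (same fields as shown above).
record = json.loads("""
{
  "variant": "v2_lora_peft",
  "energy_kwh": {"data": 0.14, "architecture": 0.00, "training": 1.92,
                 "system": 0.08, "inference": 0.37, "total": 2.51},
  "co2_kg": 0.98, "accuracy": 0.842, "latency_ms": 23.5
}
""")

# Sanity-check that the reported total matches the per-stage sum.
stages = {k: v for k, v in record["energy_kwh"].items() if k != "total"}
stage_sum = sum(stages.values())
assert abs(stage_sum - record["energy_kwh"]["total"]) < 1e-6
print(record["variant"], round(stage_sum, 2))
```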

Owner

  • Name: SMART-Dal
  • Login: SMART-Dal
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "How to Tune Your AI Green?" 
version: "0.1.0"
url: "https://github.com/SMART-Dal/tuning-green-ai-pipelines"
date-released: 2025-06-12   
authors:
  - family-names: Rajput
    given-names: Saurabhsingh
    email: saurabh@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0000-0002-4630-2288"

  - family-names: Saad
    given-names: Mootez
    email: mootez@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0009-0008-8159-3632"

  - family-names: Sharma
    given-names: Tushar
    email: tushar@dal.ca
    affiliation: "Dalhousie University, Halifax, Canada"
    orcid: "https://orcid.org/0000-0002-0538-052X"

GitHub Events

Total
  • Release event: 1
  • Public event: 1
  • Push event: 2
  • Create event: 1
Last Year
  • Release event: 1
  • Public event: 1
  • Push event: 2
  • Create event: 1

Dependencies

requirements.txt pypi
  • accelerate >=0.20.0
  • bitsandbytes >=0.41.0
  • black >=22.0.0
  • codecarbon >=2.2.0
  • datasets >=2.12.0
  • flake8 >=4.0.0
  • gputil >=1.4.0
  • matplotlib >=3.5.0
  • numpy >=1.24.0
  • omegaconf >=2.3.0
  • pandas >=1.5.0
  • peft >=0.4.0
  • protobuf >=3.20.0
  • psutil >=5.9.0
  • pynvml >=11.5.0
  • pytest >=7.0.0
  • ruamel.yaml *
  • scikit-learn >=1.0.0
  • seaborn >=0.12.0
  • sentencepiece >=0.1.99
  • torch >=2.0.0
  • tqdm >=4.65.0
  • transformers >=4.30.0
  • vllm >=0.2.0
  • wandb >=0.15.0