caraml

CARAML Benchmark Suite

https://github.com/fzj-jsc/caraml

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, ieee.org
- ○ Academic email domains
- ✓ Institutional organization owner: Organization fzj-jsc has institutional domain (www.fz-juelich.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.3%) to scientific vocabulary

Last synced: 11 months ago

Repository

CARAML Benchmark Suite

Basic Info

Host: GitHub
Owner: FZJ-JSC
License: mit
Language: Python
Default Branch: main
Size: 10.5 MB

Statistics

Stars: 1
Watchers: 2
Forks: 1
Open Issues: 0
Releases: 1

Created about 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme, License, Citation

CARAML

Compact Automated Reproducible Assessment of Machine Learning (CARAML) is a benchmark framework designed to assess AI workloads on novel accelerators. It has been developed and tested extensively on systems at the Jülich Supercomputing Centre (JSC).

CARAML leverages JUBE, a scripting-based framework for creating benchmark sets, running them across different systems, and evaluating results. Additionally, it includes power/energy measurements through the jpwr tool.

Paper: arXiv, IEEE

Tested Accelerators

CARAML has been tested on the JURECA-DC EVALUATION PLATFORM, JURECA-DC, JEDI, WEST-AI Nodes and NHR-FAU. These include the accelerators:

| System | Configuration | Tag |
|--------|---------------|-----|
| NVIDIA Ampere node (SXM) | 4 × A100 (40GB HBM2e) GPUs | `A100` |
| NVIDIA Hopper node (PCIe) | 4 × H100 (80GB HBM2e) GPUs | `H100` |
| NVIDIA Hopper node (NVLink) | 4 × H100 (94GB HBM2e) GPUs | `WAIH100` |
| NVIDIA Grace-Hopper chip | 1 × GH200 (480GB LPDDR5X, 96GB HBM3) GPU | `GH200` |
| NVIDIA Grace-Hopper node | 4 × GH200 (120GB LPDDR5X, 96GB HBM3) GPUs | `JUPITER` |
| AMD MI300X node | 8 × MI300X (192GB HBM3) GPUs | `MI300X` |
| AMD MI300A node | 4 × MI300A (128GB HBM3) APUs | `MI300A` |
| AMD MI200 node | 4 × MI250 (128GB HBM2e) GPUs | `MI250` |
| Graphcore IPU-POD4 M2000 | 4 × GC200 (512GB DDR4-3200) IPUs | `GC200` |
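When scripting around the benchmarks, it can help to validate a tag before handing it to JUBE. The following is a minimal sketch, not part of CARAML: the dictionary simply restates the tag and configuration columns of the table, and `describe_tag` is a hypothetical helper name.

```python
# Tags and configurations restated from the Tested Accelerators table.
ACCELERATOR_TAGS = {
    "A100": "4 × A100 (40GB HBM2e) GPUs",
    "H100": "4 × H100 (80GB HBM2e) GPUs",
    "WAIH100": "4 × H100 (94GB HBM2e) GPUs",
    "GH200": "1 × GH200 (480GB LPDDR5X, 96GB HBM3) GPU",
    "JUPITER": "4 × GH200 (120GB LPDDR5X, 96GB HBM3) GPUs",
    "MI300X": "8 × MI300X (192GB HBM3) GPUs",
    "MI300A": "4 × MI300A (128GB HBM3) APUs",
    "MI250": "4 × MI250 (128GB HBM2e) GPUs",
    "GC200": "4 × GC200 (512GB DDR4-3200) IPUs",
}


def describe_tag(tag: str) -> str:
    """Return the hardware configuration for a tag, or raise ValueError."""
    try:
        return ACCELERATOR_TAGS[tag]
    except KeyError:
        known = ", ".join(sorted(ACCELERATOR_TAGS))
        raise ValueError(f"unknown tag {tag!r}; expected one of: {known}") from None
```

Failing early on a mistyped tag avoids submitting a job whose tag matches no configuration.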

Benchmark

CARAML currently provides benchmarks implemented in Python:

1. Computer Vision: Image Classification (Training)

The image_classification model training benchmark is implemented in PyTorch. It is designed to test image classification models such as ResNet50 on various accelerators. For IPUs, graphcore/examples is used.

Performance is measured in images/s and energy is measured in Wh.
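To make the two units concrete, here is a minimal sketch of how throughput in images/s and energy in Wh could be derived from raw measurements. It is not how jpwr or CARAML compute them; the function names and the trapezoidal integration of power samples are assumptions chosen for illustration.

```python
from typing import Sequence


def images_per_second(num_images: int, elapsed_s: float) -> float:
    """Throughput: images processed divided by wall-clock seconds."""
    return num_images / elapsed_s


def energy_wh(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (W) over time (s) with the trapezoidal rule.

    The integral is in joules (W*s); dividing by 3600 converts it to Wh.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return joules / 3600.0
```

For example, a device drawing a constant 300 W for one hour yields 300 Wh.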

Note: Support for the image classification benchmark in TensorFlow has been discontinued.

2. Natural Language Processing: GPT Language Model (Training)

The LLM-training benchmark is implemented in PyTorch with:
- Megatron-LM with commit: f7727433293427bef04858f67b2889fe9b177d88 and patch applied for NVIDIA
- Megatron-LM-ROCm with commit: 21045b59127cd2d5509f1ca27d81fae7b485bd22 and patch applied for AMD
- graphcore/examples (forked version) for Graphcore

Performance is measured in tokens/s and energy is recorded in Wh.
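As an illustration of the tokens/s metric, the sketch below derives throughput from a training iteration: each iteration processes one global batch of sequences, so tokens per iteration is the global batch size times the sequence length. The function and parameter names are hypothetical and not taken from the benchmark scripts; the tokens-per-Wh helper is an additional derived quantity, not one the README reports.

```python
def tokens_per_second(global_batch_size: int, sequence_length: int,
                      iteration_time_s: float) -> float:
    """Tokens processed per second for one training iteration."""
    return global_batch_size * sequence_length / iteration_time_s


def tokens_per_wh(tokens_per_s: float, mean_power_w: float) -> float:
    """Energy efficiency: tokens per watt-hour.

    tokens/s divided by W gives tokens per joule; 1 Wh = 3600 J.
    """
    return tokens_per_s / mean_power_w * 3600.0
```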

Requirements

To run the benchmarks, install JUBE by following the setup instructions in the JUBE Installation Documentation. The benchmarks are deployed using Apptainer containers and executed using SLURM on the tested accelerators.

Dataset

Image Classification: Synthetic data is generated on the host machine for benchmarking. The IPU tag synthetic additionally allows for the generation of synthetic data directly on the IPU.
LLM Training: A subset of the OSCAR dataset (790 samples, ~10 MB) is pre-processed using GPT-2 tokenizers. This data is provided in the llm_data directory.
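Host-side synthetic data for image classification amounts to random pixels paired with random labels. The sketch below shows the idea with the Python standard library only (Python 3.9+ for `randbytes`); it is not the benchmark's generator, and the 3 × 224 × 224 image shape and 1000 classes are ImageNet-style assumptions.

```python
import random
from typing import List, Tuple


def synthetic_batch(batch_size: int, height: int = 224, width: int = 224,
                    channels: int = 3, num_classes: int = 1000,
                    seed: int = 0) -> Tuple[List[bytes], List[int]]:
    """Return random uint8 images (as raw bytes) and random class labels."""
    rng = random.Random(seed)  # seeded so repeated runs see identical data
    bytes_per_image = channels * height * width
    images = [rng.randbytes(bytes_per_image) for _ in range(batch_size)]
    labels = [rng.randrange(num_classes) for _ in range(batch_size)]
    return images, labels
```

Because no real dataset is read, the measured throughput reflects compute and memory movement rather than storage or decoding speed.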

Execution

Clone the repository and navigate into it:

    git clone https://github.com/FZJ-JSC/CARAML.git
    cd CARAML

Modify the system and model parameters in the respective JUBE configuration file.
To pull the required container use the `container` tag as follows:

    jube run {JUBEConfig}.{xml,yaml} --tag container H100

Replace `H100` with one of the following as needed:
- `GH200` (for Arm CPU + H100)
- `MI250` or `MI300X` or `MI300A` (for AMD)
- `GC200` (for Graphcore)

> Note: The `container` tag should ideally be used only once at the beginning to pull and set up the container.

Image Classification (Training)

To run the benchmark with the defined configurations, do:

    jube run image_classification/image_classification_torch_benchmark.xml --tag H100

`H100` can be replaced with any tag listed in the Tested Accelerators section.

After the benchmark has been executed, use `jube continue` to postprocess the results:

    jube continue image_classification/image_classification_torch_benchmark_run -i last

To generate the results, do:

    jube result image_classification/image_classification_torch_benchmark_run -i last
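The run, continue, and result steps always follow the same pattern, so they can be generated from the configuration path. The sketch below only builds the command strings and does not execute anything. `jube_workflow` is a hypothetical helper, and deriving the run directory by replacing the file extension with `_run` is an assumption based on the examples in this README; the actual output path is set inside each JUBE configuration.

```python
import shlex
from pathlib import PurePosixPath
from typing import List, Sequence


def jube_workflow(config: str, tags: Sequence[str]) -> List[str]:
    """Build the three JUBE commands for one benchmark configuration."""
    config_path = PurePosixPath(config)
    # Assumed convention: <name>.xml or <name>.yaml -> <name>_run
    run_dir = config_path.with_name(config_path.stem + "_run")
    return [
        shlex.join(["jube", "run", str(config_path), "--tag", *tags]),
        shlex.join(["jube", "continue", str(run_dir), "-i", "last"]),
        shlex.join(["jube", "result", str(run_dir), "-i", "last"]),
    ]
```

Printing the returned strings gives a script that can be reviewed before it is run on a login node.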

LLM Training

To run the benchmark with the defined configurations for the 800M GPT model with OSCAR data, do:

    jube run llm_training/llm_benchmark_nvidia_amd.yaml --tag 800M A100

`A100` can be replaced with any tag listed in the Tested Accelerators section, and `800M` can be replaced with `13B` or `175B` for systems with more node resources.

To run the benchmark with the defined configurations for the 117M GPT model on Graphcore with synthetic data, do:

    jube run llm_training/llm_benchmark_ipu.yaml --tag 117M synthetic

If the `synthetic` tag is not given, the benchmark will use OSCAR data.

After the benchmark has been executed, use `jube continue` to postprocess the results:

    jube continue llm_training/llm_benchmark_{nvidia_amd,ipu}_run -i last

To generate the results, do:

    jube result llm_training/llm_benchmark_{nvidia_amd,ipu}_run -i last
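The choice of configuration file and tags for LLM training can be sketched as a small function. This is illustrative only: `llm_run_command` is a hypothetical helper, and restricting Graphcore to `117M` and the `synthetic` tag to Graphcore are assumptions drawn from the two example commands in this README, not limits confirmed by the benchmark itself.

```python
from typing import List

# Model-size tags named in this README (assumed to be the complete sets).
NVIDIA_AMD_SIZES = {"800M", "13B", "175B"}
IPU_SIZES = {"117M"}


def llm_run_command(accelerator: str, model_size: str,
                    synthetic: bool = False) -> List[str]:
    """Return the `jube run` argument list for an LLM training benchmark."""
    if accelerator == "GC200":
        if model_size not in IPU_SIZES:
            raise ValueError(f"unsupported IPU model size: {model_size}")
        config = "llm_training/llm_benchmark_ipu.yaml"
        tags = [model_size] + (["synthetic"] if synthetic else [])
    else:
        if model_size not in NVIDIA_AMD_SIZES:
            raise ValueError(f"unsupported model size: {model_size}")
        if synthetic:
            raise ValueError("the synthetic tag is only described for Graphcore")
        config = "llm_training/llm_benchmark_nvidia_amd.yaml"
        tags = [model_size, accelerator]
    return ["jube", "run", config, "--tag", *tags]
```

Returning an argument list rather than a string lets the caller pass it to `subprocess.run` without shell quoting concerns.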

Results

Image Classification: ResNet50

LLM Training Benchmark

JSC Specific Fixes

To use the PyTorch `torchrun` API on JSC systems, the fixed_torch_run.py fix is required. The fix solves the issue defined here.

Additionally, an `i` is appended to the hostname to allow communication over InfiniBand, as described here.
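The hostname adjustment can be sketched as follows. This is a minimal illustration, not the code used in CARAML: `infiniband_hostname` is a hypothetical name, and inserting the `i` before the domain part of a fully qualified name is an assumption, since the text above only states that an `i` is appended.

```python
def infiniband_hostname(hostname: str) -> str:
    """Append 'i' to the short hostname, keeping any domain suffix."""
    short, dot, domain = hostname.partition(".")
    return f"{short}i{dot}{domain}"
```

On a compute node, this could be applied to `socket.gethostname()` to obtain the address passed to the distributed launcher as the rendezvous host.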

Citation

```
@INPROCEEDINGS{10820809,
  author={John, Chelsea Maria and Nassyr, Stepan and Penke, Carolin and Herten, Andreas},
  booktitle={SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  title={Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML},
  year={2024},
  pages={1164-1176},
  doi={10.1109/SCW63240.2024.00158}
}
```

Owner

Name: Jülich Supercomputing Centre
Login: FZJ-JSC
Kind: organization
Location: Germany

Website: https://www.fz-juelich.de/en/ias/jsc
Twitter: fzj_jsc
Repositories: 29
Profile: https://github.com/FZJ-JSC

Jülich Supercomputing Centre provides HPC resources and expertise. Part of Forschungszentrum Jülich.

Citation (CITATION.cff)

cff-version: 1.2.0
title: "CARAML"
message: "In addition to citing this benchmark repository, please also cite the accompanying SC24 paper."
authors:
  - given-names: "Chelsea"
    family-names: "John"
    affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
    orcid: "https://orcid.org/0000-0003-3777-7393"
repository-code: "https://github.com/FZJ-JSC/CARAML"
license: "MIT"
date-released: "2024-07-26"
references:
  - type: conference-paper
    citation-key: "SC24-Paper"
    authors:
      - given-names: "Chelsea"
        family-names: "John"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0003-3777-7393"
      - given-names: "Carolin"
        family-names: "Penke"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-4043-3885"
      - given-names: "Stepan"
        family-names: "Nassyr"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-0035-244X"
      - given-names: "Andreas"
        family-names: "Herten"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-7150-2505"
    doi: "10.1109/SCW63240.2024.00158"
    booktitle: "SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis"
    start: 1164
    end: 1176
    title: "Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML"
    year: 2024

GitHub Events

Total

Release event: 1
Delete event: 2
Issue comment event: 1
Push event: 27
Pull request event: 11
Fork event: 1
Create event: 5

Last Year

Release event: 1
Delete event: 2
Issue comment event: 1
Push event: 27
Pull request event: 11
Fork event: 1
Create event: 5

Dependencies

requirements/amd_tensorflow_requirements.txt pypi

absl-py ==2.1.0
astunparse ==1.6.3
cachetools ==5.4.0
flatbuffers ==23.5.26
gast ==0.4.0
google-auth ==2.16.0
keras-preprocessing ==1.1.2
numpy ==1.26.4
opt-einsum ==3.3.0
pandas ==2.2.1
protobuf ==3.19.6
pyasn1 ==0.6.0
pyasn1-modules ==0.4.0
python-dateutil ==2.9.0.post0
pytz ==2024.1
rsa ==4.9
six ==1.16.0
termcolor ==2.2.0
tzdata ==2024.1
wheel ==0.43.0
wrapt ==1.14.1

requirements/amd_torch_requirements.txt pypi

accelerate ==0.33.0
annotated-types ==0.7.0
einops ==0.8.0
hjson ==3.1.0
huggingface-hub ==0.24.5
matplotlib ==3.9.0
ninja ==1.11.1.1
pandas ==1.2.4
pybind11 ==2.13.1
pydantic ==2.8.2
pydantic-core ==2.20.1
pyparsing ==3.1.2
python-slugify ==8.0.4
pytz ==2024.1
regex ==2024.7.24
safetensors ==0.4.4
tabulate ==0.9.0
tokenizers ==0.19.1
transformers ==4.43.4

requirements/ipu_tensorflow_requirements.txt pypi

GitPython ==3.1.31
Mako ==1.2.4
MarkupSafe ==2.1.2
PyYAML ==5.4.1
appdirs ==1.4.4
attrs ==22.2.0
autopep8 ==1.6.0
awscli ==1.27.94
botocore ==1.29.94
cffi ==1.15.1
click ==8.1.3
cloudpickle ==2.2.1
colorama ==0.4.4
coverage ==7.2.2
cppimport ==22.8.2
dill ==0.3.6
docker-pycreds ==0.4.0
docutils ==0.16
execnet ==1.9.0
filelock ==3.10.0
gitdb ==4.0.10
googleapis-common-protos ==1.58.0
horovod ==0.27.0
importlib-metadata ==6.1.0
importlib-resources ==5.12.0
iniconfig ==2.0.0
jmespath ==1.0.1
opencv-python-headless ==4.6.0.66
packaging ==23.0
pandas ==1.0.3
pathtools ==0.1.2
pluggy ==1.0.0
promise ==2.3
psutil ==5.9.4
py ==1.11.0
pyasn1 ==0.4.8
pybind11 ==2.10.4
pycodestyle ==2.10.0
pycparser ==2.21
pytest ==6.2.5
pytest-cov ==3.0.0
pytest-forked ==1.4.0
pytest-pythonpath ==0.7.4
pytest-xdist ==2.5.0
python-dateutil ==2.9.0.post0
pytz ==2024.1
rsa ==4.7.2
s3transfer ==0.6.0
sentry-sdk ==1.17.0
setproctitle ==1.3.2
six ==1.16.0
smmap ==5.0.0
tensorflow-addons ==0.14.0
tensorflow-datasets ==4.5.2
tensorflow-metadata ==1.12.0
toml ==0.10.2
tomli ==2.0.1
tqdm ==4.65.0
typeguard ==3.0.1
typing_extensions ==4.5.0
tzdata ==2024.1
urllib3 ==1.26.15
wandb ==0.14.0
zipp ==3.15.0

requirements/ipu_torch_requirements.txt pypi

GitPython ==3.1.43
Mako ==1.3.5
MarkupSafe ==2.1.5
PyTurboJPEG ==1.7.7
PyYAML ==6.0.1
attrs ==23.2.0
awscli ==1.33.32
botocore ==1.34.150
certifi ==2024.7.4
charset-normalizer ==3.3.2
click ==8.1.7
cloudpickle ==3.0.0
colorama ==0.4.6
cppimport ==22.8.2
datasets ==2.1.0
docutils ==0.16
fastjsonschema ==2.20.0
filelock ==3.15.4
fsspec ==2024.6.1
gitdb ==4.0.11
horovod ==0.28.1
huggingface-hub ==0.24.3
idna ==2.7
importlib-resources ==6.4.0
jmespath ==1.0.1
jsonschema ==4.23.0
jsonschema-specifications ==2023.12.1
jupyter-core ==5.7.2
mypy-extensions ==1.0.0
nbformat ==5.10.4
numpy ==1.24.4
packaging ==24.1
pandas ==2.0.3
pathtools ==0.1.2
pkgutil-resolve-name ==1.3.10
platformdirs ==4.2.2
promise ==2.3
protobuf ==3.20.
psutil ==6.0.0
pyasn1 ==0.6.0
pybind11 ==2.13.1
pytest ==6.2.5
pytest-pythonpath ==0.7.3
python-dateutil ==2.9.0
referencing ==0.35.1
regex ==2024.7.24
requests ==2.32.3
rpds-py ==0.19.1
rsa ==4.7.2
s3transfer ==0.10.2
scipy ==1.5.4
sentry-sdk ==2.11.0
shortuuid ==1.0.13
simple-parsing ==0.0.19
six ==1.16.0
smmap ==5.0.1
tfrecord ==1.14.1
timm ==1.0.15
tokenizers ==0.12.1
tqdm ==4.63.1
traitlets ==5.14.3
transformers ==4.26.1
typing-extensions ==4.12.2
typing-inspect ==0.9.0
tzdata ==2024.1
urllib3 ==2.2.2
wandb ==0.12.8
zipp ==3.19.2

requirements/nvidia_arm_tensorflow_requirements.txt pypi

requirements/nvidia_arm_torch_requirements.txt pypi

python-slugify ==8.0.4

requirements/nvidia_x86_tensorflow_requirements.txt pypi

requirements/nvidia_x86_torch_requirements.txt pypi

python-slugify ==8.0.4
