caraml

CARAML Benchmark Suite

https://github.com/fzj-jsc/caraml

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, ieee.org
  • Academic email domains
  • Institutional organization owner
    Organization fzj-jsc has institutional domain (www.fz-juelich.de)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.3%) to scientific vocabulary

Repository

CARAML Benchmark Suite

Basic Info
  • Host: GitHub
  • Owner: FZJ-JSC
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 10.5 MB
Statistics
  • Stars: 1
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 1
Created almost 2 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

CARAML

Compact Automated Reproducible Assessment of Machine Learning (CARAML) is a benchmark framework designed to assess AI workloads on novel accelerators. It has been developed and tested extensively on systems at the Jülich Supercomputing Centre (JSC).

CARAML leverages JUBE, a scripting-based framework for creating benchmark sets, running them across different systems, and evaluating results. Additionally, it includes power/energy measurements through the jpwr tool.

Paper: arXiv, IEEE

Tested Accelerators

CARAML has been tested on the JURECA-DC EVALUATION PLATFORM, JURECA-DC, JEDI, WEST-AI Nodes and NHR-FAU. These systems include the following accelerators:

| System                      | Configuration                              | Tag       |
|-----------------------------|--------------------------------------------|-----------|
| NVIDIA Ampere node (SXM)    | 4 × A100 (40GB HBM2e) GPUs                 | `A100`    |
| NVIDIA Hopper node (PCIe)   | 4 × H100 (80GB HBM2e) GPUs                 | `H100`    |
| NVIDIA Hopper node (NVLink) | 4 × H100 (94GB HBM2e) GPUs                 | `WAIH100` |
| NVIDIA Grace-Hopper chip    | 1 × GH200 (480GB LPDDR5X, 96GB HBM3) GPU   | `GH200`   |
| NVIDIA Grace-Hopper node    | 4 × GH200 (120GB LPDDR5X, 96GB HBM3) GPUs  | `JUPITER` |
| AMD MI300X node             | 8 × MI300X (192GB HBM3) GPUs               | `MI300X`  |
| AMD MI300A node             | 4 × MI300A (128GB HBM3) APUs               | `MI300A`  |
| AMD MI200 node              | 4 × MI250 (128GB HBM2e) GPUs               | `MI250`   |
| Graphcore IPU-POD4 M2000    | 4 × GC200 (512GB DDR4-3200) IPUs           | `GC200`   |

Benchmark

CARAML currently provides benchmarks implemented in Python:

1. Computer Vision: Image Classification (Training)

The image_classification model training benchmark is implemented in PyTorch. It is designed to test image classification models such as ResNet50 on various accelerators. For IPUs, graphcore/examples is used.

Performance is measured in images/s and energy is measured in Wh.

Note: Support for the image classification benchmark in TensorFlow has been discontinued.
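As an illustration of the two reported units, the following minimal sketch derives images/s from a sample count and elapsed time, and energy in Wh from sampled power readings using the trapezoidal rule. The function names and the sample layout are assumptions made for this example; this is not the jpwr API and not necessarily how CARAML computes its numbers.

```python
from typing import Sequence


def images_per_second(num_images: int, elapsed_s: float) -> float:
    """Throughput in images/s for a run that processed num_images in elapsed_s seconds."""
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    return num_images / elapsed_s


def energy_wh(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (W) over time (s) with the trapezoidal rule and return Wh."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Average power over the interval times its duration gives energy in joules.
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return joules / 3600.0  # 1 Wh = 3600 J
```

For instance, a device drawing a constant 300 W for 60 s consumes 18000 J, which is 5 Wh.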

2. Natural Language Processing: GPT Language Model (Training)

The LLM-training benchmark is implemented in PyTorch with:

  • Megatron-LM with commit: f7727433293427bef04858f67b2889fe9b177d88 and patch applied for NVIDIA
  • Megatron-LM-ROCm with commit: 21045b59127cd2d5509f1ca27d81fae7b485bd22 and patch applied for AMD
  • graphcore/examples (forked version) for Graphcore

Performance is measured in tokens/s and energy is recorded in Wh.
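To make the tokens/s unit concrete: in a typical GPT training loop, one iteration consumes global batch size × sequence length tokens, so throughput follows from the iteration time. The sketch below assumes that relation; the function and parameter names are illustrative and not taken from the CARAML code.

```python
def tokens_per_second(global_batch_size: int, seq_length: int, iteration_time_s: float) -> float:
    """Tokens processed per second for one training iteration."""
    if iteration_time_s <= 0:
        raise ValueError("iteration_time_s must be positive")
    # Each sample in the global batch contributes seq_length tokens.
    tokens_per_iteration = global_batch_size * seq_length
    return tokens_per_iteration / iteration_time_s


def tokens_per_wh(total_tokens: int, energy_wh: float) -> float:
    """Energy efficiency: tokens processed per watt-hour consumed."""
    if energy_wh <= 0:
        raise ValueError("energy_wh must be positive")
    return total_tokens / energy_wh
```

With a global batch size of 32, a sequence length of 2048 and 2 s per iteration, this gives 32768 tokens/s.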

Requirements

To run the benchmarks, install JUBE by following the setup instructions in the JUBE Installation Documentation. The benchmarks are deployed using Apptainer containers and executed using SLURM on the tested accelerators.

Dataset

  • Image Classification: Synthetic data is generated on the host machine for benchmarking. The IPU tag synthetic additionally allows for the generation of synthetic data directly on the IPU.

  • LLM Training: A subset of the OSCAR dataset (790 samples, ~10 MB) is pre-processed using GPT-2 tokenizers. This data is provided in the llm_data directory.

Execution

  • Clone the repository and navigate into it:

        git clone https://github.com/FZJ-JSC/CARAML.git
        cd CARAML

  • Modify the system and model parameters in the respective JUBE configuration file.
  • To pull the required container use the container tag as follows:

        jube run {JUBEConfig}.{xml,yaml} --tag container H100

    Replace H100 with one of the following as needed:
    • GH200 (for Arm CPU + H100)
    • MI250 or MI300X or MI300A (for AMD)
    • GC200 (for Graphcore)

    Note: The container tag should ideally be used only once at the beginning to pull and set up the container.
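When launching runs from a script, it can help to check the accelerator tag before calling JUBE. The sketch below assembles the `jube run ... --tag ...` argument list in the form shown above and rejects tags that are not listed in the tested accelerators table. `KNOWN_TAGS` and `jube_run_command` are hypothetical names for this example and are not part of JUBE or CARAML.

```python
from typing import List, Sequence

# Accelerator tags copied from the "Tested Accelerators" table.
KNOWN_TAGS = {
    "A100", "H100", "WAIH100", "GH200", "JUPITER",
    "MI300X", "MI300A", "MI250", "GC200",
}


def jube_run_command(config: str, accelerator: str, extra_tags: Sequence[str] = ()) -> List[str]:
    """Build the argv list for `jube run <config> --tag <extra tags> <accelerator>`."""
    if accelerator not in KNOWN_TAGS:
        raise ValueError(f"unknown accelerator tag: {accelerator!r}")
    # Extra tags such as "container" or "800M" precede the accelerator tag,
    # mirroring the order used in the commands of this README.
    return ["jube", "run", config, "--tag", *extra_tags, accelerator]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; for example, `jube_run_command("llm_training/llm_benchmark_nvidia_amd.yaml", "A100", ["800M"])` corresponds to the 800M GPT command in the LLM Training section.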

Image Classification (Training)

  • To run the benchmark with the defined configurations, run:

        jube run image_classification/image_classification_torch_benchmark.xml --tag H100

    H100 can be replaced with any tag mentioned in the tested accelerators section.

  • After the benchmark has been executed, use jube continue to postprocess the results:

        jube continue image_classification/image_classification_torch_benchmark_run -i last

  • To generate the results, run:

        jube result image_classification/image_classification_torch_benchmark_run -i last

LLM Training

  • To run the benchmark with the defined configurations for the 800M GPT model with OSCAR data, run:

        jube run llm_training/llm_benchmark_nvidia_amd.yaml --tag 800M A100

    A100 can be replaced with any tag mentioned in the tested accelerators section, and 800M can be replaced with 13B or 175B on systems with more node resources.

  • To run the benchmark with the defined configurations for the 117M GPT model on Graphcore with synthetic data, run:

        jube run llm_training/llm_benchmark_ipu.yaml --tag 117M synthetic

    If the tag synthetic is not given, the benchmark will use OSCAR data.

  • After the benchmark has been executed, use jube continue to postprocess the results:

        jube continue llm_training/llm_benchmark_{nvidia_amd,ipu}_run -i last

  • To generate the results, run:

        jube result llm_training/llm_benchmark_{nvidia_amd,ipu}_run -i last
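Both benchmarks follow the same three steps: run, continue, result. The sketch below generates that command sequence for a given configuration file. It assumes the run directory is the configuration file name without its extension plus `_run`, which matches the commands shown in this README but is otherwise an assumption; `jube_workflow` is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from typing import List, Sequence


def jube_workflow(config: str, tags: Sequence[str]) -> List[List[str]]:
    """Return the run, continue and result commands for one benchmark configuration."""
    cfg = PurePosixPath(config)
    # Assumed naming: <dir>/<name>.xml or .yaml  ->  <dir>/<name>_run
    run_dir = str(cfg.with_name(cfg.stem + "_run"))
    return [
        ["jube", "run", config, "--tag", *tags],
        ["jube", "continue", run_dir, "-i", "last"],
        ["jube", "result", run_dir, "-i", "last"],
    ]
```

Each inner list can be passed to `subprocess.run(cmd, check=True)` in order, so that a failing step stops the sequence.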

Results

Image Classification: ResNet50 | LLM Training Benchmark

JSC Specific Fixes

To use the PyTorch torchrun API on JSC systems, the fixedtorchrun.py fix is required. The fix solves the issue defined here.

Additionally, an i is appended to the hostname to allow communication over InfiniBand, as described here.
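The hostname adjustment can be pictured with the small sketch below, which appends an `i` to the short host name. Keeping a domain suffix unchanged is an assumption of this example, as are the function name and the host names used to illustrate it; the repository may implement this step differently, for example in the job script.

```python
def infiniband_hostname(hostname: str) -> str:
    """Append 'i' to the short host name, keeping any domain suffix unchanged."""
    if not hostname:
        raise ValueError("hostname must not be empty")
    # Split "node.domain" into ("node", ".", "domain"); without a dot the
    # last two parts are empty strings.
    short, dot, domain = hostname.partition(".")
    return f"{short}i{dot}{domain}"
```

With a hypothetical node name `node0001`, the function returns `node0001i`, which would then be used as the rendezvous address for distributed training.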

Citation

```
@INPROCEEDINGS{10820809,
  author={John, Chelsea Maria and Nassyr, Stepan and Penke, Carolin and Herten, Andreas},
  booktitle={SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  title={Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML},
  year={2024},
  pages={1164-1176},
  doi={10.1109/SCW63240.2024.00158}
}
```

Owner

  • Name: Jülich Supercomputing Centre
  • Login: FZJ-JSC
  • Kind: organization
  • Location: Germany

Jülich Supercomputing Centre provides HPC resources and expertise. Part of Forschungszentrum Jülich.

Citation (CITATION.cff)

cff-version: 1.2.0
title: "CARAML"
message: "In addition to citing this benchmark repository, please also cite the accompanying SC24 paper."
authors:
  - given-names: "Chelsea"
    family-names: "John"
    affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
    orcid: "https://orcid.org/0000-0003-3777-7393"
repository-code: "https://github.com/FZJ-JSC/CARAML"
license: "MIT"
date-released: "2024-07-26"
references:
  - type: conference-paper
    citation-key: "SC24-Paper"
    authors:
      - given-names: "Chelsea"
        family-names: "John"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0003-3777-7393"
      - given-names: "Carolin"
        family-names: "Penke"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-4043-3885"
      - given-names: "Stepan"
        family-names: "Nassyr"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-0035-244X"
      - given-names: "Andreas"
        family-names: "Herten"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-7150-2505"
    doi: "10.1109/SCW63240.2024.00158"
    booktitle: "SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis"
    start: 1164
    end: 1176
    title: "Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML"
    year: 2024

GitHub Events

Total
  • Release event: 1
  • Delete event: 2
  • Issue comment event: 1
  • Push event: 27
  • Pull request event: 11
  • Fork event: 1
  • Create event: 5
Last Year
  • Release event: 1
  • Delete event: 2
  • Issue comment event: 1
  • Push event: 27
  • Pull request event: 11
  • Fork event: 1
  • Create event: 5

Dependencies

requirements/amd_tensorflow_requirements.txt pypi
  • absl-py ==2.1.0
  • astunparse ==1.6.3
  • cachetools ==5.4.0
  • flatbuffers ==23.5.26
  • gast ==0.4.0
  • google-auth ==2.16.0
  • keras-preprocessing ==1.1.2
  • numpy ==1.26.4
  • opt-einsum ==3.3.0
  • pandas ==2.2.1
  • protobuf ==3.19.6
  • pyasn1 ==0.6.0
  • pyasn1-modules ==0.4.0
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • rsa ==4.9
  • six ==1.16.0
  • termcolor ==2.2.0
  • tzdata ==2024.1
  • wheel ==0.43.0
  • wrapt ==1.14.1
requirements/amd_torch_requirements.txt pypi
  • accelerate ==0.33.0
  • annotated-types ==0.7.0
  • einops ==0.8.0
  • hjson ==3.1.0
  • huggingface-hub ==0.24.5
  • matplotlib ==3.9.0
  • ninja ==1.11.1.1
  • pandas ==1.2.4
  • pybind11 ==2.13.1
  • pydantic ==2.8.2
  • pydantic-core ==2.20.1
  • pyparsing ==3.1.2
  • python-slugify ==8.0.4
  • pytz ==2024.1
  • regex ==2024.7.24
  • safetensors ==0.4.4
  • tabulate ==0.9.0
  • tokenizers ==0.19.1
  • transformers ==4.43.4
requirements/ipu_tensorflow_requirements.txt pypi
  • GitPython ==3.1.31
  • Mako ==1.2.4
  • MarkupSafe ==2.1.2
  • PyYAML ==5.4.1
  • appdirs ==1.4.4
  • attrs ==22.2.0
  • autopep8 ==1.6.0
  • awscli ==1.27.94
  • botocore ==1.29.94
  • cffi ==1.15.1
  • click ==8.1.3
  • cloudpickle ==2.2.1
  • colorama ==0.4.4
  • coverage ==7.2.2
  • cppimport ==22.8.2
  • dill ==0.3.6
  • docker-pycreds ==0.4.0
  • docutils ==0.16
  • execnet ==1.9.0
  • filelock ==3.10.0
  • gitdb ==4.0.10
  • googleapis-common-protos ==1.58.0
  • horovod ==0.27.0
  • importlib-metadata ==6.1.0
  • importlib-resources ==5.12.0
  • iniconfig ==2.0.0
  • jmespath ==1.0.1
  • opencv-python-headless ==4.6.0.66
  • packaging ==23.0
  • pandas ==1.0.3
  • pathtools ==0.1.2
  • pluggy ==1.0.0
  • promise ==2.3
  • psutil ==5.9.4
  • py ==1.11.0
  • pyasn1 ==0.4.8
  • pybind11 ==2.10.4
  • pycodestyle ==2.10.0
  • pycparser ==2.21
  • pytest ==6.2.5
  • pytest-cov ==3.0.0
  • pytest-forked ==1.4.0
  • pytest-pythonpath ==0.7.4
  • pytest-xdist ==2.5.0
  • python-dateutil ==2.9.0.post0
  • pytz ==2024.1
  • rsa ==4.7.2
  • s3transfer ==0.6.0
  • sentry-sdk ==1.17.0
  • setproctitle ==1.3.2
  • six ==1.16.0
  • smmap ==5.0.0
  • tensorflow-addons ==0.14.0
  • tensorflow-datasets ==4.5.2
  • tensorflow-metadata ==1.12.0
  • toml ==0.10.2
  • tomli ==2.0.1
  • tqdm ==4.65.0
  • typeguard ==3.0.1
  • typing_extensions ==4.5.0
  • tzdata ==2024.1
  • urllib3 ==1.26.15
  • wandb ==0.14.0
  • zipp ==3.15.0
requirements/ipu_torch_requirements.txt pypi
  • GitPython ==3.1.43
  • Mako ==1.3.5
  • MarkupSafe ==2.1.5
  • PyTurboJPEG ==1.7.7
  • PyYAML ==6.0.1
  • attrs ==23.2.0
  • awscli ==1.33.32
  • botocore ==1.34.150
  • certifi ==2024.7.4
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • cloudpickle ==3.0.0
  • colorama ==0.4.6
  • cppimport ==22.8.2
  • datasets ==2.1.0
  • docutils ==0.16
  • fastjsonschema ==2.20.0
  • filelock ==3.15.4
  • fsspec ==2024.6.1
  • gitdb ==4.0.11
  • horovod ==0.28.1
  • huggingface-hub ==0.24.3
  • idna ==2.7
  • importlib-resources ==6.4.0
  • jmespath ==1.0.1
  • jsonschema ==4.23.0
  • jsonschema-specifications ==2023.12.1
  • jupyter-core ==5.7.2
  • mypy-extensions ==1.0.0
  • nbformat ==5.10.4
  • numpy ==1.24.4
  • packaging ==24.1
  • pandas ==2.0.3
  • pathtools ==0.1.2
  • pkgutil-resolve-name ==1.3.10
  • platformdirs ==4.2.2
  • promise ==2.3
  • protobuf ==3.20.
  • psutil ==6.0.0
  • pyasn1 ==0.6.0
  • pybind11 ==2.13.1
  • pytest ==6.2.5
  • pytest-pythonpath ==0.7.3
  • python-dateutil ==2.9.0
  • referencing ==0.35.1
  • regex ==2024.7.24
  • requests ==2.32.3
  • rpds-py ==0.19.1
  • rsa ==4.7.2
  • s3transfer ==0.10.2
  • scipy ==1.5.4
  • sentry-sdk ==2.11.0
  • shortuuid ==1.0.13
  • simple-parsing ==0.0.19
  • six ==1.16.0
  • smmap ==5.0.1
  • tfrecord ==1.14.1
  • timm ==1.0.15
  • tokenizers ==0.12.1
  • tqdm ==4.63.1
  • traitlets ==5.14.3
  • transformers ==4.26.1
  • typing-extensions ==4.12.2
  • typing-inspect ==0.9.0
  • tzdata ==2024.1
  • urllib3 ==2.2.2
  • wandb ==0.12.8
  • zipp ==3.19.2
requirements/nvidia_arm_tensorflow_requirements.txt pypi
requirements/nvidia_arm_torch_requirements.txt pypi
  • python-slugify ==8.0.4
requirements/nvidia_x86_tensorflow_requirements.txt pypi
requirements/nvidia_x86_torch_requirements.txt pypi
  • python-slugify ==8.0.4