Science Score: 75.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to: arxiv.org, ieee.org
- ○ Academic email domains
- ✓ Institutional organization owner: Organization fzj-jsc has institutional domain (www.fz-juelich.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.3%) to scientific vocabulary
Repository
CARAML Benchmark Suite
Basic Info
- Host: GitHub
- Owner: FZJ-JSC
- License: MIT
- Language: Python
- Default Branch: main
- Size: 10.5 MB
Statistics
- Stars: 1
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
CARAML
Compact Automated Reproducible Assessment of Machine Learning (CARAML) is a benchmark framework designed to assess AI workloads on novel accelerators. It has been developed and tested extensively on systems at the Jülich Supercomputing Centre (JSC).
CARAML leverages JUBE, a scripting-based framework for creating benchmark sets, running them across different systems, and evaluating results. Additionally, it includes power/energy measurements through the jpwr tool.
Tested Accelerators
CARAML has been tested on the JURECA-DC EVALUATION PLATFORM, JURECA-DC, JEDI, WEST-AI Nodes and NHR-FAU. These systems include the following accelerators:
| System | Configuration | Tag |
|---------------------------------------------------|---------------------------------------------------|-----------|
| NVIDIA Ampere node (SXM) | 4 × A100 (40GB HBM2e) GPUs | `A100` |
| NVIDIA Hopper node (PCIe) | 4 × H100 (80GB HBM2e) GPUs | `H100` |
| NVIDIA Hopper node (NVLink) | 4 × H100 (94GB HBM2e) GPUs | `WAIH100` |
| NVIDIA Grace-Hopper chip | 1 × GH200 (480GB LPDDR5X, 96GB HBM3) GPU | `GH200` |
| NVIDIA Grace-Hopper node | 4 × GH200 (120GB LPDDR5X, 96GB HBM3) GPUs | `JUPITER` |
| AMD MI300X node | 8 × MI300X (192GB HBM3) GPUs | `MI300X` |
| AMD MI300A node | 4 × MI300A (128GB HBM3) APUs | `MI300A` |
| AMD MI200 node | 4 × MI250 (128GB HBM2e) GPUs | `MI250` |
| Graphcore IPU-POD4 M2000 | 4 × GC200 (512GB DDR4-3200) IPUs | `GC200` |
Benchmark
CARAML currently provides benchmarks implemented in Python:
1. Computer Vision: Image Classification (Training)
The image_classification model training benchmark is implemented in PyTorch. It is designed to test image classification models such as ResNet50 on various accelerators. For IPUs, graphcore/examples is used.
Performance is measured in images/s and energy is measured in Wh.
Note: Support for the image classification benchmark in TensorFlow has been discontinued.
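The two metrics above can be illustrated with a short sketch. It is not taken from CARAML or jpwr: it assumes that power is sampled in watts at known timestamps (in seconds) and integrates the samples with the trapezoidal rule. The function names and the sample numbers are hypothetical.

```python
def throughput(items_processed, elapsed_seconds):
    """Return items per second (for example images/s)."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return items_processed / elapsed_seconds


def energy_wh(timestamps_s, power_w):
    """Integrate power samples (W) over time (s) with the trapezoidal rule.

    Returns the energy in watt-hours (1 Wh = 3600 J).
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("need exactly one power sample per timestamp")
    joules = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        joules += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return joules / 3600.0


# Hypothetical run: 25,600 images in 10 s at a constant 360 W draw.
images_per_s = throughput(25_600, 10.0)
wh = energy_wh([0.0, 5.0, 10.0], [360.0, 360.0, 360.0])
print(f"{images_per_s:.1f} images/s, {wh:.3f} Wh")
```

Reporting energy in Wh rather than average power makes runs of different lengths comparable, since the figure already accounts for how long the accelerator was busy.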
2. Natural Language Processing: GPT Language Model (Training)
The LLM-training benchmark is implemented in PyTorch with:
- Megatron-LM with commit: f7727433293427bef04858f67b2889fe9b177d88 and patch applied for NVIDIA
- Megatron-LM-ROCm with commit: 21045b59127cd2d5509f1ca27d81fae7b485bd22 and patch applied for AMD
- graphcore/examples (forked version) for Graphcore
Performance is measured in tokens/s and energy is recorded in Wh.
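As a rough illustration of the tokens/s figure, the sketch below derives it from the global batch size, the sequence length and the time per training iteration. This is an assumption about how such a number can be computed, not a description of what Megatron-LM or CARAML logs. The tokens-per-Wh helper is a derived efficiency figure added here only for illustration, and all names and numbers are hypothetical.

```python
def tokens_per_second(global_batch_size, sequence_length, iteration_seconds):
    """Tokens processed per second in one training iteration."""
    if iteration_seconds <= 0:
        raise ValueError("iteration_seconds must be positive")
    return global_batch_size * sequence_length / iteration_seconds


def tokens_per_wh(tokens_per_s, mean_power_w):
    """Tokens processed per watt-hour at a given mean power draw.

    tokens/s divided by W gives tokens per joule; 1 Wh = 3600 J.
    """
    if mean_power_w <= 0:
        raise ValueError("mean_power_w must be positive")
    return tokens_per_s / mean_power_w * 3600.0


# Hypothetical iteration: 32 sequences of 2048 tokens in 2.0 s at 400 W.
rate = tokens_per_second(32, 2048, 2.0)
efficiency = tokens_per_wh(rate, 400.0)
print(f"{rate:.0f} tokens/s, {efficiency:.0f} tokens/Wh")
```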
Requirements
To run the benchmarks, install JUBE by following the setup instructions in the JUBE Installation Documentation. The benchmarks are deployed using Apptainer containers and executed using SLURM on the tested accelerators.
Dataset
- Image Classification: Synthetic data is generated on the host machine for benchmarking. The IPU tag `synthetic` additionally allows for the generation of synthetic data directly on the IPU.
- LLM Training: A subset of the OSCAR dataset (790 samples, ~10 MB) is pre-processed using GPT-2 tokenizers. This data is provided in the `llm_data` directory.
Execution
- Clone the repository and navigate into it:
bash
git clone https://github.com/FZJ-JSC/CARAML.git
cd CARAML
- Modify the `system` and `model` parameters in the respective JUBE configuration file.
- To pull the required container, use the `container` tag as follows:
bash
jube run {JUBEConfig}.{xml,yaml} --tag container H100
Replace `H100` with one of the following as needed:
- `GH200` (for Arm CPU + H100)
- `MI250` or `MI300X` or `MI300A` (for AMD)
- `GC200` (for Graphcore)
Note: The `container` tag should ideally be used only once at the beginning to pull and set up the container.
Image Classification (Training)
To run the benchmark with the defined configurations, do:
bash
jube run image_classification/image_classification_torch_benchmark.xml --tag H100
`H100` can be replaced with any tag mentioned in the Tested Accelerators section.
After the benchmark has been executed, use `jube continue` to postprocess the results:
bash
jube continue image_classification/image_classification_torch_benchmark_run -i last
To generate the result, do:
bash
jube result image_classification/image_classification_torch_benchmark_run -i last
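The run, continue and result steps above can be scripted. The following is a minimal sketch that only builds and prints the three command lines and does not execute them. It assumes that the run directory is the configuration path without its extension plus `_run`, which matches the examples in this README but is ultimately set by the JUBE configuration. The helper name `jube_workflow` is hypothetical.

```python
import shlex


def jube_workflow(config, tags, run_dir=None):
    """Return the run, continue and result command lines as argument lists."""
    if run_dir is None:
        # Assumption: "<config without extension>_run", as in the README examples.
        run_dir = config.rsplit(".", 1)[0] + "_run"
    return [
        ["jube", "run", config, "--tag", *tags],
        ["jube", "continue", run_dir, "-i", "last"],
        ["jube", "result", run_dir, "-i", "last"],
    ]


commands = jube_workflow(
    "image_classification/image_classification_torch_benchmark.xml", ["H100"]
)
for argv in commands:
    # shlex.join quotes arguments so each line can be pasted into a shell.
    print(shlex.join(argv))
```

Because the benchmark jobs are submitted through SLURM, the continue step is only useful once those jobs have finished, so a script that actually runs these commands would need to wait for them in between.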
LLM Training
To run the benchmark with the defined configurations for the `800M` GPT model with OSCAR data, do:
bash
jube run llm_training/llm_benchmark_nvidia_amd.yaml --tag 800M A100
`A100` can be replaced with any tag mentioned in the Tested Accelerators section, and `800M` can be replaced with `13B` and `175B` for systems with more node resources.
To run the benchmark with the defined configurations for the `117M` GPT model on Graphcore with synthetic data, do:
bash
jube run llm_training/llm_benchmark_ipu.yaml --tag 117M synthetic
If the `synthetic` tag is not given, the benchmark will use OSCAR data.
After the benchmark has been executed, use `jube continue` to postprocess the results:
bash
jube continue llm_training/llm_benchmark_{nvidia_amd,ipu}_run -i last
To generate the result, do:
bash
jube result llm_training/llm_benchmark_{nvidia_amd,ipu}_run -i last
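Which configuration file and tags to use depends on the target accelerator. The sketch below encodes the two cases from the examples above: the Graphcore configuration takes the model size and an optional `synthetic` tag, while the NVIDIA/AMD configuration takes the model size and the accelerator tag. The function name `llm_run_argv` is hypothetical, and rejecting `synthetic` for non-Graphcore systems is an assumption based on what this README documents.

```python
GRAPHCORE_TAG = "GC200"


def llm_run_argv(model_size, accelerator, synthetic=False):
    """Build the `jube run` argument list for the LLM training benchmark."""
    if accelerator == GRAPHCORE_TAG:
        config = "llm_training/llm_benchmark_ipu.yaml"
        # The README's Graphcore example passes only the model size and,
        # optionally, `synthetic`; without that tag OSCAR data is used.
        tags = [model_size] + (["synthetic"] if synthetic else [])
    else:
        if synthetic:
            # Assumption: `synthetic` is documented for Graphcore only.
            raise ValueError("the synthetic tag is documented for Graphcore only")
        config = "llm_training/llm_benchmark_nvidia_amd.yaml"
        tags = [model_size, accelerator]
    return ["jube", "run", config, "--tag", *tags]


print(" ".join(llm_run_argv("800M", "A100")))
print(" ".join(llm_run_argv("117M", "GC200", synthetic=True)))
```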
Results

JSC Specific Fixes
To use the PyTorch `torchrun` API on JSC systems, the fixedtorchrun.py fix is required. The fix solves the issue defined here.
Additionally, an `i` is appended to the hostname to allow communication over InfiniBand, as described here.
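The hostname change can be pictured with a small sketch. It assumes that the `i` is appended to the host part of the name, before any domain suffix. The function name and node names are hypothetical, and the way CARAML applies the change in its job scripts may differ.

```python
def infiniband_hostname(hostname):
    """Append "i" to the host part of a node name.

    A short name like "node0001" becomes "node0001i"; for a fully
    qualified name the suffix goes before the first dot (an assumption).
    """
    host, dot, domain = hostname.partition(".")
    if not host:
        raise ValueError("hostname must not be empty")
    return f"{host}i{dot}{domain}"


# Hypothetical node names, for illustration only.
print(infiniband_hostname("node0001"))
print(infiniband_hostname("node0001.example"))
```

A launcher would typically use the resulting name as the rendezvous address for distributed training, so that traffic between nodes goes over the InfiniBand interface.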
Citation
```
@INPROCEEDINGS{10820809,
  author={John, Chelsea Maria and Nassyr, Stepan and Penke, Carolin and Herten, Andreas},
  booktitle={SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  title={Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML},
  year={2024},
  pages={1164-1176},
  doi={10.1109/SCW63240.2024.00158}
}
```
Owner
- Name: Jülich Supercomputing Centre
- Login: FZJ-JSC
- Kind: organization
- Location: Germany
- Website: https://www.fz-juelich.de/en/ias/jsc
- Twitter: fzj_jsc
- Repositories: 29
- Profile: https://github.com/FZJ-JSC
Jülich Supercomputing Centre provides HPC resources and expertise. Part of Forschungszentrum Jülich.
Citation (CITATION.cff)
cff-version: 1.2.0
title: "CARAML"
message: "In addition to citing this benchmark repository, please also cite the accompanying SC24 paper."
authors:
  - given-names: "Chelsea"
    family-names: "John"
    affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
    orcid: "https://orcid.org/0000-0003-3777-7393"
repository-code: "https://github.com/FZJ-JSC/CARAML"
license: "MIT"
date-released: "2024-07-26"
references:
  - type: conference-paper
    citation-key: "SC24-Paper"
    authors:
      - given-names: "Chelsea"
        family-names: "John"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0003-3777-7393"
      - given-names: "Carolin"
        family-names: "Penke"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-4043-3885"
      - given-names: "Stepan"
        family-names: "Nassyr"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-0035-244X"
      - given-names: "Andreas"
        family-names: "Herten"
        affiliation: "Forschungszentrum Jülich, Jülich Supercomputing Centre"
        orcid: "https://orcid.org/0000-0002-7150-2505"
    doi: "10.1109/SCW63240.2024.00158"
    booktitle: "SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis"
    start: 1164
    end: 1176
    title: "Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML"
    year: 2024
GitHub Events
Total
- Release event: 1
- Delete event: 2
- Issue comment event: 1
- Push event: 27
- Pull request event: 11
- Fork event: 1
- Create event: 5
Last Year
- Release event: 1
- Delete event: 2
- Issue comment event: 1
- Push event: 27
- Pull request event: 11
- Fork event: 1
- Create event: 5
Dependencies
- absl-py ==2.1.0
- astunparse ==1.6.3
- cachetools ==5.4.0
- flatbuffers ==23.5.26
- gast ==0.4.0
- google-auth ==2.16.0
- keras-preprocessing ==1.1.2
- numpy ==1.26.4
- opt-einsum ==3.3.0
- pandas ==2.2.1
- protobuf ==3.19.6
- pyasn1 ==0.6.0
- pyasn1-modules ==0.4.0
- python-dateutil ==2.9.0.post0
- pytz ==2024.1
- rsa ==4.9
- six ==1.16.0
- termcolor ==2.2.0
- tzdata ==2024.1
- wheel ==0.43.0
- wrapt ==1.14.1
- accelerate ==0.33.0
- annotated-types ==0.7.0
- einops ==0.8.0
- hjson ==3.1.0
- huggingface-hub ==0.24.5
- matplotlib ==3.9.0
- ninja ==1.11.1.1
- pandas ==1.2.4
- pybind11 ==2.13.1
- pydantic ==2.8.2
- pydantic-core ==2.20.1
- pyparsing ==3.1.2
- python-slugify ==8.0.4
- pytz ==2024.1
- regex ==2024.7.24
- safetensors ==0.4.4
- tabulate ==0.9.0
- tokenizers ==0.19.1
- transformers ==4.43.4
- GitPython ==3.1.31
- Mako ==1.2.4
- MarkupSafe ==2.1.2
- PyYAML ==5.4.1
- appdirs ==1.4.4
- attrs ==22.2.0
- autopep8 ==1.6.0
- awscli ==1.27.94
- botocore ==1.29.94
- cffi ==1.15.1
- click ==8.1.3
- cloudpickle ==2.2.1
- colorama ==0.4.4
- coverage ==7.2.2
- cppimport ==22.8.2
- dill ==0.3.6
- docker-pycreds ==0.4.0
- docutils ==0.16
- execnet ==1.9.0
- filelock ==3.10.0
- gitdb ==4.0.10
- googleapis-common-protos ==1.58.0
- horovod ==0.27.0
- importlib-metadata ==6.1.0
- importlib-resources ==5.12.0
- iniconfig ==2.0.0
- jmespath ==1.0.1
- opencv-python-headless ==4.6.0.66
- packaging ==23.0
- pandas ==1.0.3
- pathtools ==0.1.2
- pluggy ==1.0.0
- promise ==2.3
- psutil ==5.9.4
- py ==1.11.0
- pyasn1 ==0.4.8
- pybind11 ==2.10.4
- pycodestyle ==2.10.0
- pycparser ==2.21
- pytest ==6.2.5
- pytest-cov ==3.0.0
- pytest-forked ==1.4.0
- pytest-pythonpath ==0.7.4
- pytest-xdist ==2.5.0
- python-dateutil ==2.9.0.post0
- pytz ==2024.1
- rsa ==4.7.2
- s3transfer ==0.6.0
- sentry-sdk ==1.17.0
- setproctitle ==1.3.2
- six ==1.16.0
- smmap ==5.0.0
- tensorflow-addons ==0.14.0
- tensorflow-datasets ==4.5.2
- tensorflow-metadata ==1.12.0
- toml ==0.10.2
- tomli ==2.0.1
- tqdm ==4.65.0
- typeguard ==3.0.1
- typing_extensions ==4.5.0
- tzdata ==2024.1
- urllib3 ==1.26.15
- wandb ==0.14.0
- zipp ==3.15.0
- GitPython ==3.1.43
- Mako ==1.3.5
- MarkupSafe ==2.1.5
- PyTurboJPEG ==1.7.7
- PyYAML ==6.0.1
- attrs ==23.2.0
- awscli ==1.33.32
- botocore ==1.34.150
- certifi ==2024.7.4
- charset-normalizer ==3.3.2
- click ==8.1.7
- cloudpickle ==3.0.0
- colorama ==0.4.6
- cppimport ==22.8.2
- datasets ==2.1.0
- docutils ==0.16
- fastjsonschema ==2.20.0
- filelock ==3.15.4
- fsspec ==2024.6.1
- gitdb ==4.0.11
- horovod ==0.28.1
- huggingface-hub ==0.24.3
- idna ==2.7
- importlib-resources ==6.4.0
- jmespath ==1.0.1
- jsonschema ==4.23.0
- jsonschema-specifications ==2023.12.1
- jupyter-core ==5.7.2
- mypy-extensions ==1.0.0
- nbformat ==5.10.4
- numpy ==1.24.4
- packaging ==24.1
- pandas ==2.0.3
- pathtools ==0.1.2
- pkgutil-resolve-name ==1.3.10
- platformdirs ==4.2.2
- promise ==2.3
- protobuf ==3.20.
- psutil ==6.0.0
- pyasn1 ==0.6.0
- pybind11 ==2.13.1
- pytest ==6.2.5
- pytest-pythonpath ==0.7.3
- python-dateutil ==2.9.0
- referencing ==0.35.1
- regex ==2024.7.24
- requests ==2.32.3
- rpds-py ==0.19.1
- rsa ==4.7.2
- s3transfer ==0.10.2
- scipy ==1.5.4
- sentry-sdk ==2.11.0
- shortuuid ==1.0.13
- simple-parsing ==0.0.19
- six ==1.16.0
- smmap ==5.0.1
- tfrecord ==1.14.1
- timm ==1.0.15
- tokenizers ==0.12.1
- tqdm ==4.63.1
- traitlets ==5.14.3
- transformers ==4.26.1
- typing-extensions ==4.12.2
- typing-inspect ==0.9.0
- tzdata ==2024.1
- urllib3 ==2.2.2
- wandb ==0.12.8
- zipp ==3.19.2
- python-slugify ==8.0.4
- python-slugify ==8.0.4