jubench-megatron-lm
JUPITER Benchmark Suite: Megatron-LM Benchmark
Science Score: 62.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ✓ Institutional organization owner: Organization fzj-jsc has institutional domain (www.fz-juelich.de)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: FZJ-JSC
- License: mit
- Language: Shell
- Default Branch: main
- Size: 3.47 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
JUPITER Benchmark Suite: Megatron-LM
This benchmark is part of the JUPITER Benchmark Suite. See the repository of the suite for some general remarks.
This repository contains the Megatron-LM NLP/LLM benchmark. DESCRIPTION.md contains details for compilation, execution, and evaluation.
The required source code (Megatron-LM, Apex) is included in the ./src/ subdirectory as submodules from the upstream repositories: github.com/NVIDIA/Megatron-LM for Megatron-LM and github.com/NVIDIA/apex for Apex. Sample data files are also included.
Overview of Benchmark
Description Of Folder Structure
- benchmark
  - aux
    - tokenizers
    - get_shrink_data_and_tokenizers.sh: script used for getting data and tokenizers
    - job_preprocess_data.sbatch: script used for preprocessing data
    - sample 10MB oscar dataset obtained using get_shrink_data_and_tokenizers.sh
  - env
    - activate.bash: script for activating the python virtual env
    - setup_venv.sh: script to set up the python virtual env
  - slurm
    - sbatch scripts for the 13B and 175B models, to be used when running without JUBE
  - jube
    - contains accompanying files for the JUBE run and the JUBE yaml file
- src
  - data: contains the preprocessed data (*.idx and *.bin files)
  - compile_build.sh: script to build the software dependencies
  - variables.bash: file that sets important paths
  - prebuild_kernels.py: script to prebuild fused kernels
Workflow Without JUBE:
Getting Data and Tokenizers
Follow these steps if the data and tokenizers are not already present in this repository:
- Step 1: Set the NLP_BENCH_ROOT variable with export NLP_BENCH_ROOT=<rootdir path of this benchmark> in your bash shell
- Step 2: cd benchmark/aux/
- Step 3: Run bash get_shrink_data_and_tokenizers.sh to get the tokenizers and shrink the raw data oscar-1GB.jsonl.xz to oscar-10MB.jsonl.xz
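The three steps above can be sketched as a single shell function. This is only an illustration under stated assumptions: the function name fetch_data_and_tokenizers is hypothetical, and the sketch assumes that benchmark/aux/ is resolved relative to the benchmark root.

```shell
# Hypothetical wrapper around Steps 1 to 3 (not part of the repository).
fetch_data_and_tokenizers() {
  local root="$1"
  if [ -z "$root" ]; then
    echo "usage: fetch_data_and_tokenizers <rootdir path of this benchmark>" >&2
    return 1
  fi
  # Step 1: export the benchmark root for the helper scripts.
  export NLP_BENCH_ROOT="$root"
  # Steps 2 and 3: run the download script from benchmark/aux/.
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$NLP_BENCH_ROOT/benchmark/aux" && bash get_shrink_data_and_tokenizers.sh )
}
```

A call would look like fetch_data_and_tokenizers "$PWD" when issued from the root folder of the benchmark.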
Preprocessing Data
If your src/data folder does not contain preprocessed data (*.idx and *.bin files), run sbatch job_preprocess_data.sbatch from the benchmark/aux directory after completing Step 5 of "Workflow With Preprocessed Data And Tokenizers Available".
The job_preprocess_data.sbatch script in benchmark/aux/ preprocesses oscar-10MB.jsonl.xz and puts the result in src/data/. The script can be modified to preprocess any data of your choice.
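Whether preprocessing is still needed can be checked with a small helper. The following is a minimal sketch, assuming that "preprocessed data is present" means at least one *.idx and one *.bin file directly inside the data folder; both function names are hypothetical.

```shell
# Succeeds if at least one of the given paths exists. When a glob matches
# nothing, the shell passes the literal pattern, which does not exist.
any_file_matches() {
  local candidate
  for candidate in "$@"; do
    [ -e "$candidate" ] && return 0
  done
  return 1
}

# Succeeds if the directory holds both *.idx and *.bin files.
has_preprocessed_data() {
  local data_dir="$1"
  any_file_matches "$data_dir"/*.idx && any_file_matches "$data_dir"/*.bin
}

# Possible use (commented out, since it submits a job):
# has_preprocessed_data "$NLP_BENCH_ROOT/src/data" ||
#   ( cd "$NLP_BENCH_ROOT/benchmark/aux" && sbatch job_preprocess_data.sbatch )
```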
Workflow With Preprocessed Data And Tokenizers Available
- Step 1: cd into the folder of this benchmark
- Step 2: Set the NLP_BENCH_ROOT variable with export NLP_BENCH_ROOT=<rootdir path of this benchmark> in your bash shell
- Step 3: Set TORCH_CUDA_ARCH_LIST according to the GPU's compute capability in benchmark/env/activate.bash
- Step 4: Run bash benchmark/env/setup_venv.sh
- Step 5: Run bash src/compile_build.sh
- Step 6: Run sbatch benchmark/slurm/jobscript_13B.sbatch or sbatch benchmark/slurm/jobscript_175B.sbatch
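As an illustration, the scripted part of these steps could be wrapped as follows. This is a sketch, not a script shipped with the benchmark: the names jobscript_for_model and run_without_jube are hypothetical, and Step 3 remains a manual edit of benchmark/env/activate.bash.

```shell
# Hypothetical helper: map a model size (13 or 175) to the jobscript path
# named in Step 6, relative to the benchmark root.
jobscript_for_model() {
  case "$1" in
    13|175) printf 'benchmark/slurm/jobscript_%sB.sbatch\n' "$1" ;;
    *) echo "unsupported model size: $1 (expected 13 or 175)" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper around Steps 1, 2, 4, 5 and 6. Step 3 (setting
# TORCH_CUDA_ARCH_LIST in benchmark/env/activate.bash) must be done by hand
# before calling this function.
run_without_jube() {
  local root="$1" script
  script=$(jobscript_for_model "$2") || return 1
  export NLP_BENCH_ROOT="$root"                   # Step 2
  (
    cd "$NLP_BENCH_ROOT" || exit 1                # Step 1
    bash benchmark/env/setup_venv.sh || exit 1    # Step 4
    bash src/compile_build.sh || exit 1           # Step 5
    sbatch "$script"                              # Step 6
  )
}
```

For example, run_without_jube "$PWD" 13 would build the environment and submit the 13B jobscript when called from the benchmark root.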
The resulting *.out file contains log lines of the following form, which are the ones that matter for the benchmark result:
```
[default3]: iteration 10/ 292968 | consumed samples: 10240 | elapsed time per iteration (s): 35.8651 | learning rate: 4.734E-06 | global batch size: 1024 | lm loss: 1.332803E+01 | loss scale: 4096.0 | grad norm: 42.627 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 28.551 | TFLOPs: 199.03 |
[default3]: iteration 20/ 292968 | consumed samples: 20480 | elapsed time per iteration (s): 34.9991 | learning rate: 9.467E-06 | global batch size: 1024 | lm loss: 1.010884E+01 | loss scale: 4096.0 | grad norm: 13.038 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 29.258 | TFLOPs: 203.96 |
[default3]: iteration 30/ 292968 | consumed samples: 30720 | elapsed time per iteration (s): 34.8709 | learning rate: 1.420E-05 | global batch size: 1024 | lm loss: 9.072961E+00 | loss scale: 4096.0 | grad norm: 26.640 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 29.365 | TFLOPs: 204.71 |
[default3]: iteration 40/ 292968 | consumed samples: 40960 | elapsed time per iteration (s): 35.3346 | learning rate: 1.893E-05 | global batch size: 1024 | lm loss: 8.486469E+00 | loss scale: 4096.0 | grad norm: 3.441 | number of skipped iterations: 0 | number of nan iterations: 0 | samples per second: 28.980 | TFLOPs: 202.02 |
[default3]: iteration 50/ 292968 | consumed samples: 51200 | elapsed time per iteration (s): 35.3357 | learning rate: 2.367E-05 | global batch size: 1024 | lm loss: 8.
```
The metric tokens_per_sec should be calculated as (1.0/$elapsed_time_per_iteration)*$global_batch_size*$sequence_length, where elapsed_time_per_iteration and global_batch_size are taken from the *.out file.
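A minimal sketch of this calculation, assuming log lines in the pipe-separated form shown above. The function name tokens_per_sec_from_log is hypothetical, and the sequence length is passed in by the caller because it does not appear in the log.

```shell
# Reads log lines from stdin and prints one tokens_per_sec value per
# iteration line. $1 is the sequence length.
tokens_per_sec_from_log() {
  local seq_len="$1"
  awk -F'|' -v seq_len="$seq_len" '
    /elapsed time per iteration/ {
      elapsed = 0; batch = 0
      for (i = 1; i <= NF; i++) {
        # Each field looks like " name: value "; the value is the last part.
        n = split($i, kv, ":")
        if ($i ~ /elapsed time per iteration/) { elapsed = kv[n] + 0 }
        if ($i ~ /global batch size/)          { batch = kv[n] + 0 }
      }
      if (elapsed > 0 && batch > 0) {
        printf "%.2f\n", (1.0 / elapsed) * batch * seq_len
      }
    }'
}

# Possible use, with SEQ_LEN set by hand to the value from the jobscript:
# tokens_per_sec_from_log "$SEQ_LEN" < job.out
```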
For submission, the throughput tokens_per_sec is converted into the time a hypothetical training run would require. The conversion assumes a training with 20 million tokens and uses the formula
[ time_to_report_in_seconds ] = [tokens] / [tokens/second]
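Example: For a 13B model run with Tokens/sec: 59463.14, we obtain a duration of 20,000,000 / 59463.14 = 336.34 seconds. The JUBE example result further below, with Tokens/sec: 60777.68, corresponds to 20,000,000 / 60777.68 = 329.07 seconds.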
Hint: sequence_length can be found in the jobscript.
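The conversion can be reproduced with a one-line awk call. The sketch below uses a hypothetical function name; the default of 20 million tokens comes from the text above.

```shell
# $1: throughput in tokens/second
# $2: total number of tokens (optional, defaults to 20 million)
time_to_report_in_seconds() {
  awk -v tps="$1" -v tokens="${2:-20000000}" 'BEGIN {
    if (tps <= 0) {
      print "throughput must be positive" > "/dev/stderr"
      exit 1
    }
    # [time_to_report_in_seconds] = [tokens] / [tokens/second]
    printf "%.2f\n", tokens / tps
  }'
}

# time_to_report_in_seconds 59463.14   # prints 336.34
```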
Workflow With JUBE:
- Step 1: cd into the folder of this benchmark
- Step 2: Set TORCH_CUDA_ARCH_LIST according to the GPU's compute capability in benchmark/env/activate.bash
- Step 3: Execute either jube run benchmark/jube/nlp_benchmark.yaml --tag 175 for the 175B model or jube run benchmark/jube/nlp_benchmark.yaml --tag 13 for the 13B model
- Step 4: Wait for the benchmark to run and then do jube continue nlp_benchmark_run -i last until no steps with the "wait" state remain
- Step 5: After the benchmark finishes, run jube result -a nlp_benchmark_run -i last to print the benchmark results
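Steps 3 to 5 can be kept at hand as small shell functions. This is a sketch with hypothetical function names; the jube commands inside are exactly the ones listed above, and the functions are meant to be called from the benchmark root.

```shell
# Step 3: start the benchmark for the chosen model size (13 or 175).
jube_submit() {
  case "$1" in
    13|175) jube run benchmark/jube/nlp_benchmark.yaml --tag "$1" ;;
    *) echo "unsupported model size: $1 (expected 13 or 175)" >&2; return 1 ;;
  esac
}

# Step 4: advance the run. Call again until no step is left in the
# "wait" state.
jube_advance() {
  jube continue nlp_benchmark_run -i last
}

# Step 5: print the result table of the last run.
jube_show_result() {
  jube result -a nlp_benchmark_run -i last
}
```

A typical session would be jube_submit 13, then jube_advance repeated as needed, then jube_show_result.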
Example result from JUBE:
```
| system        | version | queue   | JobID    | JobTime    | ModelSize (Billion Param) | Nodes | BatchSize | PipelineParallel | TensorParallel | Iterations | AvgTFLOPs/GPU | Tokens/sec | time_to_report_in_seconds |
|---------------|---------|---------|----------|------------|---------------------------|-------|-----------|------------------|----------------|------------|---------------|------------|---------------------------|
| juwelsbooster | 2024.01 | booster | 10011638 | "00:30:00" | 13                        | 8     | 1024      | 4                | 2              | 20         | 206.885       | 60777.68   | 329.07                    |
```
Owner
- Name: Jülich Supercomputing Centre
- Login: FZJ-JSC
- Kind: organization
- Location: Germany
- Website: https://www.fz-juelich.de/en/ias/jsc
- Twitter: fzj_jsc
- Repositories: 29
- Profile: https://github.com/FZJ-JSC
Jülich Supercomputing Centre provides HPC resources and expertise. Part of Forschungszentrum Jülich.
Citation (CITATION.cff)
cff-version: 1.2.0
title: "JUPITER Benchmark Suite: Megatron-LM"
message: >-
  In addition to citing this benchmark repository, please also cite either the JUPITER Benchmark Suite or the accompanying SC24 paper
authors:
  - given-names: Chelsea
    family-names: John
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0003-3777-7393'
  - given-names: Stefan
    family-names: Kesselheim
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0003-0940-5752'
  - given-names: Carolin
    family-names: Penke
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-4043-3885'
  - given-names: Jan
    family-names: Ebert
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0001-7118-0481'
  - given-names: Stepan
    family-names: Nassyr
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-0035-244X'
  - given-names: Andreas
    family-names: Herten
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-7150-2505'
  - given-names: Sebastian
    family-names: Achilles
    affiliation: Forschungszentrum Jülich, Jülich Supercomputing Centre
    orcid: 'https://orcid.org/0000-0002-1943-6803'
abstract: "The Megatron-LM benchmark of the JUPITER Benchmark Suite"
identifiers:
  - type: doi
    value: 10.5281/zenodo.12788115
    description: Version-agnostic Zenodo Identifier
repository-code: 'https://github.com/FZJ-JSC/jubench-megatron-lm/'
license: MIT
date-released: '2024-07-13'
references:
  - title: "JUPITER Benchmark Suite"
    type: software
    doi: 10.5281/zenodo.12737073