enzyme-gpu-tests

This repo contains the benchmarks for Enzyme on GPUs

https://github.com/wsmoses/enzyme-gpu-tests

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    2 of 5 committers (40.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

This repo contains the benchmarks for Enzyme on GPUs

Basic Info
  • Host: GitHub
  • Owner: wsmoses
  • Language: LLVM
  • Default Branch: main
  • Homepage:
  • Size: 41.7 MB
Statistics
  • Stars: 11
  • Watchers: 3
  • Forks: 3
  • Open Issues: 2
  • Releases: 1
Created almost 5 years ago · Last pushed 7 months ago
Metadata Files
Readme Citation

README.md

Enzyme-GPU-Tests

This repo contains the benchmarks for Enzyme on GPUs.

If Enzyme, or part of this repository, is useful to you, please cite:

```
@inproceedings{enzymeGPU,
  title     = {Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme},
  author    = {Moses, William S and Churavy, Valentin and Paehler, Ludger and H{\"u}ckelheim, Jan and Hari Krishna Narayanan, Sri and Schanen, Michel and Doerfert, Johannes},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  year      = {2021},
  location  = {St. Louis, Missouri},
  series    = {SC '21}
}
```

Below we describe how to run the 6 benchmarks presented here: XSBench (CUDA), RSBench (CUDA), Parboil LBM (CUDA), LULESH (CUDA), DG (CUDA), and DG (AMD). Of these benchmarks, the AMD and CUDA DG codes are Julia-based, whereas the remainder are C++/CUDA. Evaluating these benchmarks allowed the paper to demonstrate the efficiency of the generated gradients in comparison to the original code, the impact of the novel optimizations, the scalability of the generated gradients, and the correctness of the tool.

To run the benchmarks used in the paper, we first need to build the LLVM compiler toolchain so that we can subsequently link Enzyme's compiler plugin against our built LLVM version. To install LLVM, follow these steps:

```
$ cd ~
$ git clone https://github.com/llvm/llvm-project
$ cd llvm-project

# The following git hash was used in our paper.
# You may use this, or a newer version.
$ git checkout 8dab25954b0acb53731c4aa73e9a7f4f98263030
$ mkdir build && cd build
$ cmake ../llvm -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DCMAKE_BUILD_TYPE=Release -G Ninja
$ ninja

# This may take a while.
# clang will now be available at ~/llvm-project/build/bin/clang.
```

We must now build Enzyme against our chosen LLVM version.

```
$ cd ~
$ git clone https://github.com/wsmoses/Enzyme

# The following git hash was used in our paper.
# You may use this, or a newer version.
$ git checkout ec75831a8cb0170090c366f8da6e3b2b87a20f6e
$ cd Enzyme/enzyme
$ mkdir build && cd build
$ cmake ../enzyme -DLLVM_DIR=/path/to/llvm/build -DCMAKE_BUILD_TYPE=Release -G Ninja
$ ninja

# ClangEnzyme-13.so will now be available at ~/Enzyme/enzyme/build/Enzyme/ClangEnzyme-13.so.
```

Some of the C++ benchmarks require a custom CUDA libdevice (the implementation of various CUDA intrinsics). This remedies an issue within LLVM that prevents common math functions from being identified as LLVM intrinsics (this is being worked on in upstream LLVM). For the default CUDA installation, libdevice can be found at /usr/local/cuda/nvvm/libdevice/libdevice.10.bc. The following snippet shows how to replace the libdevice file, assuming you are using CUDA 11.2; the instructions are similar for other CUDA installations, with the path changed accordingly. Note that you may need to be root to perform the change, and that you should always make a backup of your previous libdevice.

```
# Save a copy of your current libdevice.
$ sudo cp /usr/local/cuda-11.2/nvvm/libdevice/libdevice.10.bc /usr/local/cuda-11.2/nvvm/libdevice/libdevice.10.bc.old
$ sudo cp /path/to/new/libdevice.10.bc /usr/local/cuda-11.2/nvvm/libdevice/libdevice.10.bc
```

We have created Python3 scripts to ease setting up and running our experiments. They attempt to deduce appropriate paths from the following environment variables; if something goes wrong, you may need to adjust bench.py or the Makefile of the affected test. All of the bench.py benchmarking scripts follow the same structure.

```
# The index of the desired CUDA GPU
$ export DEVICE=1

# The path to the CUDA installation
$ export CUDA_PATH=/usr/local/cuda-11.2

# The path to the clang++ binary we built above
$ export CLANG_PATH=/path/to/llvm/build/bin/clang++

# The path to the Enzyme plugin we built above
$ export ENZYME_PATH=/path/to/Enzyme/enzyme/build/Enzyme/ClangEnzyme-13.so
```
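
How each bench.py consumes these variables may differ slightly between benchmarks; as a rough illustration only (the fallback defaults here are assumptions, not what bench.py itself uses), a script can pick them up like this:

```
import os

# Read the environment variables set above; the fallback values are
# illustrative defaults, not necessarily the ones bench.py uses.
device = int(os.environ.get("DEVICE", "0"))
cuda_path = os.environ.get("CUDA_PATH", "/usr/local/cuda")
clang_path = os.environ.get("CLANG_PATH", "clang++")
enzyme_path = os.environ.get("ENZYME_PATH")

if enzyme_path is None or not os.path.exists(enzyme_path):
    raise SystemExit("ENZYME_PATH must point to ClangEnzyme-13.so")

print(f"GPU index {device}, CUDA at {cuda_path}, compiler {clang_path}")
```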

We can now work with the benchmark suite (this repository).

```
$ git clone https://github.com/wsmoses/Enzyme-GPU-Tests
```

The benchmark suite folder breaks down into the following structure:

* DG, a Discontinuous Galerkin benchmark (CUDA & ROCm in Julia)
* LBM, a Lattice Boltzmann benchmark (CUDA in C++)
* LULESH, a Lagrangian Hydrodynamics benchmark (CUDA in C++)
* RSBench, a Monte Carlo Particle Transport benchmark (CUDA in C++)
* XSBench, a Monte Carlo Particle Transport benchmark (CUDA in C++)

We can now enter one of the 4 C++ test directories (XSBench, RSBench, LBM, LULESH) and run the corresponding benchmark.

```
$ cd Enzyme-GPU-Tests/LBM
$ python3 bench.py

# output of benchmark times printed out here
```

The bench.py script will first run an ablation analysis that enables or disables differentiation, along with several optimizations; the result of these tests is the execution time of the gradient and/or original kernel. The script will then run scaling tests for both the gradient and original kernel, evaluating on increasing problem sizes. Some benchmarks (XSBench, LBM, LULESH, DG (CUDA)) end by printing the derivative as computed by both numeric differentiation and Enzyme; these runs include VERIFY=yes as part of the run line (see the sketch below), and all other run lines contain the execution time of that benchmark. Be aware that LULESH's ablation analysis includes benchmark configurations that do not perform compiler optimizations and are hence significantly slower than the others.
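
As a minimal sketch (assuming only that VERIFY=yes appears verbatim in the verification run lines, as described above; the capture filename is hypothetical), saved bench.py output can be split into timing and verification lines:

```
# Split a saved capture of `python3 bench.py` output into verification
# lines (containing VERIFY=yes) and timing lines.
with open("bench_output.txt") as f:
    lines = [line.rstrip() for line in f if line.strip()]

verify_lines = [l for l in lines if "VERIFY=yes" in l]
timing_lines = [l for l in lines if "VERIFY=yes" not in l]

print(f"{len(timing_lines)} timing lines, {len(verify_lines)} verification lines")
```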

XSBench and RSBench require the libdevice found in Enzyme-GPU-Tests/libdevice1.

LBM uses the packaged libdevice from NVIDIA.

LULESH uses the libdevice found in Enzyme-GPU-Tests/libdevice2. The LULESH benchmark furthermore relies on NVIDIA's Nsight Compute utility (NCU). NCU has known issues with access to the GPU performance counters, which you will need in order to benchmark the gradient kernel of LULESH. If you run into this issue, please consult NVIDIA's documentation on enabling performance counter access to remedy it.

Odd performance results or a compiler error are potential indicators of an incorrect libdevice; a quick hash comparison, sketched below, can confirm which libdevice is actually installed.
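
A minimal sketch of that check (both paths are illustrative: adjust the CUDA version, and the filename inside the repo's libdevice directory is an assumption):

```
import hashlib

# Compare the installed libdevice against the one shipped in this repo to
# check whether the swap described above actually took effect.
def sha256(path):
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

installed = sha256("/usr/local/cuda-11.2/nvvm/libdevice/libdevice.10.bc")
expected = sha256("Enzyme-GPU-Tests/libdevice1/libdevice.10.bc")
print("match" if installed == expected else "mismatch")
```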

For example, when compiling one of the ablation tests of RSBench without the correct libdevice, one may see the following error when running bench.py:

```
cannot handle (augmented) unknown intrinsic
  %5 = tail call i32 @llvm.nvvm.d2i.hi(double %0)
fatal error: error in backend: (augmented) unknown intrinsic
clang-13: error
```

The two DG tests were run using Julia 1.6. Julia of this version must be on your path before you can run the Julia tests. To obtain a working Julia installation, follow the installation instructions on the Julia website.

DG (CUDA) was run with the libdevice found in Enzyme-GPU-Tests/libdevice1.

We have provided a similar bench.py script for DG. While printed in a different (CSV-style) format, it contains the same information about runtimes for both ablation and scaling as the C++ CUDA tests (DG (AMD) has no ablation analysis, as it does not run without all optimizations applied).

```
$ cd Enzyme-GPU-Tests/DG/cuda
$ python3 bench.py
```

Note that the numeric verification may come earlier in the script's output, and should look something like this:

```
# Enzyme derivative as the first element of the tuple,
# followed by the numeric approximation on the right
(dQ.dval[1], (o2 - o1) / 0.0001) = (-1.105959f0, [-1.10626220703125])
```
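
The right-hand element is a standard forward-difference approximation with step h = 0.0001. A minimal sketch of the same style of check (f, df, and x are made up for illustration):

```
# Forward-difference check mirroring the (o2 - o1) / 0.0001 comparison in
# the DG output. The function f and the point x are illustrative only.
def f(x):
    return x ** 3

def df(x):
    # analytic derivative of f, standing in for the Enzyme-computed value
    return 3 * x ** 2

x, h = 2.0, 0.0001
o1, o2 = f(x), f(x + h)
print(df(x), (o2 - o1) / h)  # the two values should agree to several digits
```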

Rows of the CSV-style output that time the forward pass alone are denoted "primal", whereas the derivative runtimes are marked "all_dub".
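
A small sketch of collecting those rows (the column layout, with the label in the first field, and the capture filename are assumptions; adjust to the actual output of the DG bench.py):

```
import csv

# Tally "primal" (forward-pass) and "all_dub" (derivative) rows from a
# hypothetical capture of the DG bench.py CSV-style output.
primal, all_dub = [], []
with open("dg_bench.csv") as f:
    for row in csv.reader(f):
        if not row:
            continue
        if row[0] == "primal":
            primal.append(row)
        elif row[0] == "all_dub":
            all_dub.append(row)

print(f"{len(primal)} primal rows, {len(all_dub)} all_dub rows")
```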

The DG tests may require additional setup. For example, you may see output like the following (note that this may also occur if you try to run DG (AMD) on a system without the relevant AMD libraries available):

```
Warning: HSA runtime has not been built, runtime functionality will be unavailable.
Please run Pkg.build("AMDGPU") and reload AMDGPU.
```

You may then need to explicitly run various setup routines within Julia's package manager. To fix the Julia setup for the test, perform the following in an interactive shell:

```
$ cd Enzyme-GPU-Tests/DG/rocm
$ julia --project=.
julia> using Pkg; Pkg.build("AMDGPU")
```

Owner

  • Name: William Moses
  • Login: wsmoses
  • Kind: user
  • Location: Cambridge, MA
  • Company: MIT

Citation (CITATION.cff)

cff-version: 1.2.0
message: If you use this software, please cite it using the following metadata.
title: Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme
authors: 
- family-names: Moses
  given-names: William S.
  affiliation: MIT
  orcid: https://orcid.org/0000-0003-2627-0642
- family-names: Churavy
  given-names: Valentin
  affiliation: MIT
  orcid: https://orcid.org/0000-0002-9033-165X
- family-names: Paehler
  given-names: Ludger
- family-names: Hueckelheim
  given-names: Jan
- family-names: "Hari Krishna Narayanan"
  given-names: Sri
- family-names: Schanen
  given-names: Michel
- family-names: Doerfert
  given-names: Johannes
keywords:
- Enzyme
- GPU
- CUDA
- ROCm
- C++
- Julia
- Automatic Differentiation
version: 0.1.0

GitHub Events

Total
  • Watch event: 3
  • Issue comment event: 1
  • Push event: 3
  • Pull request review comment event: 1
  • Pull request event: 2
  • Fork event: 1
  • Create event: 2
Last Year
  • Watch event: 3
  • Issue comment event: 1
  • Push event: 3
  • Pull request review comment event: 1
  • Pull request event: 2
  • Fork event: 1
  • Create event: 2

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 36
  • Total Committers: 5
  • Avg Commits per committer: 7.2
  • Development Distribution Score (DDS): 0.611
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
William S. Moses gh@w****m 14
Valentin Churavy v****y@g****m 9
Ludger Paehler l****r@t****e 7
William Moses w****s@c****o 4
Valentin Churavy v****y@m****u 2
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • wsmoses (2)
Top Labels
Issue Labels
Pull Request Labels