https://github.com/anjiang-wei/ptx_dataset

Repository

Basic Info
  • Host: GitHub
  • Owner: Anjiang-Wei
  • Language: Python
  • Default Branch: main
  • Size: 1.91 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 10 months ago · Last pushed 10 months ago
README.md

PTX_dataset

Mirage

Example Equivalent CUDA code

CUDA folder

Example Equivalent PTX code (all equivalent)

PTX folder

All of these kernels are equivalent: they compute the same result under different schedules explored by superoptimization.

Generation method

First, generate CUDA code from the saved schedule for the GQA kernel, following the AE doc:

python3 $MIRAGE_ROOT/benchmark/group_query_attention.py --file $MIRAGE_ROOT/benchmark/saved_mugraphs/gqa_bs1.json

Then lower the generated CUDA code to PTX with this script
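The linked script is not reproduced on this page. As a rough sketch of what lowering one generated `.cu` file to PTX can look like, the following builds an `nvcc -ptx` command line. The function names `nvcc_ptx_command` and `lower_to_ptx`, the default `sm_80` target, and the include-directory handling are illustrative assumptions, not the repository's actual script.

```python
from pathlib import Path
import subprocess


def nvcc_ptx_command(cu_file, out_dir, arch="80", include_dirs=()):
    """Build an nvcc command line that emits PTX for one CUDA source file.

    All names and defaults here are hypothetical; adjust them to the real setup.
    """
    cu_path = Path(cu_file)
    ptx_path = Path(out_dir) / (cu_path.stem + ".ptx")
    cmd = ["nvcc", "-ptx", f"-arch=sm_{arch}"]
    # Generated kernels usually need extra header search paths.
    for inc in include_dirs:
        cmd += ["-I", str(inc)]
    cmd += [str(cu_path), "-o", str(ptx_path)]
    return cmd


def lower_to_ptx(cu_file, out_dir, **kwargs):
    """Create out_dir if needed and run nvcc; raises CalledProcessError on failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(nvcc_ptx_command(cu_file, out_dir, **kwargs), check=True)
```

Keeping command construction separate from execution makes it possible to inspect the command line without a CUDA toolchain installed.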

Cutlass

Example Equivalent Pair of PTX

GEMM 1 GEMM 2

Example Generated CUDA code

CUDA Folder

Note that some of these kernels may be matrix multiplications with a transpose; check the filenames to tell them apart.
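One way to sort the files by operand layout is sketched below. It assumes that each filename embeds a two-letter layout tag such as `_nn_`, `_nt_`, `_tn_`, or `_tt_`; that naming pattern, the example filenames, and the helper name `group_by_layout` are assumptions for illustration and should be checked against the actual files in the folder.

```python
import re
from collections import defaultdict

# Assumption: filenames contain a layout tag like "_nn_" or "_tn_",
# one letter per input operand ("n" or "t").
LAYOUT_TAG = re.compile(r"_([nt]{2})_")


def group_by_layout(filenames):
    """Group filenames by their layout tag; names without a tag go under 'unknown'."""
    groups = defaultdict(list)
    for name in filenames:
        match = LAYOUT_TAG.search(name)
        key = match.group(1) if match else "unknown"
        groups[key].append(name)
    return dict(groups)
```

Only kernels that land in the same group should be treated as candidates for an equivalent pair.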

Example PTX code

PTX Folder

Generation method

When the Cutlass profiler is built, many templates are instantiated with different parameters. At runtime, the profiler can therefore search across many equivalent versions to find the best configuration: https://github.com/Anjiang-Wei/cutlassptx/blob/main/media/docs/cpp/profiler.md

```
mkdir build
cd build
cmake .. -DCUTLASS_NVCC_ARCHS="80" -DCUTLASS_LIBRARY_KERNELS=gemm -DCUTLASS_UNITY_BUILD_ENABLED=ON
make cutlass_profiler -j
```

During compilation, the `.cu` files are saved in `build/tools/library/generated/gemm`. I then wrote a script to compile those `.cu` files into PTX.

Usage:

cd build
./generate_ptx.py -j 20 --arch 80 -v
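The `generate_ptx.py` script itself is not shown on this page. A minimal sketch of a command-line front end that accepts the same `-j`, `--arch`, and `-v` flags and fans work out over parallel jobs might look like the following. The long option names, the `--src` option and its default, and the helpers `find_cu_files` and `compile_all` are hypothetical and are not taken from the real script.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_args(argv=None):
    """Parse flags mirroring the usage line above (long names are assumptions)."""
    parser = argparse.ArgumentParser(description="Compile generated .cu files to PTX (sketch).")
    parser.add_argument("-j", "--jobs", type=int, default=1, help="number of parallel jobs")
    parser.add_argument("--arch", default="80", help="target SM architecture, e.g. 80")
    parser.add_argument("-v", "--verbose", action="store_true", help="print progress")
    parser.add_argument("--src", default="tools/library/generated/gemm",
                        help="directory holding the generated .cu files (assumed layout)")
    return parser.parse_args(argv)


def find_cu_files(src_dir):
    """Return all .cu files under src_dir in a stable order."""
    return sorted(Path(src_dir).rglob("*.cu"))


def compile_all(cu_files, compile_one, jobs=1):
    """Apply compile_one to every file using `jobs` worker threads.

    Threads suffice here because each job is expected to spend its time
    waiting on an external compiler process. Results keep the input order.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(compile_one, cu_files))
```

Here `compile_one` is any callable that turns one `.cu` path into PTX, for example a thin wrapper around an `nvcc -ptx` invocation.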

Triton

Example Pairs of equivalent PTX

matmul0 matmul1 (All matmuls)

gqa0 gqa1 (All GQAs)

To generate them, use Triton's auto-tuning:

python3 gated_mlp.py
python3 gqa.py

The helper code is in triton_ptx_dump.py
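The contents of `triton_ptx_dump.py` are not shown on this page. As a hedged sketch of the general idea, the helper below writes the PTX of one compiled kernel to disk. It assumes the compiled-kernel handle exposes an `asm` mapping with a `"ptx"` entry, as Triton's compiled kernels do in the versions I am aware of; the names `extract_ptx` and `dump_ptx` are hypothetical.

```python
from pathlib import Path


def extract_ptx(compiled_kernel):
    """Return the PTX text of a compiled kernel.

    Assumption: the handle has an `asm` mapping containing a "ptx" key.
    """
    return compiled_kernel.asm["ptx"]


def dump_ptx(compiled_kernel, out_path):
    """Write the kernel's PTX to out_path, creating parent directories as needed."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(extract_ptx(compiled_kernel))
    return out
```

With auto-tuning, each explored configuration compiles to its own kernel, so a helper like this would be called once per configuration to obtain the set of equivalent PTX files; how to enumerate those compiled variants depends on the Triton version and is not covered by this sketch.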

TVM

Equivalent CUDA code

Even-numbered pairs (different schedules, but the same computation):

Pair 0

Pair 2

Pair 4

Inequivalent CUDA code

Odd-numbered pairs (performing different computations):

Pair 1

Pair 3

Pair 5

Equivalent PTX code

Even-numbered pairs (different schedules, but the same computation):

Pair 0

Pair 2

Pair 4

Inequivalent PTX code

Odd-numbered pairs (performing different computations):

Pair 1

Pair 3

Pair 5
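The numbering convention used in the lists above (even index means equivalent, odd index means inequivalent) can be turned into ground-truth labels with a few lines of code. This is only a sketch of that convention; the function names `pair_label` and `label_pairs` are hypothetical.

```python
def pair_label(index):
    """Map a pair index to its label under the even/odd convention."""
    if index < 0:
        raise ValueError("pair index must be non-negative")
    return "equivalent" if index % 2 == 0 else "inequivalent"


def label_pairs(indices):
    """Return a {pair index: label} dictionary for the given indices."""
    return {i: pair_label(i) for i in indices}
```

For example, `label_pairs(range(6))` labels pairs 0, 2, and 4 as equivalent and pairs 1, 3, and 5 as inequivalent, matching the lists above.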

Generation method

cd equibench
python3 download.py
python3 extract_pairs.py
python3 gen_ptx.py
