https://github.com/anjiang-wei/ptx_dataset
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Anjiang-Wei
- Language: Python
- Default Branch: main
- Size: 1.91 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
PTX_dataset
Mirage
Example Equivalent CUDA code
Example Equivalent PTX code (all equivalent)
They are all equivalent, based on different schedules explored by superoptimization.
Generation method
First, generate CUDA code from the saved schedule for the GQA kernel, following the AE doc:
python3 $MIRAGE_ROOT/benchmark/group_query_attention.py --file $MIRAGE_ROOT/benchmark/saved_mugraphs/gqa_bs1.json
Then lower to PTX with this script
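The lowering step can be sketched as a thin wrapper around `nvcc -ptx`. This is a minimal sketch, not the repository's actual script: the function names `nvcc_ptx_command` and `lower_to_ptx`, the default architecture, and the `include_dirs` parameter are all assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def nvcc_ptx_command(cu_file, arch="80", include_dirs=()):
    """Build an nvcc command line that lowers one .cu file to PTX.

    Hypothetical helper: the real script may pass different flags.
    """
    cu_file = Path(cu_file)
    cmd = ["nvcc", "-ptx", f"-arch=sm_{arch}"]
    for inc in include_dirs:
        # Generated kernels usually need extra header search paths.
        cmd += ["-I", str(inc)]
    # Write the PTX next to the source, swapping the extension.
    cmd += [str(cu_file), "-o", str(cu_file.with_suffix(".ptx"))]
    return cmd


def lower_to_ptx(cu_file, **kwargs):
    """Run nvcc if it is on PATH; return the PTX path, or None if nvcc is missing."""
    if shutil.which("nvcc") is None:
        return None
    cmd = nvcc_ptx_command(cu_file, **kwargs)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1])
```

Keeping command construction separate from execution makes it possible to inspect the exact `nvcc` invocation on a machine without the CUDA toolkit.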
Cutlass
Example Equivalent Pair of PTX
Example Generated CUDA code
Some of these may be matrix multiplications (MM) with a transpose; check the filenames to tell which.
Example PTX code
Generation method
When the Cutlass profiler is built, many templates are instantiated with different parameters. At runtime, the profiler can therefore search across many equivalent versions to find the best configuration: https://github.com/Anjiang-Wei/cutlassptx/blob/main/media/docs/cpp/profiler.md
```
mkdir build
cd build
cmake .. -DCUTLASS_NVCC_ARCHS="80" -DCUTLASS_LIBRARY_KERNELS=gemm -DCUTLASS_UNITY_BUILD_ENABLED=ON
make cutlass_profiler -j
```
During compilation, the `.cu` files are saved in `build/tools/library/generated/gemm`. I then wrote a script to compile those `.cu` files into PTX.
Usage:
cd build
./generate_ptx.py -j 20 --arch 80 -v
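How a driver with these flags might fan work out over several workers can be sketched as follows. This is an assumption-laden illustration rather than the contents of `generate_ptx.py`: the long option names `--jobs` and `--verbose`, and the helpers `parse_args`, `find_cu_files`, and `compile_all`, are hypothetical.

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parse_args(argv=None):
    """Parse flags shaped like the usage line above (long names are assumed)."""
    parser = argparse.ArgumentParser(description="Compile generated .cu files to PTX")
    parser.add_argument("-j", "--jobs", type=int, default=1, help="parallel workers")
    parser.add_argument("--arch", default="80", help="target SM architecture")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser.parse_args(argv)


def find_cu_files(root):
    """Collect every .cu file under root in a stable order."""
    return sorted(Path(root).rglob("*.cu"))


def compile_all(cu_files, compile_one, jobs=1):
    """Apply compile_one to each file using `jobs` threads; results keep input order.

    Threads suffice here because the real work would happen in nvcc subprocesses.
    """
    with ThreadPoolExecutor(max_workers=max(1, jobs)) as pool:
        return list(pool.map(compile_one, cu_files))
```

Passing the per-file compile step in as `compile_one` keeps the fan-out logic independent of how each file is actually lowered.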
Triton
Example Pairs of equivalent PTX
To generate them, use auto-tuning from Triton:
python3 gated_mlp.py
python3 gqa.py
The helper script is triton_ptx_dump.py
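Once auto-tuning has produced one PTX dump per configuration, equivalent pairs can be formed by pairing the variants of each kernel. The sketch below does not call Triton and is not taken from `triton_ptx_dump.py`; the function `equivalent_pairs` and the `(kernel_name, config_label)` key layout are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def equivalent_pairs(ptx_by_config):
    """Pair up PTX variants of the same kernel.

    ptx_by_config maps (kernel_name, config_label) -> PTX text (assumed layout).
    Returns (kernel_name, config_a, config_b) for every pair whose PTX differs.
    """
    by_kernel = defaultdict(list)
    for (kernel, config), ptx in sorted(ptx_by_config.items()):
        by_kernel[kernel].append((config, ptx))

    pairs = []
    for kernel, variants in by_kernel.items():
        for (cfg_a, ptx_a), (cfg_b, ptx_b) in combinations(variants, 2):
            # Two configs can compile to identical PTX; such pairs are skipped.
            if ptx_a != ptx_b:
                pairs.append((kernel, cfg_a, cfg_b))
    return pairs
```

Variants of one kernel compute the same result under different tuning configurations, so every pair produced this way is equivalent by construction.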
TVM
Equivalent CUDA code
Even-numbered pairs (different schedules, but the same computation):
Inequivalent CUDA code
Odd-numbered pairs (performing different computations):
Equivalent PTX code
Even-numbered pairs (different schedules, but the same computation):
Inequivalent PTX code
Odd-numbered pairs (performing different computations):
Generation method
cd equibench
python3 download.py
python3 extract_pairs.py
python3 gen_ptx.py
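The even/odd convention described above for the TVM pairs can be sketched as a small labeling helper. The names `pair_label` and `split_pairs` are hypothetical and do not come from the `equibench` scripts.

```python
def pair_label(pair_index):
    """Even-numbered pairs are equivalent; odd-numbered pairs are not."""
    return "equivalent" if pair_index % 2 == 0 else "inequivalent"


def split_pairs(pair_indices):
    """Group pair indices by their equivalence label, preserving input order."""
    groups = {"equivalent": [], "inequivalent": []}
    for idx in pair_indices:
        groups[pair_label(idx)].append(idx)
    return groups
```

For example, `split_pairs(range(5))` places pairs 0, 2, and 4 in the equivalent group and pairs 1 and 3 in the inequivalent group.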
Owner
- Login: Anjiang-Wei
- Kind: user
- Repositories: 19
- Profile: https://github.com/Anjiang-Wei
GitHub Events
Total
- Push event: 9
- Create event: 1
Last Year
- Push event: 9
- Create event: 1