aitemplate-nm-pruning
Repository
Basic Info
- Host: GitHub
- Owner: Ivorchu
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 20.5 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Sparse GEMM Profiler with CUTLASS and AITemplate
This project implements a custom 2:4 structured sparse GEMM (General Matrix Multiplication) profiler using NVIDIA's CUTLASS library and Meta's AITemplate compiler framework. The profiler builds and benchmarks sparse tensor core GEMM kernels on Ampere GPUs, integrating metadata preprocessing and kernel configuration into AITemplate’s workflow.
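To make the 2:4 pattern concrete, here is a minimal sketch of pruning one row so that every group of four consecutive values keeps at most two nonzeros. Choosing the two largest magnitudes is an assumption made for illustration; the README does not state which pruning criterion the project uses, and `prune_2_4` is a hypothetical helper name.

```python
def prune_2_4(row):
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes; ties go to the lower index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


row = [0.9, -0.1, 0.05, -0.7, 0.2, 0.3, -0.4, 0.0]
print(prune_2_4(row))
# [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
```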
📌 Key Features
- Structured 2:4 sparsity with Tensor Core acceleration
- Uses `cutlass::gemm::device::SparseGemm` for sparse kernel execution
- Metadata reordering using `cutlass::reorder_meta`
- Automatic profiler generation and kernel selection with AITemplate
- Split-K support for parallelism along the reduction (K) dimension
- Tensor inspection/debug tools via CUDA host/device memory copy
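The split-K feature listed above can be pictured with a small pure-Python sketch: the reduction dimension K is cut into slices, each slice produces a partial product, and the partials are summed. This only illustrates the arithmetic, not how CUTLASS schedules the slices on the GPU. The helper names are hypothetical, and requiring K to divide evenly by the slice count is a simplification of this sketch.

```python
def matmul(a, b):
    """Plain dense product of two matrices stored as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def split_k_matmul(a, b, split_k_slices):
    """Compute a @ b by summing partial products over slices of K."""
    k = len(b)
    if k % split_k_slices != 0:
        raise ValueError("this sketch assumes K is divisible by split_k_slices")
    chunk = k // split_k_slices
    m, n = len(a), len(b[0])
    result = [[0] * n for _ in range(m)]
    for s in range(split_k_slices):
        lo, hi = s * chunk, (s + 1) * chunk
        # Each slice sees only its own columns of A and rows of B.
        partial = matmul([row[lo:hi] for row in a], b[lo:hi])
        for i in range(m):
            for j in range(n):
                result[i][j] += partial[i][j]
    return result


a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [2, 0], [0, 2]]
print(split_k_matmul(a, b, 2))
# [[7, 10], [19, 22]]
```

Because each slice works on an independent part of K, the partial products can run in parallel; the cost is the extra reduction step at the end.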
📁 Project Structure
AITemplate/
├── 3rdparty/
│ └── cutlass/ # CUTLASS 2.x source
├── examples/sparse_test/
│ ├── sparse_test.py # Entry point for benchmarking
│ └── [generated profiler .cu files]
├── python/aitemplate/
│ └── compiler/transform/profile/ # Profile logic and hooks
🚀 Getting Started
1. Build Docker Image
bash
cd docker
./build.sh cuda
2. Launch the Container
bash
./run.sh
3. Run Sparse Profiler
bash
cd examples/sparse_test
python sparse_test.py
This generates and runs sparse GEMM profiler binaries for the configured problem shapes and split-K settings.
🧠 Technical Details
CUTLASS Sparse GEMM
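CUTLASS provides `SparseGemm`, a device-level GEMM class for 2:4 structured sparse matrix multiplication on NVIDIA Ampere Tensor Cores. It takes three input operands:
- Operand A: the 2:4 sparse matrix (the pruned weights), stored compressed so that only the kept values remain (K/2 per row)
- Operand B: the dense matrix (the activations)
- Operand E: metadata tensor recording which positions were kept, in CUTLASS-native (reordered) format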
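The split of the sparse operand into values and metadata can be sketched as follows. Each group of four stores two values plus two position indices in the range 0 to 3, so each index fits in two bits. This is a logical illustration only: it does not reproduce how CUTLASS packs those indices into `ElementE` words or reorders them, and `compress_2_4` and `decompress_2_4` are hypothetical helper names.

```python
def compress_2_4(row):
    """Split a 2:4 sparse row into kept values and their in-group positions."""
    values, indices = [], []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group has more than two nonzeros")
        # Pad with unused positions so every group stores exactly two entries.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        values.extend(group[i] for i in kept)
        indices.extend(kept)
    return values, indices


def decompress_2_4(values, indices):
    """Rebuild the full row from values and position indices."""
    row = []
    for g in range(0, len(values), 2):
        group = [0.0] * 4
        for v, i in zip(values[g:g + 2], indices[g:g + 2]):
            group[i] = v
        row.extend(group)
    return row


sparse_row = [0.9, 0.0, 0.0, -0.7, 0.0, 0.3, -0.4, 0.0]
print(compress_2_4(sparse_row))
# ([0.9, -0.7, 0.3, -0.4], [0, 3, 1, 2])
```

Round-tripping through `decompress_2_4` returns the original row, which makes a convenient sanity check before the metadata is handed to the kernel.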
Metadata Reordering
cpp
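// reorder_meta is a host-side utility and does not work in place, so the
// source and destination must be separate host buffers.
// m_reordered_ptr is an assumed, separately allocated buffer of the same size as m_ptr.
cutlass::TensorRef<ElementE, cutlass::layout::RowMajor> meta_src((ElementE*)m_ptr, meta_stride);
cutlass::TensorRef<ElementE, cutlass::layout::RowMajor> meta_dst((ElementE*)m_reordered_ptr, meta_stride);
// Metadata extent: M rows, K / 2 / kElementsPerElementE columns.
cutlass::gemm::GemmCoord meta_extent(M, N, K / 2 / kElementsPerElementE);
cutlass::reorder_meta(meta_dst, meta_src, meta_extent);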
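The extent passed to `cutlass::reorder_meta` is easy to get wrong, so a small arithmetic check can help. The sketch below assumes the metadata tensor has M rows and K / 2 / kElementsPerElementE columns, as the extent in the snippet above suggests. `meta_shape` is a hypothetical helper, and the value 8 used for `elements_per_element_e` is only an example; the real value comes from the instantiated `Gemm` type.

```python
def meta_shape(m, k, elements_per_element_e):
    """Return (rows, cols) of the metadata tensor for an M x K sparse operand."""
    sparse_ratio = 2  # 2:4 sparsity keeps half of the K dimension
    divisor = sparse_ratio * elements_per_element_e
    if k % divisor != 0:
        raise ValueError(f"K={k} is not a multiple of {divisor}")
    return m, k // divisor


print(meta_shape(128, 256, 8))
# (128, 16)
```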
Kernel Arguments Setup
cpp
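Gemm::Arguments arguments{
    cutlass::gemm::GemmCoord{M, N, K},  // problem size
    {a_ptr, a_stride},                  // operand A
    {b_ptr, b_stride},                  // operand B
    {c_ptr, c_stride},                  // source C
    {d_ptr, d_stride},                  // destination D
    {meta_ptr, meta_stride},            // operand E (reordered metadata)
    {alpha, beta},                      // epilogue scalars
    split_k_slices                      // number of slices along K
};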
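When validating kernel output, a host-side reference is useful. The sketch below assumes the kernel uses a linear-combination epilogue, so that the `{alpha, beta}` pair above yields D = alpha * (A·B) + beta * C; the README does not say which epilogue the generated kernels use. `reference_gemm` is a hypothetical helper operating on dense (uncompressed) matrices.

```python
def reference_gemm(a, b, c, alpha, beta):
    """Compute alpha * (a @ b) + beta * c on lists of rows."""
    inner, cols = len(b), len(b[0])
    return [
        [alpha * sum(a[i][k] * b[k][j] for k in range(inner)) + beta * c[i][j]
         for j in range(cols)]
        for i in range(len(a))
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
print(reference_gemm(a, b, c, alpha=2, beta=3))
# [[41, 47], [89, 103]]
```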
Tensor Debugging
cpp
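// Copy operand B back to the host and print it as floats.
std::vector<cutlass::half_t> host_B(b_size);
cudaError_t err = cudaMemcpy(host_B.data(), b_ptr, b_size * sizeof(cutlass::half_t), cudaMemcpyDeviceToHost);
if (err != cudaSuccess) {
    std::cerr << "cudaMemcpy failed: " << cudaGetErrorString(err) << std::endl;
}
for (size_t i = 0; i < host_B.size(); ++i) {
    // cutlass::half_t converts to float directly on the host.
    std::cout << static_cast<float>(host_B[i]) << " ";
}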
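Once a tensor has been dumped to the host, one quick check is whether it really follows the 2:4 pattern. The following sketch scans a flat list of values and reports every group of four that holds more than two nonzeros. It assumes the dump is an uncompressed matrix in row-major order with groups running along K; `find_2_4_violations` is a hypothetical helper.

```python
def find_2_4_violations(values, k):
    """Return (row, group) pairs whose group of four has more than two nonzeros."""
    if k % 4 != 0 or len(values) % k != 0:
        raise ValueError("expected a row-major matrix with K a multiple of 4")
    violations = []
    for offset in range(0, len(values), 4):
        nonzeros = sum(1 for v in values[offset:offset + 4] if v != 0)
        if nonzeros > 2:
            violations.append((offset // k, (offset % k) // 4))
    return violations


dump = [1, 0, 2, 0,  0, 0, 0, 3,   # row 0: both groups are valid
        1, 2, 3, 0,  0, 0, 0, 0]   # row 1: first group has three nonzeros
print(find_2_4_violations(dump, k=8))
# [(1, 0)]
```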
📈 Profiling Output
AITemplate generates profiler binaries and stores them in:
/tmp/aitemplate_cache/<hash>/
Kernel selection reuses cached profiling results unless `force_profile()` returns `True`.
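The cache-or-profile behavior described above can be pictured with a small sketch. It is not AITemplate's implementation: `pick_kernel`, the dictionary cache, the cache key, and the `run_profiler` callback are all hypothetical stand-ins that show only the control flow of reusing a cached choice unless profiling is forced.

```python
def pick_kernel(cache, key, run_profiler, force=False):
    """Return the fastest kernel name for `key`, profiling only when needed."""
    if not force and key in cache:
        return cache[key]
    timings = run_profiler()  # maps kernel name -> elapsed milliseconds
    best = min(timings, key=timings.get)
    cache[key] = best
    return best


def fake_profiler():
    # Stand-in for launching the generated profiler binaries.
    return {"kernel_a": 2.0, "kernel_b": 1.0}


cache = {}
print(pick_kernel(cache, ("gemm", 128, 256, 512), fake_profiler))
# kernel_b
```

A second call with the same key returns the cached name without invoking the profiler; passing `force=True` profiles again and overwrites the entry.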
🛠 Common Issues
| Issue | Fix |
|-------|-----|
| `cudaFuncSetAttribute` failed | Ensure shared memory usage < 48 KB or compile with `-maxrregcount` |
| Illegal memory access in profiling | Check that the metadata pointer and `meta_stride` are valid |
| `cutlass::half_t` to `__half` conversion | Use `.raw()` or an explicit cast to `__half` |
| Invalid tensor layout | Make sure B is column-major if using an RCR kernel |
🙋 Author
This project is developed by Ivor, a Computer Engineering student at Purdue University, as part of research on structured sparsity, AI acceleration, and compiler-accelerated inference.
📄 License
This project is for research and educational purposes. CUTLASS and AITemplate are licensed under their respective open-source licenses.
Owner
- Name: Ivor Chu
- Login: Ivorchu
- Kind: user
- Company: IvorChu
- Repositories: 15
- Profile: https://github.com/Ivorchu
GitHub Events
Total
- Push event: 6
- Create event: 1
Last Year
- Push event: 6
- Create event: 1
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v3 composite
- actions/configure-pages v2 composite
- actions/deploy-pages v1 composite
- actions/upload-pages-artifact v1 composite
- ROCmSoftwarePlatform * development
- danmar * development
- torch *
- jinja2 *