Updated 6 months ago
scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
Updated 5 months ago
https://github.com/alexkranias/triton_vs_cuda
Building Triton and CUDA kernels side by side to create a GEMM kernel with cuBLAS-level performance.
Updated 6 months ago
gpu_programming_beginner
Fundamentals of heterogeneous parallel programming with CUDA C/C++ at the beginner level.
Updated 6 months ago
kmm
KMM: parallel dataflow scheduler and efficient memory management for multi-GPU platforms