gemmkernels.jl
Flexible and performant GEMM kernels in Julia
Statistics
- Stars: 83
- Watchers: 7
- Forks: 12
- Open Issues: 14
- Releases: 2
README.md
GemmKernels
Flexible and performant GEMM kernels in Julia
This package contains a framework to instantiate flexible, performant GEMM (General Matrix Multiplication) kernels. You can use this framework to define your own GEMM kernels, or use one of the predefined interfaces that this package also provides.
Quick start
The package can be installed using Julia's built-in package manager.
Open the Julia REPL, type ] to enter Pkg-mode, and run:
```julia
pkg> add GemmKernels
```
Most people will be interested in the BLAS-like interface that is available as GemmKernels.mul!:
```julia
julia> using GemmKernels, CUDA

julia> A = CUDA.rand(2048, 2048)
julia> B = CUDA.rand(2048, 2048)
julia> C = CUDA.zeros(2048, 2048)

julia> GemmKernels.mul!(C, A, B)
```
For more control, e.g. to use optimized layouts, or fuse the multiplication with a bias, you
need to use the low-level GemmKernels.matmul interface (see the examples directory).
Performance
The kernels in this package are expected to deliver around 50% to 80% of the performance of state-of-the-art libraries such as cuBLAS and CUTLASS. The exact figure depends on the specific invocation (e.g. the size of the matrices and the data type) and on the GPU architecture.
For example, on an NVIDIA RTX 2080 Ti, we can achieve competitive performance for a mixed-precision multiplication of FP16 inputs and FP32 output.
[Figure: performance plot for this configuration; image not included.]
Framework
The GEMM kernels above are implemented using a framework that decomposes GEMM kernels into orthogonal components:
- Params determine the tiling size and launch configuration of the GEMM kernel. The tiling sizes are specified in logical coordinates, i.e. with a meaning specified by the user.
- Layouts convert the logical coordinates of tiles to physical offsets in memory.
- Transforms are used to apply any arbitrary Julia functor to the GEMM's inputs or outputs. They are applied after every load, and before every store.
- Operators are responsible for performing the matrix multiplication itself. They load tiles from shared memory, perform the matrix multiplication, and store the resultant tile back to shared memory.
- Epilogues copy tiles of the resultant matrix to global memory, and can be used to implement arbitrary post-processing, such as adding a bias vector to the resultant matrix.
Each of these components corresponds to a set of functions with a predetermined interface. These functions can be customised by the user through Julia's multiple dispatch functionality.
The package currently provides two main operators, both of which target NVIDIA GPUs:
- WMMAOperator: for using Tensor Cores through the WMMA APIs;
- FPUOperator: for other data types or input sizes.
Optimized layouts are available for diagonal matrices and matrices of complex/dual numbers.
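To make the decomposition above more concrete, here is a minimal CPU-only sketch of the same idea in plain Julia. It is not the GemmKernels API: every name in it (`Layout`, `ColMajor`, `RowMajor`, `offset`, `Params`, `load_tile`, `tile_mma!`, `sketch_gemm`) is hypothetical and chosen for illustration only. It assumes flat input buffers, applies the transform on loads only, and uses a simple element-wise epilogue.

```julia
# Conceptual CPU sketch of the component decomposition.
# All names are hypothetical; this is NOT the GemmKernels API.

# Layouts: map logical (row, col) coordinates to a physical offset
# in a flat buffer. Multiple dispatch selects the mapping.
abstract type Layout end
struct ColMajor <: Layout end
struct RowMajor <: Layout end

offset(::ColMajor, i, j, rows, cols) = (j - 1) * rows + i
offset(::RowMajor, i, j, rows, cols) = (i - 1) * cols + j

# Params: tile sizes, expressed in logical coordinates.
struct Params
    tile_m::Int
    tile_n::Int
    tile_k::Int
end

# Load one tile through the layout, applying the transform after each load.
function load_tile(buf, layout::Layout, rows, cols, is, js, transform)
    return [transform(buf[offset(layout, i, j, rows, cols)]) for i in is, j in js]
end

# Operator: multiply two tiles and accumulate into `acc`.
function tile_mma!(acc, a_tile, b_tile)
    acc .+= a_tile * b_tile
    return acc
end

# Driver: iterate over output tiles, accumulate along K, then run the
# epilogue on each finished tile before writing it to the result.
function sketch_gemm(a, la::Layout, b, lb::Layout, M, N, K, p::Params;
                     transform = identity, epilogue = identity)
    c = zeros(Float64, M, N)
    for m0 in 1:p.tile_m:M, n0 in 1:p.tile_n:N
        ms = m0:min(m0 + p.tile_m - 1, M)
        ns = n0:min(n0 + p.tile_n - 1, N)
        acc = zeros(Float64, length(ms), length(ns))
        for k0 in 1:p.tile_k:K
            ks = k0:min(k0 + p.tile_k - 1, K)
            a_tile = load_tile(a, la, M, K, ms, ks, transform)
            b_tile = load_tile(b, lb, K, N, ks, ns, transform)
            tile_mma!(acc, a_tile, b_tile)
        end
        c[ms, ns] .= epilogue.(acc)
    end
    return c
end

# Usage: A stored column-major, B stored row-major, epilogue adds 1.0.
M, N, K = 5, 7, 6
A = rand(M, K)
B = rand(K, N)
a = vec(A)                 # column-major flat buffer of A
b = vec(permutedims(B))    # row-major flat buffer of B
C = sketch_gemm(a, ColMajor(), b, RowMajor(), M, N, K, Params(2, 3, 4);
                epilogue = x -> x + 1.0)
```

Swapping `RowMajor()` for another `Layout` subtype, or passing a different `transform` or `epilogue`, changes one component without touching the others, which is the orthogonality the framework description refers to. The real kernels additionally stage tiles through shared memory and registers on the GPU, which this sketch leaves out.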
Citation
For more details on the implementation and performance results, please see our accompanying paper (pre-print available on arXiv). The CITATION.bib file in the root of this repository contains a citation in BibTeX format.
Owner
- Name: JuliaGPU
- Login: JuliaGPU
- Kind: organization
- Website: https://juliagpu.org/
- Repositories: 48
- Profile: https://github.com/JuliaGPU
GPU Computing in Julia
Citation (CITATION.bib)
@article{faingnaert2020flexible,
  title={Flexible Performant {GEMM} Kernels on {GPUs}},
  author={Faingnaert, Thomas and Besard, Tim and De Sutter, Bjorn},
  year={2022},
  journal={IEEE Transactions on Parallel and Distributed Systems},
  volume={33},
  number={9},
  pages={2230--2248},
  doi={10.1109/TPDS.2021.3136457},
}
GitHub Events
Total
- Watch event: 5
- Delete event: 1
- Push event: 7
- Pull request event: 2
- Fork event: 1
- Create event: 1
Last Year
- Watch event: 5
- Delete event: 1
- Push event: 7
- Pull request event: 2
- Fork event: 1
- Create event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Thomas Faingnaert | t****t@h****m | 77 |
| Tim Besard | t****d@g****m | 65 |
| github-actions[bot] | 4****] | 22 |
| Dilum Aluthge | d****m@a****m | 6 |
| dependabot[bot] | 4****] | 4 |
| wardvermeulen | 3****n | 3 |
| Simon Bil | s****l@g****m | 2 |
| Hendrik Ranocha | r****a | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 29
- Total pull requests: 169
- Average time to close issues: 6 months
- Average time to close pull requests: 16 days
- Total issue authors: 9
- Total pull request authors: 9
- Average comments per issue: 1.66
- Average comments per pull request: 1.48
- Merged pull requests: 145
- Bot issues: 0
- Bot pull requests: 36
Past Year
- Issues: 0
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 3 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.5
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 2
Top Authors
Issue Authors
- maleadt (14)
- DilumAluthge (6)
- ArrogantGao (2)
- thomasfaingnaert (2)
- GiggleLiu (1)
- kirshanthans (1)
- avik-pal (1)
- Wimmerer (1)
- JuliaTagBot (1)
Pull Request Authors
- thomasfaingnaert (85)
- maleadt (38)
- github-actions[bot] (35)
- DilumAluthge (7)
- wardvermeulen (5)
- dependabot[bot] (5)
- smnbl (3)
- GiggleLiu (1)
- ranocha (1)
Packages
- Total packages: 1
- Total downloads (julia): 2
- Total dependent packages: 1
- Total dependent repositories: 0
- Total versions: 2
juliahub.com: GemmKernels
Flexible and performant GEMM kernels in Julia
- Documentation: https://docs.juliahub.com/General/GemmKernels/stable/
- License: BSD-3-Clause
- Latest release: 0.2.0 (published almost 2 years ago)
Dependencies
- actions/checkout v4 composite
- julia-actions/setup-julia v1 composite
- JuliaRegistries/TagBot v1 composite