Recent Releases of gemmkernels.jl
gemmkernels.jl - v0.2.0
GemmKernels v0.2.0
Merged pull requests:
- Use native Float16 (#69) (@maleadt)
- Parallelized testing using XUnit.jl. (#71) (@maleadt)
- CompatHelper: bump compat for "CUDA" to "3.0" (#76) (@github-actions[bot])
- Fix layout fragment type mismatch (#80) (@smnbl)
- Fix CI (#82) (@maleadt)
- Replace StaticArrays with a simple immutable array type (#83) (@maleadt)
- update CUDA compat (#87) (@smnbl)
- Update README (#88) (@thomasfaingnaert)
- Update operator fusion benchmarks (#89) (@thomasfaingnaert)
- Cleanup kernel launch code (#90) (@thomasfaingnaert)
- Revert "Replace StaticArrays with a simple immutable array type (#83)" (#91) (@thomasfaingnaert)
- Disable codecov status (#92) (@thomasfaingnaert)
- Generalise WMMA Operator (#93) (@thomasfaingnaert)
- Add tensor contraction benchmark (#94) (@thomasfaingnaert)
- Replace GPUifyLoops with KernelAbstractions (#95) (@thomasfaingnaert)
- CompatHelper: bump compat for KernelAbstractions to 0.8, (keep existing compat) (#96) (@github-actions[bot])
- Re-land StaticArrays removal (#98) (@maleadt)
- FPU operator (#101) (@wardvermeulen)
- Add CI for Julia 1.9 (#102) (@thomasfaingnaert)
- Bump compat bounds to use newer CUDA.jl (#103) (@maleadt)
- CompatHelper: bump compat for LLVM to 6, (keep existing compat) (#106) (@github-actions[bot])
- Replace KernelAbstractions with LLVMLoopInfo. (#107) (@maleadt)
- Make LocalArray setindex convert. (#109) (@maleadt)
- Make vectorized store convert and perform multiple stores if required (#111) (@maleadt)
- Configure and check shared memory automatically. (#112) (@maleadt)
- Enable use of FPU operator in BLAS wrappers. (#113) (@maleadt)
- Add a benchmarks bot. (#116) (@maleadt)
- Commit the Manifest. (#118) (@maleadt)
- Introduce a helper macro to simplify immutable indexing. (#119) (@maleadt)
- Add zero layout to optimize alpha/beta=zero. (#120) (@maleadt)
- Use XUnit.jl for parallel testing. (#121) (@maleadt)
- Unify WMMA and FPU operator typevars NFC (@maleadt)
- Transform VecElement-contained values. (#123) (@maleadt)
- Simplify tests. (#124) (@maleadt)
- Update manifest (#126) (@github-actions[bot])
- Fix vector op indexing and add boundscheck. (#127) (@maleadt)
- BLAS: Convert alpha & beta to more appropriate types. (#129) (@maleadt)
- Add layouts for accessing unaligned or non tile-sized global. (#130) (@maleadt)
- Fix fragtypes of ColMajor and RowMajor fallback layouts. (#131) (@maleadt)
- Put the BLAS interface directly in the GemmKernels.jl module. (#132) (@maleadt)
- Add example. (#133) (@maleadt)
- Detect alignment issues and throw a Julia error. (#134) (@maleadt)
- Check if the warp doesn't index out of the tile subpartition. (#135) (@maleadt)
- Simplify config definition and usage. (#136) (@maleadt)
- Add a mechanism to expose execution details to callers. (#137) (@maleadt)
- Show kernel details on benchmark differences. (#138) (@maleadt)
- Update manifest (#139) (@github-actions[bot])
- Update manifest (#141) (@github-actions[bot])
- Update manifest (#144) (@github-actions[bot])
- Update manifest (#145) (@github-actions[bot])
- Update manifest (#146) (@github-actions[bot])
- Update manifest (#147) (@github-actions[bot])
- enable dependabot for GitHub actions (#148) (@ranocha)
- Bump peter-evans/create-pull-request from 3 to 5 (#149) (@dependabot[bot])
- Bump actions/checkout from 2 to 3 (#150) (@dependabot[bot])
- Update manifest (#151) (@github-actions[bot])
- Update manifest (#153) (@github-actions[bot])
- CompatHelper: bump compat for "CUDA" to "5" (#155) (@github-actions[bot])
- Update manifest (#156) (@github-actions[bot])
- Bump actions/checkout from 3 to 4 (#157) (@dependabot[bot])
- Update manifest (#158) (@github-actions[bot])
- Rework benchmarks and tests (#160) (@thomasfaingnaert)
- Add more flexible FPU operator (#161) (@wardvermeulen)
- Update manifest (#162) (@github-actions[bot])
- Update manifest (#163) (@github-actions[bot])
- Fix configuration heuristic. (#164) (@maleadt)
- Throw ConfigError for unsupported WMMA shapes (#166) (@thomasfaingnaert)
- Add a check for the block shape in the K dimension (#167) (@wardvermeulen)
- Adapt to CUDA.jl profile changes (#168) (@thomasfaingnaert)
- Compare with cuBLAS during benchmarking (#169) (@thomasfaingnaert)
- Refactor configs to use macros (#170) (@thomasfaingnaert)
- Test more WMMA configurations (#171) (@thomasfaingnaert)
- Improve heuristic for memcopy tile sizes (#172) (@thomasfaingnaert)
- Check number of stages for pipelined kernel (#173) (@thomasfaingnaert)
- Check number of threads before launching kernel (#174) (@thomasfaingnaert)
- Fix alignment check for non 16-byte alignments (#175) (@thomasfaingnaert)
- Do not hardcode vectorisation width in layouts (#176) (@thomasfaingnaert)
- Fix typo in parallelise function name (#178) (@thomasfaingnaert)
- Add script to tune parameters (#179) (@thomasfaingnaert)
- Check tile sizes in config (#180) (@thomasfaingnaert)
- FPUOp: Ensure the FMA operator is inlined. (#182) (@maleadt)
- Extend set of WMMA operator shapes (#183) (@thomasfaingnaert)
- Apply isapprox elementwise (#185) (@thomasfaingnaert)
- Get benchmarks working again (#186) (@thomasfaingnaert)
- Remove Julia 1.8 from CI (#187) (@thomasfaingnaert)
- Refactor tuning script (#190) (@maleadt)
- Bump julia-actions/setup-julia from 1 to 2 (#191) (@dependabot[bot])
Closed issues:
- Errors on small array inputs (#52)
- Feature request: support for matmul with integer matrices (#64)
- Feature request: support Matrix{Float32} = Matrix{Float32} × Matrix{Float32} (#75)
- Remove fragtype_a (#84)
- Replace GPUifyLoops.@unroll (#86)
- Use LLVMLoopInfo.jl (#104)
- Optimizations when alpha or beta is 0 (#110)
- Transform functions: pass values, not VecElements (#114)
- Benchmark bot (#115)
- Questions about usage of registers (#152)
- A wrong function name parallellise (#177)
Published by github-actions[bot] almost 2 years ago
gemmkernels.jl - v0.1.0
GemmKernels v0.1.0
Closed issues:
- Unable to add the Package (#48)
- Migrate GPU CI from GitLab to Buildkite? (#54)
- Submit code coverage information to Codecov (#55)
- Warning: Performing scalar operations on GPU arrays when running the test suite (#62)
Merged pull requests:
- Add Tiling API (#1) (@thomasfaingnaert)
- Add matmul API (#2) (@thomasfaingnaert)
- Add compat entries (#8) (@thomasfaingnaert)
- Add benchmarks for WMMA (#9) (@thomasfaingnaert)
- Rename Layout.size() to Layout.physicalsize() (#10) (@thomasfaingnaert)
- Add row major layout (#11) (@thomasfaingnaert)
- Add scaling to WMMA benchmarks (#12) (@thomasfaingnaert)
- Add instructions for benchmarking (#13) (@thomasfaingnaert)
- Use correct layouts in benchmarks (#14) (@thomasfaingnaert)
- Change iteration order for transposed matrices (#15) (@thomasfaingnaert)
- Fix padding for transposed layouts (#17) (@thomasfaingnaert)
- Fix label mixed-precision GEMM tests (#18) (@thomasfaingnaert)
- Add test to conjugate complex matrix (#19) (@thomasfaingnaert)
- Support row major layouts for complex matrix multiplication (#20) (@thomasfaingnaert)
- Add benchmarks for complex GEMM (#21) (@thomasfaingnaert)
- Fix padded layouts for complex GEMM (#22) (@thomasfaingnaert)
- Add epilogue to add a bias vector to the matrix output (#23) (@thomasfaingnaert)
- Specialise GEMM kernel for diagonal matrices (#24) (@thomasfaingnaert)
- Add BLAS API (#25) (@thomasfaingnaert)
- Extend BLAS interface to Diagonal matrices (#26) (@thomasfaingnaert)
- Add workaround for FP16 multiplication (#27) (@thomasfaingnaert)
- Use BLAS interface for WMMA benchmark (#28) (@thomasfaingnaert)
- Fuse multiply and divide of C fragment (#29) (@thomasfaingnaert)
- Adapt to updated CuArray type (#30) (@thomasfaingnaert)
- Fix default value of 'computewarp' parameter (#31) (@thomasfaingnaert)
- Clean up Tiling API (#32) (@thomasfaingnaert)
- Update manifest (#33) (@github-actions[bot])
- Adapt tiling tests to new translate name (#34) (@thomasfaingnaert)
- Update dependencies (#35) (@thomasfaingnaert)
- Allow using binary build in profile script (#36) (@thomasfaingnaert)
- Add all benchmarks (#37) (@thomasfaingnaert)
- Software pipelining (#38) (@thomasfaingnaert)
- Testsuite changes (#39) (@thomasfaingnaert)
- Expand testsuite (#40) (@thomasfaingnaert)
- Add CI via GitLab (#41) (@thomasfaingnaert)
- Add commit filter for CI (#43) (@thomasfaingnaert)
- Fix commit filter (#44) (@thomasfaingnaert)
- Expand README (#45) (@thomasfaingnaert)
- Update dependencies (#49) (@thomasfaingnaert)
- Test larger complex and dual matrices (#50) (@thomasfaingnaert)
- Move to BuildKite. (#56) (@maleadt)
- README: Buildkite badge should show status of the master branch; Codecov badge does not require token (#58) (@DilumAluthge)
- Run Buildkite on both Julia 1.5 and Julia 1.6-nightly (#60) (@DilumAluthge)
- Print some version information at the beginning of the test suite (#61) (@DilumAluthge)
- Set CUDA.allowscalar(false) at the beginning of the test suite (#63) (@DilumAluthge)
- Avoid scalar iteration in testsuite (#65) (@thomasfaingnaert)
- Run Buildkite on Julia 1.5, 1.6-nightly, and nightly (#66) (@DilumAluthge)
Published by github-actions[bot] about 5 years ago