Recent Releases of gemmkernels.jl

gemmkernels.jl - v0.2.0

GemmKernels v0.2.0

Diff since v0.1.0

Merged pull requests:
- Use native Float16 (#69) (@maleadt)
- Parallelized testing using XUnit.jl. (#71) (@maleadt)
- CompatHelper: bump compat for "CUDA" to "3.0" (#76) (@github-actions[bot])
- Fix layout fragment type mismatch (#80) (@smnbl)
- Fix CI (#82) (@maleadt)
- Replace StaticArrays with a simple immutable array type (#83) (@maleadt)
- update CUDA compat (#87) (@smnbl)
- Update README (#88) (@thomasfaingnaert)
- Update operator fusion benchmarks (#89) (@thomasfaingnaert)
- Cleanup kernel launch code (#90) (@thomasfaingnaert)
- Revert "Replace StaticArrays with a simple immutable array type (#83)" (#91) (@thomasfaingnaert)
- Disable codecov status (#92) (@thomasfaingnaert)
- Generalise WMMA Operator (#93) (@thomasfaingnaert)
- Add tensor contraction benchmark (#94) (@thomasfaingnaert)
- Replace GPUifyLoops with KernelAbstractions (#95) (@thomasfaingnaert)
- CompatHelper: bump compat for KernelAbstractions to 0.8, (keep existing compat) (#96) (@github-actions[bot])
- Re-land StaticArrays removal (#98) (@maleadt)
- FPU operator (#101) (@wardvermeulen)
- Add CI for Julia 1.9 (#102) (@thomasfaingnaert)
- Bump compat bounds to use newer CUDA.jl (#103) (@maleadt)
- CompatHelper: bump compat for LLVM to 6, (keep existing compat) (#106) (@github-actions[bot])
- Replace KernelAbstractions with LLVMLoopInfo. (#107) (@maleadt)
- Make LocalArray setindex convert. (#109) (@maleadt)
- Make vectorized store convert and perform multiple stores if required (#111) (@maleadt)
- Configure and check shared memory automatically. (#112) (@maleadt)
- Enable use of FPU operator in BLAS wrappers. (#113) (@maleadt)
- Add a benchmarks bot. (#116) (@maleadt)
- Commit the Manifest. (#118) (@maleadt)
- Introduce a helper macro to simplify immutable indexing. (#119) (@maleadt)
- Add zero layout to optimize alpha/beta=zero. (#120) (@maleadt)
- Use XUnit.jl for parallel testing. (#121) (@maleadt)
- Unify WMMA and FPU operator typevars NFC (@maleadt)
- Transform VecElement-contained values. (#123) (@maleadt)
- Simplify tests. (#124) (@maleadt)
- Update manifest (#126) (@github-actions[bot])
- Fix vector op indexing and add boundscheck. (#127) (@maleadt)
- BLAS: Convert alpha & beta to more appropriate types. (#129) (@maleadt)
- Add layouts for accessing unaligned or non tile-sized global. (#130) (@maleadt)
- Fix fragtypes of ColMajor and RowMajor fallback layouts. (#131) (@maleadt)
- Put the BLAS interface directly in the GemmKernels.jl module. (#132) (@maleadt)
- Add example. (#133) (@maleadt)
- Detect alignment issues and throw a Julia error. (#134) (@maleadt)
- Check if the warp doesn't index out of the tile subpartition. (#135) (@maleadt)
- Simplify config definition and usage. (#136) (@maleadt)
- Add a mechanism to expose execution details to callers. (#137) (@maleadt)
- Show kernel details on benchmark differences. (#138) (@maleadt)
- Update manifest (#139) (@github-actions[bot])
- Update manifest (#141) (@github-actions[bot])
- Update manifest (#144) (@github-actions[bot])
- Update manifest (#145) (@github-actions[bot])
- Update manifest (#146) (@github-actions[bot])
- Update manifest (#147) (@github-actions[bot])
- enable dependabot for GitHub actions (#148) (@ranocha)
- Bump peter-evans/create-pull-request from 3 to 5 (#149) (@dependabot[bot])
- Bump actions/checkout from 2 to 3 (#150) (@dependabot[bot])
- Update manifest (#151) (@github-actions[bot])
- Update manifest (#153) (@github-actions[bot])
- CompatHelper: bump compat for "CUDA" to "5" (#155) (@github-actions[bot])
- Update manifest (#156) (@github-actions[bot])
- Bump actions/checkout from 3 to 4 (#157) (@dependabot[bot])
- Update manifest (#158) (@github-actions[bot])
- Rework benchmarks and tests (#160) (@thomasfaingnaert)
- Add more flexible FPU operator (#161) (@wardvermeulen)
- Update manifest (#162) (@github-actions[bot])
- Update manifest (#163) (@github-actions[bot])
- Fix configuration heuristic. (#164) (@maleadt)
- Throw ConfigError for unsupported WMMA shapes (#166) (@thomasfaingnaert)
- Add a check for the block shape in the K dimension (#167) (@wardvermeulen)
- Adapt to CUDA.jl profile changes (#168) (@thomasfaingnaert)
- Compare with cuBLAS during benchmarking (#169) (@thomasfaingnaert)
- Refactor configs to use macros (#170) (@thomasfaingnaert)
- Test more WMMA configurations (#171) (@thomasfaingnaert)
- Improve heuristic for memcopy tile sizes (#172) (@thomasfaingnaert)
- Check number of stages for pipelined kernel (#173) (@thomasfaingnaert)
- Check number of threads before launching kernel (#174) (@thomasfaingnaert)
- Fix alignment check for non 16-byte alignments (#175) (@thomasfaingnaert)
- Do not hardcode vectorisation width in layouts (#176) (@thomasfaingnaert)
- Fix typo in parallelise function name (#178) (@thomasfaingnaert)
- Add script to tune parameters (#179) (@thomasfaingnaert)
- Check tile sizes in config (#180) (@thomasfaingnaert)
- FPUOp: Ensure the FMA operator is inlined. (#182) (@maleadt)
- Extend set of WMMA operator shapes (#183) (@thomasfaingnaert)
- Apply isapprox elementwise (#185) (@thomasfaingnaert)
- Get benchmarks working again (#186) (@thomasfaingnaert)
- Remove Julia 1.8 from CI (#187) (@thomasfaingnaert)
- Refactor tuning script (#190) (@maleadt)
- Bump julia-actions/setup-julia from 1 to 2 (#191) (@dependabot[bot])

Closed issues:
- Errors on small array inputs (#52)
- Feature request: support for matmul with integer matrices (#64)
- Feature request: support Matrix{Float32} = Matrix{Float32} × Matrix{Float32} (#75)
- Remove fragtype_a (#84)
- Replace GPUifyLoops.@unroll (#86)
- Use LLVMLoopInfo.jl (#104)
- Optimizations when alpha or beta is 0 (#110)
- Transform functions: pass values, not VecElements (#114)
- Benchmark bot (#115)
- Questions about usage of registers (#152)
- A wrong function name parallellise (#177)

Published by github-actions[bot] almost 2 years ago

gemmkernels.jl - v0.1.0

GemmKernels v0.1.0

Closed issues:
- Unable to add the Package (#48)
- Migrate GPU CI from GitLab to Buildkite? (#54)
- Submit code coverage information to Codecov (#55)
- Warning: Performing scalar operations on GPU arrays when running the test suite (#62)

Merged pull requests:
- Add Tiling API (#1) (@thomasfaingnaert)
- Add matmul API (#2) (@thomasfaingnaert)
- Add compat entries (#8) (@thomasfaingnaert)
- Add benchmarks for WMMA (#9) (@thomasfaingnaert)
- Rename Layout.size() to Layout.physicalsize() (#10) (@thomasfaingnaert)
- Add row major layout (#11) (@thomasfaingnaert)
- Add scaling to WMMA benchmarks (#12) (@thomasfaingnaert)
- Add instructions for benchmarking (#13) (@thomasfaingnaert)
- Use correct layouts in benchmarks (#14) (@thomasfaingnaert)
- Change iteration order for transposed matrices (#15) (@thomasfaingnaert)
- Fix padding for transposed layouts (#17) (@thomasfaingnaert)
- Fix label mixed-precision GEMM tests (#18) (@thomasfaingnaert)
- Add test to conjugate complex matrix (#19) (@thomasfaingnaert)
- Support row major layouts for complex matrix multiplication (#20) (@thomasfaingnaert)
- Add benchmarks for complex GEMM (#21) (@thomasfaingnaert)
- Fix padded layouts for complex GEMM (#22) (@thomasfaingnaert)
- Add epilogue to add a bias vector to the matrix output (#23) (@thomasfaingnaert)
- Specialise GEMM kernel for diagonal matrices (#24) (@thomasfaingnaert)
- Add BLAS API (#25) (@thomasfaingnaert)
- Extend BLAS interface to Diagonal matrices (#26) (@thomasfaingnaert)
- Add workaround for FP16 multiplication (#27) (@thomasfaingnaert)
- Use BLAS interface for WMMA benchmark (#28) (@thomasfaingnaert)
- Fuse multiply and divide of C fragment (#29) (@thomasfaingnaert)
- Adapt to updated CuArray type (#30) (@thomasfaingnaert)
- Fix default value of 'computewarp' parameter (#31) (@thomasfaingnaert)
- Clean up Tiling API (#32) (@thomasfaingnaert)
- Update manifest (#33) (@github-actions[bot])
- Adapt tiling tests to new translate name (#34) (@thomasfaingnaert)
- Update dependencies (#35) (@thomasfaingnaert)
- Allow using binary build in profile script (#36) (@thomasfaingnaert)
- Add all benchmarks (#37) (@thomasfaingnaert)
- Software pipelining (#38) (@thomasfaingnaert)
- Testsuite changes (#39) (@thomasfaingnaert)
- Expand testsuite (#40) (@thomasfaingnaert)
- Add CI via GitLab (#41) (@thomasfaingnaert)
- Add commit filter for CI (#43) (@thomasfaingnaert)
- Fix commit filter (#44) (@thomasfaingnaert)
- Expand README (#45) (@thomasfaingnaert)
- Update dependencies (#49) (@thomasfaingnaert)
- Test larger complex and dual matrices (#50) (@thomasfaingnaert)
- Move to BuildKite. (#56) (@maleadt)
- README: Buildkite badge should show status of the master branch; Codecov badge does not require token (#58) (@DilumAluthge)
- Run Buildkite on both Julia 1.5 and Julia 1.6-nightly (#60) (@DilumAluthge)
- Print some version information at the beginning of the test suite (#61) (@DilumAluthge)
- Set CUDA.allowscalar(false) at the beginning of the test suite (#63) (@DilumAluthge)
- Avoid scalar iteration in testsuite (#65) (@thomasfaingnaert)
- Run Buildkite on Julia 1.5, 1.6-nightly, and nightly (#66) (@DilumAluthge)

Published by github-actions[bot] about 5 years ago