Recent Releases of CUDAKernels

CUDAKernels - v0.9.38

KernelAbstractions v0.9.38

Diff since v0.9.37

Feature changes

  • Add API support for unified memory allocations

Merged pull requests: - [0.9] Unified memory allocations (#632) (@christiangnrd)

- Julia
Published by github-actions[bot] 10 months ago

CUDAKernels - v0.9.37

KernelAbstractions v0.9.37

Diff since v0.9.36

Feature changes

  • Support @kernel definition inside functions
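As an illustration of this change, the sketch below defines a kernel inside an ordinary function and launches it there. It assumes KernelAbstractions v0.9.37 or later and a plain `Array` (so the backend resolves to the CPU); the names `scale!` and `scale_kernel!` are hypothetical and not part of the package.

```julia
using KernelAbstractions

# Hypothetical helper: multiplies every element of `y` by `a` in place.
function scale!(y, a)
    # The kernel is defined in function scope, which this release supports.
    @kernel function scale_kernel!(y, a)
        i = @index(Global)
        @inbounds y[i] *= a
    end

    # Pick the backend that owns `y` (the CPU backend for a plain Array).
    backend = KernelAbstractions.get_backend(y)

    # Instantiate the kernel for that backend and launch one work-item per element.
    scale_kernel!(backend)(y, a; ndrange = length(y))
    KernelAbstractions.synchronize(backend)
    return y
end

x = ones(Float32, 16)
scale!(x, 2.0f0)
```

Before this release, the usual pattern was to define the kernel at module top level and only launch it from inside the function.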

Merged pull requests: - Use stacked method tables (#615) (@vchuravy) - avoid boxing when @kernel is used as a closure (#625) (@simeonschaub)

- Julia
Published by github-actions[bot] 11 months ago

CUDAKernels - v0.9.36

KernelAbstractions v0.9.36

Diff since v0.9.35

Feature changes

  • get_backend support for StaticArrays

Merged pull requests: - Use Printf to report errors from POCL (#592) (@vchuravy) - use unsafe_indices for a few examples (#612) (@vchuravy) - Switch to SPIRVIntrinsics 0.3 and the new backend (#614) (@vchuravy) - KA.synchronize, add GLOBAL_MEM_FENCE semantic (#618) (@vchuravy) - add get_backend for StaticArrays (#621) (@vchuravy)

Closed issues: - How to improve CPU performance? (#357)

- Julia
Published by github-actions[bot] 11 months ago

CUDAKernels - v0.9.35

KernelAbstractions v0.9.35

Diff since v0.9.34

Merged pull requests:

  • Implement a CPU backend using POCL (#556) (@vchuravy)
  • [0.10] Forbid divergent execution of work-group barriers (#558) (@vchuravy)
  • Bump julia-actions/setup-julia from 1 to 2 (#561) (@dependabot[bot])
  • Switch Format.yml to CUDA.jl style (#568) (@vchuravy)
  • Test pocl#main on CI (#569) (@vchuravy)
  • CompatHelper: add new compat entry for SPIRVIntrinsics at version 0.2, (keep existing compat) (#571) (@github-actions[bot])
  • CompatHelper: add new compat entry for GPUCompiler at version 1, (keep existing compat) (#572) (@github-actions[bot])
  • CompatHelper: add new compat entry for LLVM at version 9, (keep existing compat) (#573) (@github-actions[bot])
  • Check that malformed allocations throw and don't stackoverflow (#576) (@vchuravy)
  • Check that malformed allocations throw and don't stackoverflow (#576) (#577) (@vchuravy)
  • Avoid callgraph recursion due to exception branch in get_global_id (#579) (@vchuravy)
  • Remove CPU(static=true) test (#580) (@vchuravy)
  • Set SPIR-V to 1.2 (#582) (@vchuravy)
  • use POCL with fixes (#589) (@vchuravy)
  • use barrier with LOCAL_MEM_FENCE (#591) (@vchuravy)
  • Test correct backend in examples test (#597) (@christiangnrd)
  • Switch to pocl_jll@v7 (#599) (@vchuravy)
  • prevent `get_backend` from overflowing the stack (#602) (@nsajko)
  • [NFC] Ignore formatting PRs in blame (#604) (@christiangnrd)
  • Enable downstream CI for 0.10 (#608) (@vchuravy)
  • Disable Float16 on the CPU backend (#609) (@vchuravy)

Closed issues: - Exception when getting the index on the CPU backend (#542) - Undefined variable error in kernel after update to 0.9.34 (#575) - Utility for copying array to GPU (#581) - StackOverflowError on get_backend(::UnitRange) (#588) - KA reports undefined variables, when they are in fact defined (#596)

- Julia
Published by github-actions[bot] 12 months ago

CUDAKernels - v0.9.34

KernelAbstractions v0.9.34

Diff since v0.9.33

Merged pull requests: - Bump googleapis/code-suggester from 2 to 4 (#560) (@dependabot[bot]) - Allow opt-out of implicit bounds-checking (#563) (@vchuravy) - [0.9] Forbid divergent execution of work-group barriers (#564) (@vchuravy) - Update Changelog in docs (#565) (@vchuravy) - Fix docs and test for unsafe_indicies=true (#566) (@vchuravy) - Fix indicies->indices typo everywhere (#567) (@vchuravy)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.33

KernelAbstractions v0.9.33

Diff since v0.9.32

Merged pull requests: - Don't overload Base.ndims(::Any) (#557) (@vchuravy)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.32

KernelAbstractions v0.9.32

Diff since v0.9.31

  • Clarify the semantics of KernelAbstractions.copyto! and add KernelAbstractions.pagelock!
  • Add support for multiple devices per backend

Merged pull requests: - Run Runic after explicit return rule addition (#516) (@fredrikekre) - Avoid the exception branch in expand (#518) (@vchuravy) - Allow for ndims query (#551) (@vchuravy) - Switch Runic CI (#552) (@vchuravy) - Update quickstart.md (#553) (@Dale-Black) - support multiple devices per backend (#554) (@vchuravy) - Document the semantics of copyto! and add pagelock! (#555) (@vchuravy)

Closed issues: - Add Feature to Select Devices to Execute Kernels On (#458)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.31

KernelAbstractions v0.9.31

Diff since v0.9.30

Merged pull requests: - Remove unnecessary dependencies from KA (#549) (@vchuravy)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.30

KernelAbstractions v0.9.30

Diff since v0.9.29

Merged pull requests: - Add Atomix v1 compat (#545) (@christiangnrd)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.29

KernelAbstractions v0.9.29

Diff since v0.9.28

Merged pull requests: - Allow return statements for GPU-only kernels (#538) (@pxl-th)

Closed issues: - Multi-GPU backend (#540)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.28

KernelAbstractions v0.9.28

Diff since v0.9.27

Merged pull requests: - Enzyme: fix propagation of runtime activity (#534) (@wsmoses) - [CI] Only run Enzyme on v1.10+ (#535) (@wsmoses) - Enzyme support older versions (#537) (@wsmoses)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.27

KernelAbstractions v0.9.27

Diff since v0.9.26

Merged pull requests: - Adapt to pending Enzymecore changes (#519) (@wsmoses)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.26

KernelAbstractions v0.9.26

Diff since v0.9.25

Merged pull requests: - Bump codecov/codecov-action from 3 to 4 (#466) (@dependabot[bot]) - Remove test reliance on high-level features (#520) (@maleadt) - Include OpenCL in reflection test exemptions. (#521) (@maleadt) - Bump julia-actions/cache from 1 to 2 (#522) (@dependabot[bot]) - Bump actions/checkout from 2 to 4 (#523) (@dependabot[bot]) - Bump peter-evans/create-or-update-comment from 3 to 4 (#524) (@dependabot[bot]) - Bump peter-evans/find-comment from 2 to 3 (#525) (@dependabot[bot]) - Switch to julia-actions/cache. (#526) (@maleadt) - Bump actions/upload-artifact from 2 to 4 (#527) (@dependabot[bot]) - Temporarily mark a test as broken on 1.11. (#529) (@maleadt) - Remove unsupported Julia versions from Buildkite tests. (#530) (@maleadt)

Closed issues: - [EnzymeExt] tape_type error (#495)

- Julia
Published by github-actions[bot] over 1 year ago

CUDAKernels - v0.9.25

KernelAbstractions v0.9.25

Diff since v0.9.24

Merged pull requests: - Valid index check for gpu in EnzymeExt (#514) (@jlk9) - Fix zeros erroring on an empty shape (#515) (@nick4f42)

Closed issues: - Can't allocate empty vector anymore with KernelAbstractions.zeros. (#504)

- Julia
Published by github-actions[bot] almost 2 years ago

CUDAKernels - v0.9.24

KernelAbstractions v0.9.24

Diff since v0.9.23

Merged pull requests: - Add runic as a formatter (#505) (@vchuravy) - refactor: move StdLibs into extensions (#508) (@avik-pal) - Rerun runic and lock to specific commit (#511) (@fredrikekre) - Make stdlib extensions backwards compatible (#512) (@vchuravy)

- Julia
Published by github-actions[bot] almost 2 years ago

CUDAKernels - v0.9.23

KernelAbstractions v0.9.23

Diff since v0.9.22

Merged pull requests: - Add GPU reverse mode to EnzymeExt (#454) (@wsmoses) - Fixing transpose function, a missing "end" and some argument orders. (#487) (@Sixzero) - add 1.11 to CI (#491) (@vchuravy) - Improve CPU launch heuristic (#500) (@vchuravy) - use AirspeedVelocity for benchmark CI (#502) (@vchuravy) - Bump UnsafeAtomicsLLVM to 0.2 (#503) (@pxl-th)

Closed issues: - [Enzyme] NTuple index not working on CUDA backend (#494) - External functions in GPU ODE example (#496) - Default CPU workgroupsize can be inadequate for higher-dimensional kernels (#499)

- Julia
Published by github-actions[bot] almost 2 years ago

CUDAKernels - v0.9.22

KernelAbstractions v0.9.22

Diff since v0.9.21

Merged pull requests: - Implement KA.functional() (#490) (@vchuravy)

Closed issues: - Printing results changes results in CPU kernel (#485) - KA.functional(::Backend) to mimick CUDA.functional (#489)

- Julia
Published by github-actions[bot] almost 2 years ago

CUDAKernels - v0.9.21

KernelAbstractions v0.9.21

Diff since v0.9.20

Merged pull requests: - Enzyme: simplify via mixedduplicated (#483) (@wsmoses) - Bump version in Project.toml (#484) (@wsmoses)

Closed issues: - Small discrepancy (#482)

- Julia
Published by github-actions[bot] almost 2 years ago

CUDAKernels - v0.9.20

KernelAbstractions v0.9.20

Diff since v0.9.19

Merged pull requests: - Avoid deadlock in EnzymeExt (#478) (@vchuravy)

- Julia
Published by github-actions[bot] almost 2 years ago

CUDAKernels - v0.9.19

KernelAbstractions v0.9.19

Diff since v0.9.18

Merged pull requests: - Add fallback for EnzymeCore.compiler_job_from_backend (#469) (@vchuravy) - Bump KA test compat for Enzyme (#473) (@wsmoses) - Putting output before input in function arguments in some examples (#475) (@evelyne-ringoot) - Correctly adapt to ABI changes to EnzymeCore (#476) (@wsmoses)

- Julia
Published by github-actions[bot] about 2 years ago

CUDAKernels - v0.9.18

KernelAbstractions v0.9.18

Diff since v0.9.17

Merged pull requests: - Bump EnzymeCore (#467) (@wsmoses)

Closed issues: - MAX support? (#460)

- Julia
Published by github-actions[bot] about 2 years ago

CUDAKernels - v0.9.17

KernelAbstractions v0.9.17

Diff since v0.9.16

Merged pull requests: - Bump actions/cache from 3 to 4 (#455) (@dependabot[bot]) - Bump codecov/codecov-action from 3 to 4 (#456) (@dependabot[bot]) - Revert "Bump codecov/codecov-action from 3 to 4" (#457) (@vchuravy) - Revert "Allow empty return statements" (#463) (@vchuravy)

Closed issues: - CPU version fails with a kernel that contains a return statement (#459) - @index(Global, NTuple) Giving incorrect behavior with CPU() backend (#461)

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.16

KernelAbstractions v0.9.16

Diff since v0.9.15

Merged pull requests: - Allow empty return statements (#446) (@eschnett) - Add allocate to docs (#450) (@Vaibhavdixit02) - Replace ROCArrays with AMDGPU in docs (#451) (@pxl-th) - Publish documentation preview (#452) (@pxl-th)

Closed issues: - Allow return or return nothing in kernels (#443) - Replace ROCArrays by AMDGPU in documentation? (#447)

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.15

KernelAbstractions v0.9.15

Diff since v0.9.14

Merged pull requests: - Test AMDGPU on 1.10 and limit 1.9 to gfx1030 (#441) (@vchuravy) - CompatHelper: bump compat for Adapt to 4, (keep existing compat) (#445) (@github-actions[bot])

Closed issues: - Atomix example fails with complex array element type (#444)

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.14

KernelAbstractions v0.9.14

Diff since v0.9.13

Merged pull requests: - [EnzymeExt] handle active vars to kernels (#439) (@wsmoses) - Add simple example with Atomix.jl (#440) (@roflmaostc)

Closed issues: - CUDADevice not defined (#296)

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.13

KernelAbstractions v0.9.13

Diff since v0.9.12

Merged pull requests: - Add <0.0.1 compat for all stdlibs outside of julia repo (#434) (@sharanry) - Simple benchmarks (#435) (@vchuravy) - EnzymeExt allocate subtape of correct size. (#437) (@michel2323)

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.12

KernelAbstractions v0.9.12

Diff since v0.9.11

Bug fixes

  • Fix the new @kernel inbounds=true functionality on the CPU #431
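For context, a minimal sketch of the `inbounds=true` kernel option that this fix concerns, run on the CPU backend. The kernel name `copy_kernel!` and the array sizes are hypothetical; the example assumes a KernelAbstractions version that includes this fix (v0.9.12 or later).

```julia
using KernelAbstractions

# `inbounds=true` asks the macro to treat the kernel body as if it were
# wrapped in @inbounds, so the indexing below is not bounds-checked.
@kernel inbounds=true function copy_kernel!(dst, src)
    i = @index(Global)
    dst[i] = src[i]
end

src = collect(1.0f0:8.0f0)
dst = zeros(Float32, 8)

# Plain Arrays resolve to the CPU backend.
backend = KernelAbstractions.get_backend(dst)

# Workgroup size 4, one work-item per element of `dst`.
copy_kernel!(backend, 4)(dst, src; ndrange = length(dst))
KernelAbstractions.synchronize(backend)
```

Because bounds checks are skipped, the `ndrange` passed at launch must not exceed the length of either array.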

Merged pull requests: - Allow nightly CI to fail so badge stays green (#427) (@Moelf) - Change 'A to B' -> 'B to A' in memcopy descriptions (#428) (@johnbcoughlin) - Fix inbounds codegen for CPU (#431) (@vchuravy) - modifying test kernel for the flag to call (#432) (@leios)

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.11

KernelAbstractions v0.9.11

Diff since v0.9.10

Merged pull requests: - adding attempt to force inbounds at the kernel level (#429) (@leios)

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.10

KernelAbstractions v0.9.10

Diff since v0.9.9

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.9

KernelAbstractions v0.9.9

Diff since v0.9.8

Merged pull requests: - Update the quickstart documentation. (#410) (@GunnarFarneback) - Ensure NTuple index functions are inlined (#414) (@vchuravy) - enable dependabot for GitHub actions (#415) (@ranocha)

Closed issues: - @unroll unrolling the wrong loop (#411)

- Julia
Published by github-actions[bot] over 2 years ago

CUDAKernels - v0.9.8

KernelAbstractions v0.9.8

Diff since v0.9.7

Closed issues: - Reverse CI for AMDGPU (#394)

Merged pull requests: - Add AMDGPU CI (#409) (@pxl-th) - Fix inactive rule to only mark constructor of one type as inactive, n… (#412) (@wsmoses)

- Julia
Published by github-actions[bot] almost 3 years ago

CUDAKernels - v0.9.7

KernelAbstractions v0.9.7

Diff since v0.9.6

Closed issues: - Launch kernels and dependencies (#264)

Merged pull requests: - Enzyme Rules, all functioning (#382) (@wsmoses) - Fix CUDA CI (#406) (@vchuravy) - Update CI pipelines (#407) (@vchuravy)

- Julia
Published by github-actions[bot] almost 3 years ago

CUDAKernels - v0.9.6

KernelAbstractions v0.9.6

Diff since v0.9.5

Merged pull requests: - Don't use Float64 in tests, as some back-ends do not support it. (#402) (@maleadt)

- Julia
Published by github-actions[bot] almost 3 years ago

CUDAKernels - v0.9.5

KernelAbstractions v0.9.5

Diff since v0.9.4

Closed issues: - Defining timing infrastructure that works with events. (#15) - Kernels fail on CPU when waiting on kernels that allocate shared memory (#55) - Use macros in nested functions (#377) - CPU(static=true) option (#387) - Need for @inline when using GPU backend (#392) - KA seems to be broken for CUDA (#400)

Merged pull requests: - Add kernel cpu=false and context accessor (#389) (@vchuravy) - Add reverse CI for oneAPI and AMDGPU (#391) (@vchuravy) - Update readme (#393) (@vchuravy) - Improve clarity of numa_aware example (#397) (@carstenbauer) - Docs: numa aware saxpy example (#398) (@carstenbauer) - Add implementation notes to host functionality (#401) (@vchuravy)

- Julia
Published by github-actions[bot] almost 3 years ago

CUDAKernels - v0.9.4

KernelAbstractions v0.9.4

Diff since v0.9.3

Merged pull requests: - Add CPU(static=true) (#388) (@vchuravy) - Update index.md (#390) (@Ruibin-Liu)

- Julia
Published by github-actions[bot] about 3 years ago

CUDAKernels - v0.9.3

KernelAbstractions v0.9.3

Diff since v0.9.2

Merged pull requests: - Migrate from SnoopPrecompile to PrecompileTools (#386) (@timholy)

- Julia
Published by github-actions[bot] about 3 years ago

CUDAKernels - v0.9.2

KernelAbstractions v0.9.2

Diff since v0.9.1

Closed issues: - Use occupancy API for autotuning (#19) - Allow user to turn off contract (#20) - Assigning ::ROCDevice to ::KA.GPU (#321) - ROCKernels: using queue pool causes performance regression (#344) - KernelAbstractions.jl is blocked to v0.8.6 by CUDAKernels (#380)

Merged pull requests: - CI improvements (#378) (@maleadt) - Add unsafe_free! (#381) (@pxl-th)

- Julia
Published by github-actions[bot] about 3 years ago

CUDAKernels - v0.9.1

KernelAbstractions v0.9.1

Diff since v0.9.0

Closed issues: - Can't run the example in quickstart (#371)

Merged pull requests: - Add Metal to list of excluded backends (#368) (@maxwindiff) - Add queries for atomics and float64 support (#369) (@maxwindiff) - Fix typos (#370) (@tomchor) - Add reverse CI for Metal PR (#372) (@vchuravy) - Update reverse CI for CUDA (#373) (@vchuravy) - Make unit tests skippable (#374) (@maxwindiff) - Update CUDA to master (#375) (@vchuravy)

- Julia
Published by github-actions[bot] about 3 years ago

CUDAKernels - v0.9.0

KernelAbstractions v0.9.0

Diff since v0.8.6

Closed issues: - No speedup on CPU (#322) - Add Metal support (#326)

Merged pull requests:

  • Start removing event system (#317) (@vchuravy)
  • Add Metal support (#337) (@tgymnich)
  • Prefer blocks over threads (#341) (@vchuravy)
  • ROCKernels: Add occupancy API (#342) (@pxl-th)
  • [CUDAKernels] add always_inline as device parameter (#343) (@vchuravy)
  • [CUDAKernels] Update compat (#345) (@vchuravy)
  • Update CI (#346) (@vchuravy)
  • ROCKernels: Adapt to AMDGPU changes (#348) (@jpsamaroo)
  • [ROCKernels] Fix addrspacecast (#349) (@vchuravy)
  • [ROCKernels] Import LLVM (#352) (@pxl-th)
  • Update compat for oneAPIKernels.jl (#355) (@utkarsh530)
  • Bump oneAPI to 1.0 (#356) (@michel2323)
  • Rename device to backend (#359) (@vchuravy)
  • Let Event(MtlDevice) actually be a barrier (#360) (@vchuravy)
  • Fix Metal workgroup size (#361) (@tgymnich)
  • Update docs (#362) (@vchuravy)
  • Add optional priority feature (#363) (@vchuravy)
  • Backends are adaptors (#364) (@vchuravy)
  • Only skip histogram tests on CPU (#365) (@vchuravy)

- Julia
Published by github-actions[bot] about 3 years ago

CUDAKernels - v0.8.6

KernelAbstractions v0.8.6

Diff since v0.8.5

Closed issues: - Support for single-threaded kernels even when Threads.nthreads() != 1? (#328) - Render issue with Docs admonition? (#332)

Merged pull requests: - Minor fixes in the Docs (#334) (@navidcy) - Fix quickstart (#335) (@vchuravy) - setting ndrange macro to output a Tuple (#336) (@leios)

- Julia
Published by github-actions[bot] over 3 years ago

CUDAKernels - v0.8.5

KernelAbstractions v0.8.5

Diff since v0.8.4

Closed issues: - Add backend lookup function based on input arguments (#229) - Update for CUDA.jl 3.0 (#241)

Merged pull requests: - Fix docstrings rendering (#323) (@navidcy) - fix typo in docs (#327) (@bjarthur) - Add @ndrange (#329) (@vchuravy) - Fix stmt form of at_private (#331) (@vchuravy)

- Julia
Published by github-actions[bot] over 3 years ago

CUDAKernels - v0.7.3

KernelAbstractions v0.7.3

Diff since v0.7.2

Closed issues: - Support atomics (#7) - Add backend lookup function based on input arguments (#229) - Separate Cassette context from CompilerMetadata (#231) - Update for CUDA.jl 3.0 (#241) - Adding a function to get device from array (type)? (#268) - Support for atomics (#276) - CUDA 3.6.3 broke KernelAbstactions. (#280) - Enzyme fails on GPU kernel (#307)

Merged pull requests:

  • Add function getdevice (#269) (@oschulz)
  • Synchronize SpecialFunctions compat (#279) (@charleskawczynski)
  • [CUDAKernels] Avoid Cassette looking at devicefunctions (#281) (@vchuravy)
  • Simplify CI and drop bors (#283) (@vchuravy)
  • Excise Cassette (#288) (@vchuravy)
  • Use Timer to avoid hangs due to error on the device (#291) (@vchuravy)
  • don't test KernelGradients on nightly for now (#292) (@vchuravy)
  • WIP: Make CUDA stream from cache match CUDA context (#294) (@christophernhill)
  • adding atomic support with atomix (#299) (@leios)
  • using CPU as default for Scratchpad and SharedMemory (#300) (@leios)
  • CompatHelper: add new compat entry for Atomix at version 0.1, (keep existing compat) (#301) (@github-actions[bot])
  • Enzyme v0.10 supports GPU compilation again (#303) (@vchuravy)
  • Add atomics support for ROCKernels (#304) (@jpsamaroo)
  • Enable CPU atomics (#305) (@jpsamaroo)
  • Add 'return nothing' to autodiff (#309) (@pxl-th)
  • Make examples work by copy-paste into REPL (#310) (@jwscook)
  • bounding UnsafeAtomics and UnsafeAtomicsLLVM (#311) (@leios)
  • Add oneAPI back-end. (#312) (@maleadt)
  • [doc] Some revamping (#314) (@michel2323)
  • Fix depwarn (#315) (@ChrisRackauckas)
  • ROCKernels: Update to AMDGPU 0.4 (#316) (@jpsamaroo)
  • Fix docstrings rendering (#323) (@navidcy)
  • fix typo in docs (#327) (@bjarthur)
  • Add @ndrange (#329) (@vchuravy)
  • Fix stmt form of at_private (#331) (@vchuravy)

- Julia
Published by github-actions[bot] over 3 years ago

CUDAKernels - v0.8.4

KernelAbstractions v0.8.4

Diff since v0.8.3

Merged pull requests: - Make examples work by copy-paste into REPL (#310) (@jwscook) - Add oneAPI back-end. (#312) (@maleadt) - [doc] Some revamping (#314) (@michel2323) - Fix depwarn (#315) (@ChrisRackauckas) - ROCKernels: Update to AMDGPU 0.4 (#316) (@jpsamaroo)

- Julia
Published by github-actions[bot] over 3 years ago

CUDAKernels - v0.8.3

KernelAbstractions v0.8.3

Diff since v0.8.2

Closed issues: - Enzyme fails on GPU kernel (#307)

Merged pull requests: - Add 'return nothing' to autodiff (#309) (@pxl-th) - bounding UnsafeAtomics and UnsafeAtomicsLLVM (#311) (@leios)

- Julia
Published by github-actions[bot] almost 4 years ago

CUDAKernels - v0.8.2

KernelAbstractions v0.8.2

Diff since v0.8.1

Closed issues: - Support atomics (#7) - Support for atomics (#276)

Merged pull requests: - adding atomic support with atomix (#299) (@leios) - CompatHelper: add new compat entry for Atomix at version 0.1, (keep existing compat) (#301) (@github-actions[bot]) - Enzyme v0.10 supports GPU compilation again (#303) (@vchuravy) - Add atomics support for ROCKernels (#304) (@jpsamaroo) - Enable CPU atomics (#305) (@jpsamaroo)

- Julia
Published by github-actions[bot] almost 4 years ago

CUDAKernels - v0.8.1

KernelAbstractions v0.8.1

Diff since v0.8.0

Closed issues: - Separate Cassette context from CompilerMetadata (#231)

Merged pull requests: - WIP: Make CUDA stream from cache match CUDA context (#294) (@christophernhill) - using CPU as default for Scratchpad and SharedMemory (#300) (@leios)

- Julia
Published by github-actions[bot] about 4 years ago

CUDAKernels - v0.8.0

KernelAbstractions v0.8.0

Diff since v0.7.2

Closed issues: - Adding a function to get device from array (type)? (#268) - CUDA 3.6.3 broke KernelAbstactions. (#280) - wait(kernel(...) hangs up on Julia v1.7 (#290)

Merged pull requests: - Add function getdevice (#269) (@oschulz) - Synchronize SpecialFunctions compat (#279) (@charleskawczynski) - [CUDAKernels] Avoid Cassette looking at devicefunctions (#281) (@vchuravy) - Simplify CI and drop bors (#283) (@vchuravy) - Excise Cassette (#288) (@vchuravy) - Use Timer to avoid hangs due to error on the device (#291) (@vchuravy) - don't test KernelGradients on nightly for now (#292) (@vchuravy)

- Julia
Published by github-actions[bot] about 4 years ago

CUDAKernels - v0.7.2

KernelAbstractions v0.7.2

Diff since v0.7.1

Merged pull requests: - CompatHelper: bump compat for "SpecialFunctions" to "2.0" (#278) (@github-actions[bot])

- Julia
Published by github-actions[bot] over 4 years ago

CUDAKernels - v0.7.1

KernelAbstractions v0.7.1

Diff since v0.7.0

Closed issues: - Compilation error with type-converting round functions, e.g., ceil(Int32, 1.2f0) (#254) - invalid syntax on upcoming 1.7 (#260) - ROCKernels run doesn't work for me. (#261) - Convert related kernels errors with CUDAKernels (#265) - Event(ROCDevice()) errors (#267)

Merged pull requests: - add KernelGradients (#255) (@vchuravy) - Add AMD section to a couple of examples (#266) (@ali-ramadhan) - Fix Event for multiple event case (#270) (@vchuravy) - fixing shmem line to match CUDA.jl (#272) (@leios) - fix to allow for converting to Int (#273) (@leios)

- Julia
Published by github-actions[bot] over 4 years ago

CUDAKernels - v0.7.0

KernelAbstractions v0.7.0

Diff since v0.6.3

Merged pull requests: - fix failures on 1.7 (#258) (@simeonschaub)

- Julia
Published by github-actions[bot] almost 5 years ago

CUDAKernels - v0.6.3

KernelAbstractions v0.6.3

Diff since v0.6.2

Merged pull requests: - nix overdub of unsafegetindex (#247) (@vchuravy) - Stop overdubbing CUDA math functions (#249) (@ali-ramadhan) - Backport #249 (#251) (@vchuravy) - Check pow overdubbing (#253) (@vchuravy)

- Julia
Published by github-actions[bot] about 5 years ago

CUDAKernels - v0.6.2

KernelAbstractions v0.6.2

Diff since v0.6.1

Merged pull requests: - Remove metadata from cassette context (#244) (@vchuravy) - nix overdub of unsafegetindex (#246) (@vchuravy)

- Julia
Published by github-actions[bot] about 5 years ago

CUDAKernels - v0.6.1

KernelAbstractions v0.6.1

Diff since v0.6.0

Closed issues: - [wrong repository] "Spills" from adjacent views of ROCVector (#237) - Julia compat entry (#238) - Auto-inbounds and nothing? (#240)

Merged pull requests: - Use hostcall for wait and stream GC (#85) (@vchuravy) - [CUDAKernels] add an implicit sync to kernels with no dependencies (#222) (@vchuravy) - Add some docs for CUDAKernels/ROCKernels (#233) (@jpsamaroo) - Add new overdub for unsafe_getindex to avoid allocating error message in Julia 1.6+ (#236) (@jakebolewski) - update to CUDA 3.0 and use task-local stream (#242) (@vchuravy)

- Julia
Published by github-actions[bot] about 5 years ago

CUDAKernels - v0.6.0

KernelAbstractions v0.6.0

Diff since v0.5.5

Merged pull requests: - Add ROCKernels backend (#209) (@jpsamaroo) - Fix Bors (#230) (@DilumAluthge)

- Julia
Published by github-actions[bot] about 5 years ago

CUDAKernels - v0.5.5

KernelAbstractions v0.5.5

Diff since v0.5.4

Closed issues: - InvalidIRError when using integer powers (#223)

Merged pull requests: - overdub overflowerr_binaryop (#227) (@vchuravy)

- Julia
Published by github-actions[bot] about 5 years ago

CUDAKernels - v0.5.4

KernelAbstractions v0.5.4

Diff since v0.5.3

Closed issues: - Feature request: performant matmul! example (#205) - TODO: code coverage for the /lib/CUDAKernels/src/ folder (#213) - Problem with a simple add vecs for some arrays sizes on CUDA (#221)

Merged pull requests: - Split KernelAbstractions into frontend and backends (#200) (@vchuravy) - More robust test/examples.jl (#207) (@tkf) - performant matmul example for KA (#208) (@mjulian31) - Simplify some of the testing commands (#210) (@DilumAluthge) - Test suite: add a newline at the end of a print statement (#211) (@DilumAluthge) - Simplify more CI commands (#212) (@DilumAluthge) - Codecov: submit coverage for the following directories: "/src", "/lib" (#214) (@DilumAluthge) - Combine tests into parameterized testsuite (#219) (@jpsamaroo) - avoid overdubbing of literal_pow (#224) (@vchuravy)

- Julia
Published by github-actions[bot] about 5 years ago

CUDAKernels - v0.5.3

KernelAbstractions v0.5.3

Diff since v0.5.2

Closed issues: - error when calling kernel on reinterpret(reshape, T, ::CuArray) (#179)

Merged pull requests: - Simplify the Buildkite setup by using version: '1' for the stable job (#204) (@DilumAluthge) - tune CUDA kernels automatically (#206) (@simeonschaub)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.5.2

KernelAbstractions v0.5.2

Diff since v0.5.1

Merged pull requests: - fix mem = at_private performance (#203) (@vchuravy)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.5.1

KernelAbstractions v0.5.1

Diff since v0.5.0

Closed issues: - Not compatible with Julia 1.6-DEV yet? (#148) - Buildkite is failing on Julia nightly (#155) - TODO: submit to Codecov from Buildkite (#171) - TODO: Run GHA on Julia 1.5, 1.6-nightly, and nightly (#185) - TODO: use the registered version of CUDA.jl for Buildkite on Julia 1.6-nightly and Julia nightly (#196)

Merged pull requests: - Remove the Buildkite "soft fail" on Julia nightly (#180) (@DilumAluthge) - Run Buildkite on Julia 1.5, Julia 1.6, and Julia nightly (#182) (@DilumAluthge) - Fix an error in the Buildkite pipeline (#183) (@DilumAluthge) - Submit to Codecov from Buildkite (#184) (@DilumAluthge) - README: Buildkite badges for Julia 1.5, Julia 1.6-nightly, and Julia nightly (#186) (@DilumAluthge) - Run GHA on Julia 1.5, 1.6-nightly, and nightly (#187) (@DilumAluthge) - Bors: require that GitHub Actions CI passes on Julia nightly (#188) (@DilumAluthge) - Bors: Require GitHub Actions on Julia 1.6-nightly (#191) (@DilumAluthge) - Bors: Require Buildkite on Julia 1.6-nightly (#192) (@DilumAluthge) - Bors: Require Buildkite on Julia nightly (#195) (@DilumAluthge) - Buildkite: Use the registered versions of CUDA.jl for Buildkite on Julia 1.6-nightly and Julia nightly (#198) (@DilumAluthge)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.5.0

KernelAbstractions v0.5.0

Diff since v0.4.6

Closed issues: - Private array behavior depends on backend (#142) - Make ScratchArray a subtype of AbstractArray (#174)

Merged pull requests: - Fancy private (#175) (@vchuravy)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.4.6

KernelAbstractions v0.4.6

Diff since v0.4.5

Closed issues: - Buildkite: report separate statuses for Julia 1.5 and Julia nightly? (#156) - Add Buildkite status badge to the KernelAbstractions.jl README? (#158) - bors.toml: use Buildkite status(es) instead of GitLab statuses? (#159) - How are the docs for this package being built and deployed? (#166)

Merged pull requests:

  • add Buildkite (#146) (@vchuravy)
  • CompatHelper: bump compat for "SpecialFunctions" to "1.0" (#149) (@github-actions[bot])
  • Add locking to stream GC (#150) (@jpsamaroo)
  • CompatHelper: bump compat for "StaticArrays" to "1.0" (#152) (@github-actions[bot])
  • CompatHelper: bump compat for "SpecialFunctions" to "1.1" (#153) (@github-actions[bot])
  • Add CI timeouts. (#154) (@maleadt)
  • Transition from Travis CI to GitHub Actions CI (#157) (@DilumAluthge)
  • Add the Buildkite status badges, and fix the Bors config (#160) (@DilumAluthge)
  • CompatHelper: bump compat for "SpecialFunctions" to "1.2" (#161) (@github-actions[bot])
  • README: Add the Bors badge (#164) (@DilumAluthge)
  • Tell Bors to automatically delete merged branches (#165) (@DilumAluthge)
  • Deploy the docs to GitHub Pages (#167) (@DilumAluthge)
  • Update to the latest recommended TagBot workflow (#168) (@DilumAluthge)
  • CompatHelper: bump compat for "Adapt" to "3.0" (#169) (@github-actions[bot])
  • Add Codecov badge to README (#170) (@DilumAluthge)
  • Fix a typo in the Codecov badge (#172) (@DilumAluthge)
  • fix on nightly (#177) (@simeonschaub)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.4.5

KernelAbstractions v0.4.5

Diff since v0.4.4

Merged pull requests: - use Const from CUDA and support 2.0 (#145) (@vchuravy)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.4.4

KernelAbstractions v0.4.4

Diff since v0.4.3

Closed issues: - What's going wrong when kw.head != :kw in @ka_code_typed? (#143)

Merged pull requests: - @ka_code_typed interactive mode (#140) (@mjulian31) - Kacodellvm (#141) (@mjulian31) - Code typed fixes (#144) (@mjulian31)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.4.3

KernelAbstractions v0.4.3

Diff since v0.4.2

Merged pull requests: - Mjulian31 (#137) (@mjulian31)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.4.2

KernelAbstractions v0.4.2

Diff since v0.4.1

Closed issues: - Repeated execution gives different results with CUDA (#108)

Merged pull requests: - Mjulian31 (#133) (@mjulian31) - Mjulian31 (#136) (@mjulian31)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.4.1

KernelAbstractions v0.4.1

Diff since v0.4.0

Merged pull requests: - Rewrite exponent implementation for the CUDA backend with Cassette for Julia 1.5 compatibility (#130) (@jakebolewski)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels - v0.4.0

KernelAbstractions v0.4.0

Diff since v0.3.3

Closed issues: - Precompile failure with CUDA master (#128)

Merged pull requests: - Remove unsafe_wait (#129) (@vchuravy)

- Julia
Published by github-actions[bot] over 5 years ago

CUDAKernels -

- Julia
Published by vchuravy over 5 years ago

CUDAKernels - v0.3.3

KernelAbstractions v0.3.3

Diff since v0.3.2

Merged pull requests: - CompatHelper: bump compat for "CUDA" to "1.3" (#123) (@github-actions[bot]) - CompatHelper: bump compat for "LLVM" to "3.0" (#125) (@github-actions[bot])

- Julia
Published by github-actions[bot] almost 6 years ago

CUDAKernels - v0.3.2

KernelAbstractions v0.3.2

Diff since v0.3.1

Merged pull requests: - support rename of CUDA internals (#122) (@vchuravy)

- Julia
Published by github-actions[bot] almost 6 years ago

CUDAKernels - v0.3.1

KernelAbstractions v0.3.1

Diff since v0.3.0

Closed issues: - Matrix examples fails on Julia 1.4 (#43) - AssertionError from ndrange=() (#107) - Kernel compilation for OffsetArrays fails with KernelError: recursion is currently not supported (#110) - Revise failure with KernelAbstractions (#111)

Merged pull requests: - CompatHelper: add new compat entry for "LLVM" at version "1.5" (#105) (@github-actions[bot]) - add erf + erfc functions (#115) (@simonbyrne) - Shared memory transpose (#116) (@vchuravy) - add docs for localmem, private and uniform macros (#118) (@simonbyrne) - Typo (#120) (@PallHaraldsson)

- Julia
Published by github-actions[bot] almost 6 years ago

CUDAKernels - v0.3.0

KernelAbstractions v0.3.0

Diff since v0.2.6

Merged pull requests: - Update KernelAbstractions to use CUDA 1.0 (#104) (@jakebolewski)

- Julia
Published by github-actions[bot] almost 6 years ago

CUDAKernels - v0.2.6

KernelAbstractions v0.2.6

Diff since v0.2.5

- Julia
Published by github-actions[bot] almost 6 years ago

CUDAKernels - v0.2.5

KernelAbstractions v0.2.5

Diff since v0.2.4

Merged pull requests: - Add SpecialFunctions.gamma to kernel language (#99) (@lcw) - CompatHelper: add new compat entry for "SpecialFunctions" at version "0.10" (#100) (@github-actions[bot]) - Bump version to 0.2.5 (#101) (@simonbyrne)

- Julia
Published by github-actions[bot] almost 6 years ago

CUDAKernels - v0.2.4

KernelAbstractions v0.2.4

Diff since v0.2.3

Closed issues: - LLVM error: Cannot select: 0xca6fd20: f64 = fpow 0x93a9e90 (#89) - Defining the same function twice? Maybe a typo? (#93) - private variable not available after an @synchronize (#95) - @synchronize in an if statement (#96)

Merged pull requests: - fix function name check (#52) (@GiggleLiu) - Remove requires (#92) (@vchuravy) - remove redefinition of kernel (#94) (@vchuravy) - Allow typeof on @private memory (#97) (@jkozdon)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.2.3

KernelAbstractions v0.2.3

Diff since v0.2.2

Merged pull requests: - allow docstrings on kernels (#87) (@simonbyrne) - test on 1.4 (#88) (@vchuravy)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.2.2

KernelAbstractions v0.2.2

Diff since v0.2.1

Merged pull requests: - Don't busy wait on the CPU (#84) (@vchuravy) - Allow ndrange to be zero (#86) (@lcw)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.2.1

KernelAbstractions v0.2.1

Diff since v0.2.0

Closed issues: - Kernel launch overhead when launching lots of small kernels (#75)

Merged pull requests: - CompatHelper: bump compat for "CUDAnative" to "3.0" (#77) (@github-actions[bot]) - Improve launch performance of kernels some more (#82) (@mwarusz)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.2.0

KernelAbstractions v0.2.0

Diff since v0.1.6

Closed issues: - Trailing return yields wrong CPU code. (#50) - Cannot resolve isbits when using function call to isbits (#79)

Merged pull requests: - Error on return statements inside kernels (#74) (@mwarusz) - Forbid waiting in CUDA on a CPUEvents (#78) (@lcw) - Improve launch performance of kernels (#80) (@vchuravy) - Create MultiEvent from tuple of empty MultiEvents (#81) (@lcw)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.1.6

KernelAbstractions v0.1.6

Diff since v0.1.5

Closed issues: - Integrate async_copy! into the event system (#40)

Merged pull requests: - Recurse into nested scopes containing synchronize (#70) (@mwarusz) - Implement better error propagation and semaphore (#72) (@vchuravy)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.1.5

KernelAbstractions v0.1.5

Diff since v0.1.4

Merged pull requests: - Run Event(f) on main thread (#71) (@vchuravy)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.1.4

KernelAbstractions v0.1.4

Diff since v0.1.3

Merged pull requests: - Add CUDA rewrites for sincos(x) and exp(y) for complex y (#67) (@oschub) - Add Event(f, args) to integrate code using at_async better (#68) (@vchuravy) - async_copy! fixes (#69) (@lcw)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.1.3

KernelAbstractions v0.1.3

Diff since v0.1.2

Merged pull requests: - Allow MultiEvent to be created from an Event (#65) (@lcw)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.1.2

KernelAbstractions v0.1.2

Diff since v0.1.1

Merged pull requests: - Unified printing (#61) (@leios) - add multievents (#62) (@vchuravy) - don't recurse into functions like Base.sin (#63) (@vchuravy) - make at_print work outside KA (#64) (@vchuravy)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.1.1

KernelAbstractions v0.1.1

Diff since v0.1.0

Merged pull requests: - Fix CUDA waiting on CUDA events (#59) (@lcw)

- Julia
Published by github-actions[bot] about 6 years ago

CUDAKernels - v0.1.0

KernelAbstractions v0.1.0

Closed issues: - Variable live-time counterintuitive on the CPU (#13) - Using Val as kernel argument triggers an assertion (#21) - Performance of naive transpose (#22) - Initialization error (#23) - unroll not defined inside a kernel (#24) - Document that private memory works differently than scratch in GPUifyLoops (#31) - How best sync with the default stream in the CUDA backend? (#46)

Merged pull requests: - Bring up GPU functionality fully (#1) (@vchuravy) - Cleanup docs and remove ScalarCPU (#2) (@vchuravy) - CompatHelper: add new compat entry for "CUDAdrv" at version "5.1" (#4) (@github-actions[bot]) - CompatHelper: add new compat entry for "Requires" at version "1.0" (#5) (@github-actions[bot]) - add stream GC and wait with progress function (#10) (@vchuravy) - Adding a few more examples (#12) (@leios) - Fix and test local memory (#14) (@vchuravy) - implement Const memory for GPU and CPU (#16) (@vchuravy) - Handle type parameters in kernel functions (#25) (@vchuravy) - be less judicious with escape (#26) (@vchuravy) - don't use nested inits (#27) (@vchuravy) - Blocked iteration (#28) (@vchuravy) - cleanup examples (#29) (@vchuravy) - add group index (#32) (@vchuravy) - Make kernels dispatchable (#33) (@mwarusz) - Use macrotools (#34) (@vchuravy) - add a block syntax for uniform (#35) (@vchuravy) - Fix private memory on the CPU (#36) (@mwarusz) - handle at_synchronize in blocks (#37) (@vchuravy) - add ntuple index type (#38) (@vchuravy) - fix nested unroll macros (#39) (@vchuravy) - Allow CPU and CUDA kernels to wait on each other (#41) (@lcw) - Fix tuple destructuring and bors+travis (#42) (@vchuravy) - Wait for GPU events using synchronize (#45) (@mwarusz) - [WIP] Infrastructure to sync CuDefaultStream() (#47) (@vchuravy) - Allow CPU kernels to depend on default events (#51) (@lcw) - Implement async_copy! (#53) (@vchuravy) - CompatHelper: bump compat for "CUDAapi" to "4.0" (#56) (@github-actions[bot]) - only create as many tasks as threads and more inference barriers (#57) (@vchuravy) - Ensure that constify doesn't cause arguments to be captured (#58) (@vchuravy)

- Julia
Published by github-actions[bot] about 6 years ago