Recent Releases of Metal

Metal - v1.7.0

Diff since v1.6.4

Merged pull requests: - Add function to retrieve # of gpu cores in system (#626) (@christiangnrd) - Support KA unified memory (#630) (@christiangnrd) - Add GPUToolbox 0.3 compat (#639) (@christiangnrd) - Return the old value from atomic_fetch_op_explicit. (#640) (@maleadt)

- Julia
Published by github-actions[bot] 7 months ago

Metal - v1.6.4

Diff since v1.6.3

Merged pull requests: - typo in MPSMatrixMultiplication comment (#622) (@jandrej) - Remove unnecessary OS signposts (#623) (@christiangnrd) - Accept alternate filename as optional argument (#629) (@christiangnrd) - Support Float32 threadgroup atomics by bitcasting. (#636) (@maleadt)

Closed issues: - @signpost_events make the code awfully slow (#621)

Published by github-actions[bot] 7 months ago

Metal - v1.6.3

Diff since v1.6.2

Merged pull requests: - More accumulation and reduction benchmarks (#614) (@christiangnrd) - Remove the unnecessary reshape during mapreduce (#615) (@christiangnrd) - Synchronize resources before cpu access of ManagedStorage resource (#617) (@christiangnrd) - Fix linalg tests for MPS and MPSGraph (#618) (@christiangnrd) - Don't warn on macOS 26 and bump version (#620) (@christiangnrd)

Published by github-actions[bot] 7 months ago

Metal - v1.6.2

Diff since v1.6.1

Merged pull requests: - Handle broadcasting when storage types are different (#605) (@limarta) - Add JLD2 to test env (#606) (@christiangnrd) - Tahoe versions (#607) (@christiangnrd) - Add MemoryFlagDevice to KA.jl's synchronization primitive. (#609) (@maleadt) - Update wrappers (#610) (@christiangnrd) - Bump version (#611) (@christiangnrd)

Closed issues: - KA.@synchronize -- threadgroup_barrier semantics (#608)

Published by github-actions[bot] 8 months ago

Metal - v1.6.1

Diff since v1.6.0

Merged pull requests: - Adding definition for KA.functional (#598) (@astrozot) - Update requirements (#599) (@christiangnrd) - Fix findall with empty MtlArray of Bool (#601) (@christiangnrd) - [NFC] Typo (#602) (@christiangnrd) - Add bare minimum for macOS 26 Tahoe (#604) (@christiangnrd)

Closed issues: - Warnings when precompiling Metal with Julia 1.12 (#594)

Published by github-actions[bot] 9 months ago

Metal - v1.6.0

Diff since v1.5.1

  • Metal, MPS, and MPSGraph frameworks’ enums and objects are now automatically wrapped with Clang.jl
  • Initial MPSGraph support. Currently used to replace MPS matrix multiplication on configurations where the previous method could fail (#381)
  • Many more improvements and bug fixes

Merged pull requests: - Add error checking to command buffer completion handler (#521) (@vovw) - Enable add_functions! test under shader validation (#522) (@christiangnrd) - Update wrapping readme (#525) (@christiangnrd) - Automatically wrap Metal and MPS headers (feat. Properties) (#526) (@christiangnrd) - Objective-C Availability support (#527) (@christiangnrd) - Small followup to #526 (#528) (@christiangnrd) - Add nextafter intrinsic (#529) (@christiangnrd) - Improvements to float intrinsics (#531) (@christiangnrd) - Fix Float16 sincos intrinsic (#533) (@christiangnrd) - Fix format suggestion formatting when diff contains "``" (#534) (@christiangnrd) - Fix rewriter for version-gated expressions (#535) (@christiangnrd) - Move BFloat16 code out of extension (#536) (@christiangnrd) - Version-related fixes to MPSNDArray (#537) (@christiangnrd) - Use GPUToolbox.jl (#538) (@christiangnrd) - [tests] Assume compatible system if xcode not installed (#539) (@christiangnrd) - Add .git-blame-ignore-revs and a few other fixes (#541) (@christiangnrd) - Integer & atomic Intrinsics improvements (#544) (@christiangnrd) - [NFC] Indentation consistency (#545) (@christiangnrd) - Update .git-blame-ignore-revs (#546) (@christiangnrd) - Link to issue (#548) (@christiangnrd) - Test both simd shuffle intrinsics. (#553) (@christiangnrd) - Enabling previously disabled test and mark broken (#554) (@christiangnrd) - Support pow with Int exponent (#557) (@christiangnrd) - Fixes and more tests for unsafe_wrap (#558) (@christiangnrd) - Pi and e to Float32 and Float16 (#559) (@christiangnrd) - Update GPUToolbox compat (#560) (@christiangnrd) - Split up intrinsics tests (#561) (@christiangnrd) - Remove reference to 'dir' in profiling docs (#563) (@christiangnrd) - Ensure synchronization before unsafewrapping of shared gpu array (#564) (@christiangnrd) - [NFC] Move linear algebra wrappers out of MPS lib (#565) (@christiangnrd) - Initial MPSGraph support (#566) (@christiangnrd) - Remove copy when possible from cpu rnad using GPU RNG (#568) (@christiangnrd) - Code coverage and misc fixes (#569) (@christiangnrd) - Remove obsolete MtlLargerDeviceArray (#574) (@christiangnrd) - Silence warning on 1.12+ (#575) (@christiangnrd) - 15.4 SDK changes (#579) (@christiangnrd) - Faster matmul sometimes (#580) (@christiangnrd) - Fix erf and a few other improvements (#582) (@christiangnrd) - Use an appropriate amount of threads in unified memory example (#583) (@christiangnrd) - Clean up test imports (#584) (@christiangnrd) - Fix some type ambiguities (#585) (@christiangnrd) - Fix findall output type (#587) (@christiangnrd) - Fix tests for macOS 13 (#591) (@christiangnrd) - Minor findall and accumulate tests improvements (#592) (@christiangnrd) - Bump version (#595) (@christiangnrd)

Closed issues: - API Validation failures (#467) - Handle MTLCommandBuffer Error Logs (#510) - sincos intrinsic fails to compile with Float16 (#530) - Can't compare Float32 with pi on Metal (#551) - ^(::Float32, ::Integer) uses double precision (#552) - Code coverage broken (#556) - Remove MtlLargerDeviceArray? (#573) - Installation of Metal does not work when DiffEqGPU is also installed. (#577) - findall output always uses default storage mode instead of matching input storage mode (#578) - Qualify definitions of BroadcastStyle (#586) - New Dynamic Dispatch when using ColorTypes (#588) - Release new version of Metal.jl (#593)

Published by github-actions[bot] 9 months ago

Metal - v1.5.1

Diff since v1.5.0

Merged pull requests: - Adapt to minver in ObjectiveC.jl (#513) (@christiangnrd) - Add Runic action to suggest formatting changes. (#517) (@maleadt) - Increase polling interval for benchmark action (#518) (@christiangnrd) - Adapt to GPUArrays.jl changes. (#519) (@maleadt)

Closed issues: - Benchmark CI failures due to too many requests (#516)

Published by github-actions[bot] about 1 year ago

Metal - v1.5.0

Diff since v1.4.2

Metal.jl 1.5 is a relatively minor release, with the most important change being behind the scenes: GPUArrays.jl v11 has switched to KernelAbstractions.jl (#461).

There is also one (technically) breaking change: code_agx and @device_code_agx have been removed (#512) because of their heavy Python dependency and conflicts with PythonCall.jl. This functionality did not support recent M-series GPUs anyway, so it is unlikely to affect many users.

Features

  • Improve performance of shared storage copies: #445
  • Add an is_m4 function: #498
  • Initial support for MPSNDArray: #499

Bug fixes

  • Fix fill: #496

Merged pull requests: - Add more tests to api validation testing (#447) (@christiangnrd) - Adapt to GPUArrays.jl transition to KernelAbstractions.jl. (#461) (@maleadt) - Switch CI to 1.11. (#462) (@maleadt) - Remove old code and test cleanup (#464) (@christiangnrd) - Adapt to JuliaGPU/GPUArrays.jl#567. (#475) (@maleadt) - Bump LLVM downgrader (#479) (@maleadt) - Store more debug files when encountering compilation errors. (#482) (@maleadt) - Use OncePerProcess in 1.12+ (#483) (@christiangnrd) - Don't run benchmarks from fork (#485) (@christiangnrd) - Still run GH Action when merged (#486) (@christiangnrd) - Bump IR downgrader (#489) (@maleadt) - Move MTL tests and add a few (#491) (@christiangnrd) - Generate MTL and MPS structs and enums with Clang.jl (#492) (@christiangnrd) - Fix copy tests (#493) (@christiangnrd) - Simplify benchmark runner and pipelines (#494) (@maleadt) - Fix global linear indexing (fill!) (#496) (@christiangnrd) - Couple typos and is_m4 function (#498) (@christiangnrd) - Initial support for MPSNDArray (#499) (@christiangnrd) - Tweak benchmark CI job (#501) (@maleadt) - Fix MPSNDArrayDescriptor wrapper (#502) (@christiangnrd) - Metal library parsing: using CodecBzip2 feature to ignore padding. (#504) (@maleadt) - Followup to #492: Enable C function wrapping (#505) (@christiangnrd) - Rerun random tests with chance of false negative once. (#506) (@christiangnrd) - Bump LLVM downgrader (#507) (@maleadt) - Test loading of package on unsupported platforms (#509) (@christiangnrd) - Remove device_code_agx (#512) (@christiangnrd) - Fix typo in random tests (#514) (@christiangnrd) - Fix Documenter failures (#515) (@christiangnrd)

Closed issues: - KernelAbstractions: add Atomix back-end (#218) - @device_code_agx errors when Metal Shader Validation is enabled (#463) - fill broken after KA integration (#466) - Compilation to native code failed: NSError: Undefined symbols (#480) - ObjectiveC.Foundation.NSErrorInstance(ObjectiveC.id{ObjectiveC.Foundation.NSError}(0x000000014cb8bd90)) (#487) - phi-related IR downgrade issue (#488) - Circular dependency when precompiling (#495) - Bad interaction between PyCall and Metal (#500) - Add github actions CI for linux, windows and non-functional macOS to ensure that precompilation and loading works (#508)

Published by github-actions[bot] about 1 year ago

Metal - v1.4.2

Diff since v1.4.1

Merged pull requests: - Fix loading on unsupported platforms (#459) (@christiangnrd)

Closed issues: - Relax package requirements (#22) - [windows:] Metal does not precompile anymore when installation not functional (#457) - [MacOS:] Metal.functional() wrongly returns true despite no GPUs available (#458)

Published by github-actions[bot] over 1 year ago

Metal - v1.4.1

Diff since v1.4.0

Merged pull requests: - Update Readme (#444) (@christiangnrd) - Use CPU copy with SharedStorage (#445) (@christiangnrd) - Disable nightly CI and fix invalid Metal API usage (#448) (@christiangnrd) - Don't report benchmarks on main branch commits (#450) (@christiangnrd) - Fix #451 and a couple other fixes (#452) (@christiangnrd) - Only load BFloat16s extension on Apple systems (#454) (@christiangnrd) - CompatHelper: bump compat for GPUCompiler to 1, (keep existing compat) (#455) (@github-actions[bot])

Closed issues: - Don't run benchmarks on the master branch? (#449) - unsafe_wrap(Array, ...) of a view does not preserve offset information (#451) - Metal does not load any more without error when installation not functional (#453)

Published by github-actions[bot] over 1 year ago

Metal - v1.4.0

Diff since v1.3.0

Merged pull requests: - Use unified memory for scalar indexing of permutation matrices (#313) (@tgymnich) - Add MPSMatrixRandom (#321) (@christiangnrd) - [.gitignore] Also ignore versioned Manifests (#410) (@christiangnrd) - Remove broken link in Docs (#413) (@christiangnrd) - Remove unused [extras] section in Project.toml (#415) (@christiangnrd) - Small fix and typos (#417) (@christiangnrd) - Add Benchmarking CI (#420) (@christiangnrd) - [NFC] Fix warning in topk docstrings (#421) (@christiangnrd) - Allow initialisation of MTLSize with tuples of different integer types (#425) (@tgymnich) - Add CI for macOS 15 (#426) (@christiangnrd) - Simplify versioninfo() and report more packages. (#429) (@maleadt) - Allow controlling compilation target versions. (#430) (@maleadt) - Add a missing memory fence to a SIMD test. (#432) (@maleadt) - Fix MPS.synchronize_state (#434) (@christiangnrd) - Make lu results have same storage mode as input (#435) (@christiangnrd) - Fix benchmarking CI and benchmark Shared and Private storage modes (#437) (@christiangnrd) - NFC tweak to MPSMatrixCopy tests (#439) (@christiangnrd) - Get more descriptive errors from flaky test (#440) (@christiangnrd)

Closed issues: - Port the opportunistic synchronization from CUDA.jl (#317) - Control flow-related miscompilation: (#401) - More sporadic 1.11 hangs (#412) - Support for LinearAlgebra.kron (#422) - Can't use gemm! methods with Metal (#423) - Error for thread/group size with different integer types (#424) - README example broken (#427) - Intermittent loadstoretg test failure (#428)

Published by github-actions[bot] over 1 year ago

Metal - v1.3.0

Diff since v1.2.0

Merged pull requests: - Fix typo in docs (#384) (@christiangnrd) - Bump minimal Julia requirement to v1.10. (#385) (@maleadt) - Remove Requires dependency (#386) (@christiangnrd) - Reflection: Figure out kernel names by looking at metallib section. (#390) (@maleadt) - Add tests for broadcasting minimum and maximum (#391) (@tgymnich) - Don't export MTL (#392) (@christiangnrd) - Add erfinv (#394) (@tgymnich) - Add expm1 (#395) (@tgymnich) - Cleanup some imports (#398) (@christiangnrd) - Remove type-pirated function (#399) (@christiangnrd) - Unexport some high-level MPS functionality from MPS (#400) (@christiangnrd) - Adapt to new REPL precompile changes (JuliaLang/julia#55210) (#403) (@christiangnrd) - Bump GPUCompiler. (#404) (@maleadt) - Bump LLVM compat (#407) (@maleadt) - Make 1.11 CI success mandatory. (#408) (@maleadt)

Closed issues: - Audit exports/public symbols (#359) - Compilation failure on 1.11 (#370) - MTLBinaryArchive (#387) - Metal.code_agx() failing in MacOS 15 Beta 3 (#388) - Test for min / max broadcasting issue (#389) - Type piracy (#396) - Potentially unused code in gpuarrays.jl (#397) - Shared vs SharedStorage in examples/unified_memory (#405) - Unsuported call to an unknown function when calling Distributions (#406)

Published by github-actions[bot] over 1 year ago

Metal - v1.2.0

Diff since v1.1.0

Merged pull requests: - Avoid constructing MulAddMuls on Julia v1.12+ (#295) (@dkarrasch) - Trigger the runtime profiler when a test times out. (#330) (@maleadt) - Add MPSMatrixSoftMax (#333) (@christiangnrd) - Reorganize and add some MPS tests (#335) (@christiangnrd) - Typo fix (#336) (#337) (@101001000) - Add error message for running Metal.jl under Rosetta (#339) (@tgymnich) - Add MPSCommandBuffer (#340) (@christiangnrd) - Bump julia-actions/setup-julia from 1 to 2 (#341) (@dependabot[bot]) - Revert error message for Rosetta (#342) (@tgymnich) - Update to ObjectiveC.jl v3. (#343) (@maleadt) - Add autoreleasepools to MPS interface methods. (#344) (@maleadt) - Don't redundantly return the cmdbuf from commit methods. (#345) (@maleadt) - Whitespace fixes (#346) (@christiangnrd) - CompatHelper: bump compat for LLVM to 7, (keep existing compat) (#347) (@github-actions[bot]) - CompatHelper: add new compat entry for SpecialFunctions in [weakdeps] at version 2, (keep existing compat) (#352) (@github-actions[bot]) - [NFC] Fix indentation (#353) (@christiangnrd) - Bump LLVM downgrader (#354) (@maleadt) - Don't export non-existent contents (#356) (@christiangnrd) - Remove/fix unused exports (#357) (@christiangnrd) - Unexport SimpleVersion and AS (#360) (@christiangnrd) - Add support for opaque pointers (#361) (@maleadt) - Docstrings (#362) (@christiangnrd) - Initial MacOS 15 support (#365) (@christiangnrd) - Replace current_device() with device() (#366) (@christiangnrd) - Support reading metallib v1.2.8 files from macOS 15. (#367) (@maleadt) - Add metallib (dis)assembly helper scripts. (#368) (@maleadt) - Simplify testing of examples. (#369) (@maleadt) - Temporarily allow 1.11 to fail. (#371) (@maleadt) - CompatHelper: add new compat entry for PrecompileTools at version 1, (keep existing compat) (#372) (@github-actions[bot]) - Define complex sqrt (#374) (@mtfishman) - Check the macOS version during initialization. (#375) (@maleadt) - CompatHelper: bump compat for LLVM to 8, (keep existing compat) (#376) (@github-actions[bot]) - Add accumulate implementation (#377) (@chengchingwen) - fix derived device array (#378) (@chengchingwen) - avoid ReshapedArray using Int128 in metal kernel (#379) (@chengchingwen) - improve type stability of derived array (#380) (@chengchingwen) - add findall implementation (#382) (@zhenwu0728) - Bump version (#383) (@christiangnrd)

Closed issues: - Tests sporadically timing out on 1.11 (#329) - ReshapedArray indexing broken because of Int128 operation (#332) - KernelAbstractions copyto! typo (#336) - Segmentation Faults (#338) - Port accmulate! and findall from CUDA.jl (#348) - Tests failing with GPUCompiler v0.26.5 and LLVM v7.1 (#350) - downgrades LLVM (#355) - sqrt(::Complex) unsupported due to conversion exceptions (#364)

Published by github-actions[bot] over 1 year ago

Metal - v1.1.0

Diff since v1.0.0

Merged pull requests: - Add resize! (#279) (@mtfishman) - Initial MTLTexture support (#280) (@christiangnrd) - Avoid redundant pointer conversions for threadgroup memory. (#283) (@maleadt) - Re-implement metallib generation in Julia. (#284) (@maleadt) - CompatHelper: add new compat entry for SHA at version 0.7, (keep existing compat) (#286) (@github-actions[bot]) - Support more of the metallib format (#288) (@maleadt) - Address potentiallly buggy mtl behaviour. (#290) (@christiangnrd) - CompatHelper: add new compat entry for CodecBzip2 at version 0.8, (keep existing compat) (#292) (@github-actions[bot]) - Remove an unneeded pointer method. (#293) (@maleadt) - Use NSAutoreleasePool to clean up memory. (#294) (@maleadt) - adapt_storage-related improvements (#296) (@christiangnrd) - CompatHelper: bump compat for ObjectiveC to 2, (keep existing compat) (#297) (@github-actions[bot]) - Add support for signposts (#300) (@maleadt) - Retain NSError we rethrow to avoid an UAF. (#302) (@maleadt) - Minor mapreduce improvements (#303) (@maleadt) - Specialize broadcast to avoid integer divisions. (#304) (@maleadt) - Better Support for Unified Memory (#305) (@tgymnich) - Add 1.11 CI (#306) (@christiangnrd) - Remove unused files (#307) (@tgymnich) - Skip profiling tests on macOS 14.4/M1. (#310) (@maleadt) - Increase test timeout limit to accomodate 1.8 (#311) (@christiangnrd) - Test all storage modes (#314) (@christiangnrd) - Fix doctests (#315) (@christiangnrd) - Fix KernelAbstractions for Unified Memory (#316) (@tgymnich) - CompatHelper: add new compat entry for Preferences at version 1, (keep existing compat) (#318) (@github-actions[bot]) - Minor cleanup (#319) (@christiangnrd) - Create MtlArray using memory allocated by Array (#320) (@christiangnrd) - Re-enable profiling tests on M1/14.4 when using Xcode 15.3. (#322) (@maleadt) - Small typo and doc fixup (#325) (@christiangnrd) - BFloat16s.jl extension and related improvements (#326) (@christiangnrd) - Support for Julia 1.11 (#327) (@maleadt)

Closed issues: - Validation-related back-end crash on macOS Ventura (#34) - slow broadcast copy in 2D (#41) - Poor performance of mapreduce (#46) - Multiplication with SubArrays (#47) - Add support to creating MtlArray using a memory allocated by Array (#62) - Improve use of unified memory (#86) - Use Autoreleasepools with Metal (#103) - Unknown RFLT tag generated by macOS 13 Metal compiler (#167) - mapreduce allocates a lot on the CPU (#211) - Legalization errors with vectorized code (#257) - Compilation Failure due to undefined symbols (#276) - resize!, append! not defined (#277) - tag new version (#278) - Panic during profiling tests on 14.4 beta (#281) - M3 backend cannot handle atomics with complicated pointer conversions (#282) - Int128 does not compile (#287) - Two suspicious mtl-related behaviours (#289) - LU factorization: add allowsingular keyword argument (#299) - Autorelease changes lead to use after free with errors (#301) - Reductions don't work on Shared Arrays (#312)

Published by github-actions[bot] almost 2 years ago

Metal - v1.0.0

Diff since v0.5.1

Merged pull requests: - Matrix batches (#158) (@tgymnich) - Add 1.10 CI. (#256) (@maleadt) - Update manifest (#258) (@github-actions[bot]) - CompatHelper: bump compat for GPUCompiler to 0.25, (keep existing compat) (#259) (@github-actions[bot]) - Bump actions/checkout from 3 to 4 (#260) (@dependabot[bot]) - Update manifest (#261) (@github-actions[bot]) - CompatHelper: bump compat for CEnum to 0.5, (keep existing compat) (#262) (@github-actions[bot]) - Update manifest (#263) (@github-actions[bot]) - CompatHelper: add new compat entry for Artifacts at version 1, (keep existing compat) (#264) (@github-actions[bot]) - Reduce launch overhead by generating code to encode arguments. (#265) (@maleadt) - Remove unused function argument (#266) (@tgymnich) - Introduce application tracing profiler (#267) (@maleadt) - Remove content(::MTLBuffer), use convert intead. (#268) (@maleadt) - Allow more kwargs syntax with kernel launches (#269) (@maleadt) - Don't re-use the IO object when shelling out to Python. (#271) (@maleadt) - Preserve storage mode when broadcasting. (#273) (@maleadt)

Closed issues: - Support for macOS Sonoma (#201) - Error with Julia 1.10 (#274)

Published by github-actions[bot] about 2 years ago

Metal - v0.5.1

Diff since v0.5.0

Merged pull requests: - MPSMatrix improvements (#157) (@tgymnich) - Update manifest (#221) (@github-actions[bot]) - Update manifest (#222) (@github-actions[bot]) - Update manifest (#224) (@github-actions[bot]) - Update manifest (#227) (@github-actions[bot]) - CompatHelper: bump compat for ObjectiveC to 1, (keep existing compat) (#228) (@github-actions[bot]) - Update manifest (#230) (@github-actions[bot]) - Fix argument types in sincos (#232) (@fjebaker) - Update manifest (#233) (@github-actions[bot]) - Improve docs (#235) (@christiangnrd) - Remove linear algebra section of MPS docs (#237) (@christiangnrd) - CompatHelper: bump compat for GPUCompiler to 0.22, (keep existing compat) (#238) (@github-actions[bot]) - Port openlibm log1pf as log1p (#239) (@sotlampr) - Port openlibm erf (#240) (@tgymnich) - Remove 1.6-era override mechanism. (#241) (@maleadt) - CompatHelper: add new compat entry for Requires at version 1, (keep existing compat) (#242) (@github-actions[bot]) - Update manifest (#243) (@github-actions[bot]) - enable dependabot for GitHub actions (#244) (@ranocha) - Bump actions/checkout from 2 to 3 (#245) (@dependabot[bot]) - Bump peter-evans/create-pull-request from 3 to 5 (#246) (@dependabot[bot]) - Show METAL_CAPTURE_ENABLED in Metal.versioninfo() when the environment variable is set (#248) (@christiangnrd) - Update manifest (#249) (@github-actions[bot]) - Adapt to GPUCompiler.jl, and other small updates. (#250) (@maleadt) - Switch to GPUArrays buffer management. (#251) (@maleadt) - Update manifest (#252) (@github-actions[bot]) - Update manifest (#253) (@github-actions[bot]) - Bump GPUCompiler (#255) (@maleadt)

Closed issues: - Random access indexing into MtlArray views cause scalar indexing (#149) - Q: How to debug kernels - KA.@print? (#223) - Crash during MTLDispatchListApply (#225) - Unable to compile trig functions through ForwardDiff (#229) - symbol multiply defined! Bug/crash on Julia master, fine on 1.10 (#231) - log1p fails on MtlArray{Float32} (#234) - When precompiling, UndefVarError: CompilerConfig not defined (#247)

Published by github-actions[bot] over 2 years ago

Metal - v0.5.0

Diff since v0.4.1

Metal.jl 0.5 is a feature release, bringing initial support for atomic operations (#168). Low-level atomics that mimic Metal C are supported (atomic_store_explicit, atomic_load_explicit, etc.), as well as a higher-level Metal.@atomic that can be used to update array values, similar to how CUDA.jl's @atomic works. Metal.@atomic uses native atomics when supported, and falls back to a compare-exchange loop otherwise.
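To illustrate what a compare-exchange fallback looks like in general, here is a minimal CPU-only sketch in plain Julia. It is not Metal.jl's implementation and does not run on the GPU: the helper name `cas_add!` is hypothetical, and storing the Float32 as raw `UInt32` bits is an assumption made for the illustration. It only uses `Base.Threads.Atomic` and `atomic_cas!` from the standard library.

```julia
using Base.Threads: Atomic, atomic_cas!

# Hypothetical helper (not part of Metal.jl): atomically add `val` to a
# Float32 whose bits are stored in an Atomic{UInt32}, using a
# compare-exchange loop. Returns the value held before the update.
function cas_add!(cell::Atomic{UInt32}, val::Float32)
    old_bits = cell[]                       # read the current bit pattern
    while true
        new_bits = reinterpret(UInt32, reinterpret(Float32, old_bits) + val)
        # atomic_cas! stores new_bits only if the cell still holds old_bits,
        # and always returns the bits it actually found in the cell.
        seen = atomic_cas!(cell, old_bits, new_bits)
        seen == old_bits && return reinterpret(Float32, old_bits)
        old_bits = seen                     # another thread won; retry
    end
end

cell = Atomic{UInt32}(reinterpret(UInt32, 0.0f0))
Threads.@threads for i in 1:1000
    cas_add!(cell, 1.0f0)
end
total = reinterpret(Float32, cell[])
println(total)
```

The loop retries whenever another thread changed the cell between the read and the exchange, so no update is lost regardless of how many threads Julia was started with.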

Minor changes include an update for the @device_code_agx disassembler, the addition of a type variable to MtlArray encoding the storage mode (#194), and support for MPSVector (#199) which should accelerate matrix/vector multiplications.

Also note that Metal.jl now disallows the construction of Float64 arrays, as these are not supported by the Metal libraries.

Closed issues: - Support for atomics (#79) - Make MtlArray storage mode a type parameter (#190) - Long stacktrace when trying to create Float64 rand arrays (#205) - allowscalar equivalent for Metal.jl (#206) - Define map! ? (#219)

Merged pull requests: - Implement atomics using compiler intrinsics (#168) (@maleadt) - Parameterize MtlArray storage mode (#194) (@christiangnrd) - Implement MPSVector (#199) (@tgymnich) - Update manifest (#200) (@github-actions[bot]) - Add Metal 3.1 to MTLLanguageVersion (#202) (@christiangnrd) - Update manifest (#203) (@github-actions[bot]) - CompatHelper: bump compat for GPUCompiler to 0.21, (keep existing compat) (#204) (@github-actions[bot]) - Update manifest (#207) (@github-actions[bot]) - Disallow Float64 arrays entirely. (#209) (@maleadt) - Adapt to LLVM.jl 6. (#213) (@maleadt) - Update manifest (#215) (@github-actions[bot]) - Bump disassembler. (#216) (@maleadt)

Published by github-actions[bot] over 2 years ago

Metal - v0.4.1

Diff since v0.4.0

Closed issues: - Command buffer callbacks can cause bus error during thread adoption (#138) - how to set up Project.toml (#185) - Metal.rand() creates a CPU array (#187) - fill! for Int8 errors when the value is negative (#192)

Merged pull requests: - Refactor matmatmul code for faster load time (#186) (@dkarrasch) - Add *.DS_Store to .gitignore (#188) (@christiangnrd) - Add GPUArrays out-of-place random methods (#189) (@tgymnich) - Revert "Don't rely on thread adoption for command buffer callbacks." (#191) (@maleadt) - Fix fill! with negative Int8 values (#193) (@christiangnrd) - disambiguate gemm_wrapper! with LinAlg.jl (#195) (@dkarrasch) - Add type annotations for character args in matmatmul (#196) (@dkarrasch) - Handle missing adjoint case. (#197) (@maleadt) - Fix transposed matmul. (#198) (@maleadt)

Published by github-actions[bot] over 2 years ago

Metal - v0.4.0

Diff since v0.3.0

Closed issues: - Restore mtlcall (#17) - mapreduce has poor performance (#87) - Native code reflection (#95) - rand! with Bools sometimes fails in tests in 1.9 (#141) - LLVM assertion failures (#153) - Time macro similar to CUDA.@time (#160) - bug in rand!? (#162) - Why not support threadIdx().x, blockIdx().x, blockDim().x etc? (#163) - Incorrect(?) darwin version in 1.8 with Metal.versioninfo() (#179)

Merged pull requests: - Add native code reflection. (#96) (@maleadt) - Move MPSKernels into a dedicated file (#155) (@tgymnich) - [LU decomposition] Fix types (#156) (@tgymnich) - Update manifest (#161) (@github-actions[bot]) - Implement Time macro (#164) (@christiangnrd) - Fix some references to CUDA (#165) (@christiangnrd) - Fix GPUArrays RNG interface implementation. (#166) (@maleadt) - Bump the LLVM back-end. (#169) (@maleadt) - Update manifest (#170) (@github-actions[bot]) - Update manifest (#171) (@github-actions[bot]) - Update manifest (#172) (@github-actions[bot]) - Bump GPUCompiler to v0.20 (#173) (@christiangnrd) - Detect mapreduce threadgroup limits instead of guessing. (#176) (@maleadt) - Remove reference to no longer used library in README.md (#177) (@christiangnrd) - Report package versions as part of versioninfo() (#180) (@christiangnrd) - Fix Darwin version indentification (#181) (@christiangnrd) - Topk for MPSMatrix (#182) (@christiangnrd) - Update manifest (#183) (@github-actions[bot]) - Don't rely on thread adoption for command buffer callbacks. (#184) (@maleadt)

Published by github-actions[bot] almost 3 years ago

Metal - v0.3.0

Diff since v0.2.0

Closed issues: - Migrate to metal C++? (#2) - Improved errors when calling device functions on CPU (#90) - Improve Objective-C interfacing (#104) - Rename grid to groups (#116) - Add functionality check helper (#121) - inputing non-isbits types (#128) - @metal docstring out-of-date (#129) - mapreduce kernel uses too many threads (#132) - Powers don't work with complex floats (#142)

Merged pull requests: - Add contributing documentation (#93) (@max-Hawkins) - Reduce multiple consecutive values in each thread to improve efficiency (#112) (@maxwindiff) - Remove libcmt, use native ObjectiveC FFI (#117) (@maleadt) - Rename grid to groups (#119) (@habemus-papadum) - Audit MRR (#122) (@maleadt) - Faster in-place reduction by using broadcasting to initialize partial… (#123) (@maxwindiff) - Add MPS matrix decompositions (#124) (@tgymnich) - Minor documentation formatting (#125) (@asinghvi17) - Switch default mode to private storage (#126) (@christiangnrd) - Update manifest (#127) (@github-actions[bot]) - Add some MtlArray docs (#130) (@christiangnrd) - Port MetalKernels (#131) (@maxwindiff) - Adapt to GPUCompiler 0.18. (#134) (@maleadt) - Support passing non-isbits arguments, as long as they're unused. (#135) (@maleadt) - Do not change grain size after pipeline creation (#136) (@maxwindiff) - Bump GPUArrays. (#137) (@maleadt) - Specialize GPUArrays' globalsize query. (#139) (@maleadt) - Catch errors that happen during command buffer callbacks. (#140) (@maleadt) - Call the correct currentdevice() in reflection (#143) (@maxwindiff) - Error when calling device functions on CPU (#144) (@christiangnrd) - Implement MTLGPUFamily and use it to validate gpu (#146) (@christiangnrd) - Add functional() (#147) (@christiangnrd) - Update manifest (#148) (@github-actions[bot]) - CompatHelper: add new compat entry for StaticArrays at version 1, (keep existing compat) (#151) (@github-actions[bot]) - Update to LLVM.jl 5 and GPUCompiler 0.19. (#154) (@maleadt)

Published by github-actions[bot] almost 3 years ago

Metal - v0.2.0

Diff since v0.1.2

Closed issues: - Threadgroup memory breaks on small datatypes (#26) - Int64 not supported on AMD GPUs? (#38) - Base.unsafe_convert is ambiguous (#42) - Support for multiple devices (#44) - Add CITATION file (#55) - XGBoost on Metal.jl (#82) - first try at metal (#84) - Copysign intrinsic possibly wrong (#89) - Metal.jl fails to precompile on Linux (#97) - Silent failure with unsupported(?) Intel Iris Graphics (#109) - I have 2 question about Metal.jl and Flux.jl (#110)

Merged pull requests: - Update manifest (#57) (@github-actions[bot]) - Add GPU profiling capabilities (#58) (@max-Hawkins) - Automatically detect if we need cmt build from source. (#59) (@maleadt) - Update manifest (#60) (@github-actions[bot]) - Add queue kernel launch argument (#61) (@tgymnich) - Update manifest (#63) (@github-actions[bot]) - Switch pipeline to juliaecosystem (#64) (@vchuravy) - Update manifest (#65) (@github-actions[bot]) - Add a function for setting the current device (#66) (@maxwindiff) - Add documentation webpage (#67) (@max-Hawkins) - Wrap simdgroup matrix functions (#70) (@maxwindiff) - Support loading/saving simdgroup matrix from threadgroup memory (#71) (@maxwindiff) - Conditionalize the MtlDeviceArray element-type workaround. (#72) (@maleadt) - Add basic SIMD shuffle up/down (#73) (@max-Hawkins) - Update manifest (#74) (@github-actions[bot]) - Optimize warp reduction for mapreduce (#75) (@max-Hawkins) - Specialize GPUArrays.globalindex() to improve broadcast performance (#76) (@maxwindiff) - Update manifest (#78) (@github-actions[bot]) - Add initial performance shader support (matmul) (#80) (@max-Hawkins) - Use Ninja to build cmt. (#81) (@maleadt) - Update manifest (#83) (@github-actions[bot]) - Support Julia 1.9 (#85) (@maleadt) - Add queue parameter to unsafecopyto (#88) (@tgymnich) - Update manifest (#91) (@github-actions[bot]) - Add MPS tests. (#92) (@maleadt) - Support for writing binary archives (#94) (@maleadt) - Support precompilation and loading on non-Apple hardware (#98) (@maleadt) - Update manifest (#99) (@github-actions[bot]) - Improve reduce performance by passing CartesianIndices and length statically (#100) (@maxwindiff) - Do not release objects that are autoreleased. (#102) (@habemus-papadum) - Fix path the cmt in Hacking Section of the Readme (#105) (@habemus-papadum) - Add example showing Metal and Gtk4 integration (#106) (@habemus-papadum) - Fix memory leak. (#107) (@habemus-papadum) - Add a mtl function for simple recursive data conversions. (#114) (@maleadt) - Write profile trace in the current folder. (#115) (@maleadt)

Published by github-actions[bot] almost 3 years ago

Metal - v0.1.2

Diff since v0.1.1

Closed issues: - installation issue (libz.1.dylib not found) +workaround - Optimally choosing threads and grid (#54)

Merged pull requests: - Use Base.active_project. (#43) (@maleadt) - Update manifest (#45) (@github-actions[bot]) - Add aliases MtlVector and MtlMatrix (#48) (@amontoison) - Update manifest (#49) (@github-actions[bot]) - Wrap at-metal's output in a let block. (#50) (@maleadt) - Update manifest (#52) (@github-actions[bot]) - Update manifest (#56) (@github-actions[bot])

Published by github-actions[bot] over 3 years ago

Metal - v0.1.1

Diff since v0.1.0

Closed issues: - Super slow broadcast (#39)

Merged pull requests: - Fix typos in unified memory example (#37) (@pitmonticone) - Fix the launch heuristic. (#40) (@maleadt)

Published by github-actions[bot] over 3 years ago

Metal - v0.1.0

Diff since v0.0.1

Published by github-actions[bot] over 3 years ago

Metal - v0.0.1

Closed issues: - error when using (#1) - Argument buffer encoding is fragile (#5) - LLVMType of MtlDeviceArray needs changing/manipulation (#6) - Errors running on M1 Max (#14) - I get this, my name isn't Tim (#16) - Thanks for the previous fix - had a go (#18) - Custom IR verification (#25) - cmt: Release build fails install (#27)

Merged pull requests: - Add devicecodemetallib macro (#3) (@max-Hawkins) - Update README (#8) (@max-Hawkins) - Implement GPUArrays launch heuristic (#9) (@max-Hawkins) - Add docstrings (#12) (@max-Hawkins) - Rework metadata generation (#13) (@maleadt) - Add CI (#19) (@maleadt) - Use sw_vers to query the macOS version. (#20) (@maleadt) - Updates for macOS 13 (Ventura); use bindless argument buffers (#23) (@maleadt) - Enable the GPUArrays test suite (#24) (@maleadt) - Use cmt from pre-built JLL. (#28) (@maleadt) - Package updates (#29) (@maleadt) - First test with a locally-built cmt. (#30) (@maleadt) - Use labels to determine whether to build local deps. (#31) (@maleadt) - Bump GPUArrays. (#32) (@maleadt) - MTL wrapper clean-ups (#33) (@maleadt)

Published by github-actions[bot] over 3 years ago