Recent Releases of dla-future

dla-future - DLA-Future 0.10.0

Changes

  • Added inverse of a triangular matrix. (#1324)
  • Added inverse from Cholesky factor and its C API. (#1324 and #1326)
  • Improved Panel transposed broadcast to be more flexible. (#1325)
    • Note: existing usages still produce correct results without changes, but they perform an additional tile broadcast. This extra communication can be avoided by shrinking the transposed panel.
  • Refactored Matrix API. (#1321)
    • Constructors changed.
    • Member functions have been updated. Old member functions are still available, but will be deprecated soon.

Bug fixes

  • Fixed behaviour when CUDA archs are not specified for a CUDA build. (#1318)

- C++
Published by rasolca 8 months ago

dla-future - DLA-Future 0.9.0

Changes

  • Automatically free all DLA-Future grids when finalizing the C API. (#1308)
  • Tau factors are stored on GPU memory. (#1304)

Bug fixes

  • Avoid potential stack overflows by transferring work to a new task on the default thread pool before launching GPU work. (#1292)

- C++
Published by RMeli 9 months ago

dla-future - DLA-Future 0.8.0

Changes

  • Renamed tune parameters. (#1270)
    • dlaf:red2band-panel-nworkers becomes dlaf:red2band-panel-num-threads.
    • dlaf:tridiag-rank1-nworkers becomes dlaf:tridiag-rank1-num-threads.
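
Scripts or configuration files that still use the old parameter names need updating. A minimal sketch of such a migration step is shown below; migrate_tune_parameter is a hypothetical helper written for this illustration and is not part of DLA-Future. Only the two renames listed above are taken from the release notes.

```cpp
#include <array>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper (not part of DLA-Future): maps a pre-0.8.0 tune
// parameter name to its 0.8.0 replacement. Any other name is returned
// unchanged.
inline std::string migrate_tune_parameter(std::string_view name) {
  static constexpr std::array<std::pair<std::string_view, std::string_view>, 2> renames{{
      {"dlaf:red2band-panel-nworkers", "dlaf:red2band-panel-num-threads"},
      {"dlaf:tridiag-rank1-nworkers", "dlaf:tridiag-rank1-num-threads"},
  }};
  for (const auto& [old_name, new_name] : renames) {
    if (name == old_name)
      return std::string(new_name);
  }
  return std::string(name);
}
```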

Performance improvements

  • Many improvements to reduction_to_band and its backtransformation. (#1214, #1256, #1263)

Bug fixes

  • Fixed a bug in correctness checking of eigensolver miniapps when computing partial eigenspectrum. (#1283)

- C++
Published by rasolca 10 months ago

dla-future - DLA-Future 0.7.3

Bug fixes

  • Changed C ScaLAPACK API indexing convention to 1-based for partial eigenspectrum. (#1248)
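
Callers that previously passed 0-based eigenvalue indices through the ScaLAPACK-like interface have to shift them by one. The sketch below shows one way to convert a 1-based range into a 0-based half-open range for C++ code. It assumes the 1-based range is inclusive on both ends, as is usual in Fortran-style interfaces; EigenvalueRange and from_one_based are hypothetical names, not DLA-Future API.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical 0-based half-open index range [begin, end).
struct EigenvalueRange {
  std::size_t begin;
  std::size_t end;
};

// Converts a 1-based inclusive range [first, last] (assumed convention)
// over n eigenvalues into a 0-based half-open range.
// Throws std::out_of_range if the input range is invalid.
inline EigenvalueRange from_one_based(long first, long last, long n) {
  if (first < 1 || last < first || last > n)
    throw std::out_of_range("eigenvalue index range must satisfy 1 <= first <= last <= n");
  return {static_cast<std::size_t>(first - 1), static_cast<std::size_t>(last)};
}
```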

- C++
Published by RMeli about 1 year ago

dla-future - DLA-Future 0.7.1

Bug fixes

  • Fixed a compilation issue with ROCm. (#1241)
  • Fixed missing includes of <complex>. (#1243)

- C++
Published by msimberg about 1 year ago

dla-future - DLA-Future 0.7.0

Changes

  • Added (generalized) eigensolver which computes only a part of the eigenspectrum. (#1194)
  • Norm is now fully asynchronous. (#1221)

Performance improvements

  • Refactored communication to use pika's transform_mpi and polling support. (#1125)
  • Use custom coalescing heuristic for memory pools. (#1183)
  • Added configuration option for number of CUDA streams and cuBLAS/SOLVER handles. (#1222, #1182)
  • Some algorithmic clean-ups and improvements. (#1213, #1219, #1232)

Bug fixes

  • Fixed builds with CUDA and stdexec enabled. (#1188)
  • Work around buggy HIP complex operator overloads. (#1195)
  • Namespace (SCA)LAPACK CMake modules with DLAF to avoid conflicts with other packages. (#1178)

- C++
Published by rasolca about 1 year ago

dla-future - DLA-Future 0.6.0

Changes

  • Renamed ScaLAPACK-like generalized eigensolvers pXsygvx/pXhegvx to pXsygvd/pXhegvd. (#1168)
  • Introduced generalized eigensolver where the matrix B is already factorized. (#1167)

Performance improvements

  • Local eigenvector permutations in the distributed tridiagonal eigensolver are executed directly in GPU memory. (#1118)

Bug fixes

  • Fixed ScaLAPACK detection in CMake for specific uenv cases. (#1159)

- C++
Published by rasolca over 1 year ago

dla-future - DLA-Future 0.5.0

Changes

  • Introduced an option (*) for forcing contiguous GPU communication buffers. (#1096)
  • Introduced an option (*) for enabling GPU aware MPI communication. (#1102)
  • Removed special handling of Intel MKL, as it could lead to broken installations. (#1149)
    • Spack installations: spack will set the correct variables.
    • Manual installations: the user is responsible for setting the variables correctly (see BUILD.md).

(*) These options are available as spack variants.

Performance improvements

  • Don't communicate in algorithms when using single rank communicators. (#1097)
  • Fixed slow performance of the local version of bt_band_to_tridiagonal. (#1144)

Bug fixes

  • Implemented a workaround for hipMemcpyDefault 2D memcpys, due to bugs in HIP. (#1106)
  • Miniapps initialize HIP before MPI, as on older Cray MPICH versions initializing HIP after MPI leads to HIP not seeing any devices. (#1090)

- C++
Published by msimberg over 1 year ago

dla-future - DLA-Future 0.4.1

Bug fixes

  • Update project version and export it in CMake. (#1121)

- C++
Published by msimberg over 1 year ago

dla-future - DLA-Future 0.4.0

Changes:

  • Modified CommunicatorGrid to avoid blocking calls to MPI_Comm_dup. It now returns communicator pipelines. (#993)
  • Added support for Intel oneMKL and the intel-oneapi-mkl spack package. (#1073) (*)

Performance improvements:

  • Reduced the size of the matrix-matrix multiplications in the tridiagonal eigensolver to cover only the non-deflated part of the eigenvectors. (#951, #967, #996, #997, #998)
  • Introduced stackless threads where appropriate. (#1037)

Bug fixes:

  • Use drop_operation_state to avoid stack overflows. (#1004)

Notes:

(*) At the time of the release, the spack spec blaspp~openmp ^intel-oneapi-mkl threads=openmp doesn't build. If you rely on multithreaded BLAS, we suggest using blaspp+openmp ^intel-oneapi-mkl threads=openmp until https://github.com/spack/spack/pull/42087 gets merged.

- C++
Published by rasolca almost 2 years ago

dla-future - DLA-Future 0.3.1

Bugfix:

  • Fixed compilation with gcc 9.3
  • Fixed compilation with CUDA 11.2
  • Improved eigensolver tests

- C++
Published by rasolca about 2 years ago

dla-future - DLA-Future 0.3.0

Changes:

  • added C and ScaLAPACK API (generalized eigensolver) (#992)
  • removed pika-algorithm dependency (#945)

Performance improvements:

  • Fixed Cholesky priorities (#999)

- C++
Published by rasolca about 2 years ago

dla-future - DLA-Future 0.2.1

Bugfix:

  • Fixed a problem in reduction_to_band that could produce results filled with NaNs in certain corner cases (e.g. an input matrix with all off-band elements set to 0).

- C++
Published by rasolca over 2 years ago

dla-future - DLA-Future 0.2.0

Changes:

  • renamed algorithms using snake case (#942)
  • added C and ScaLAPACK API (cholesky and eigensolver) (#886)
  • Matrix API:
    • initial support for matrices with different tile/block-size (#909)
    • initial support for matrix subpipelines (#898)
    • initial support for submatrices (#934)
    • initial support for matrix redistribution (#933)

Bugfixes:

  • fixed a problem in tridiagonal_eigensolver which produced wrong results for some classes of matrices (#960)

Performance improvements:

  • introduced busy barriers in reduction_to_band (#864)
  • new band_to_tridiagonal algorithm implementation (#938, #946)
  • improved the rank1 problem solution in tridiagonal_eigensolver (#904, #936)

- C++
Published by rasolca over 2 years ago

dla-future - DLA-Future 0.1.0

The first release of DLA-Future.

- C++
Published by rasolca over 2 years ago