Recent Releases of dla-future
dla-future - DLA-Future 0.10.0
Changes
- Added inverse of a triangular matrix. (#1324)
- Added inverse from Cholesky factor and its C API. (#1324 and #1326)
- Improved `Panel` transposed broadcast to be more flexible. (#1325)
  - Note: previous usages still produce correct results without changes, but they perform an additional tile broadcast. This extra communication can be avoided by shrinking the transposed panel.
- Refactored `Matrix` API. (#1321)
  - Constructors changed.
  - Member functions have been updated. Old member functions are still available, but will be deprecated soon.
Bug fixes
- Fixed behaviour when CUDA archs are not specified for a CUDA build. (#1318)
- C++
Published by rasolca 8 months ago
dla-future - DLA-Future 0.9.0
Changes
- Automatically free all DLA-Future grids when finalizing the C API. (#1308)
- Tau factors are stored on GPU memory. (#1304)
Bug fixes
- Avoid potential stack overflows by transferring work to a new task on the default thread pool before launching GPU work. (#1292)
Published by RMeli 9 months ago
dla-future - DLA-Future 0.8.0
Changes
- Renamed tune parameters. (#1270)
  - `dlaf:red2band-panel-nworkers` becomes `dlaf:red2band-panel-num-threads`.
  - `dlaf:tridiag-rank1-nworkers` becomes `dlaf:tridiag-rank1-num-threads`.
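The rename above can be applied mechanically to existing launch scripts or option strings. The following is a minimal sketch of such a migration; `migrate_tune_option` and `kRenamedTuneParameters` are hypothetical names introduced for illustration and are not part of DLA-Future. Only the two old/new name pairs come from these notes.

```cpp
#include <array>
#include <string>
#include <string_view>
#include <utility>

// Old -> new tune parameter names, as listed in the 0.8.0 notes (#1270).
constexpr std::array<std::pair<std::string_view, std::string_view>, 2> kRenamedTuneParameters{{
    {"dlaf:red2band-panel-nworkers", "dlaf:red2band-panel-num-threads"},
    {"dlaf:tridiag-rank1-nworkers", "dlaf:tridiag-rank1-num-threads"},
}};

// Hypothetical helper (not a DLA-Future function): returns `option` with the
// first occurrence of an old parameter name replaced by its new name.
// Strings that contain no old name are returned unchanged.
std::string migrate_tune_option(std::string_view option) {
  for (const auto& [old_name, new_name] : kRenamedTuneParameters) {
    const auto pos = option.find(old_name);
    if (pos != std::string_view::npos) {
      std::string result(option);
      result.replace(pos, old_name.size(), new_name);
      return result;
    }
  }
  return std::string(option);
}
```

Because the helper only substitutes the name, any prefix or value around it (for example a leading `--` or a trailing `=4`, both assumed here for illustration) is preserved as-is.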
Performance improvements
- Many improvements to `reduction_to_band` and its backtransformation. (#1214, #1256, #1263)
Bug fixes
- Fixed a bug in correctness checking of eigensolver miniapps when computing partial eigenspectrum. (#1283)
Published by rasolca 10 months ago
dla-future - DLA-Future 0.7.3
Bug fixes
- Changed C ScaLAPACK API indexing convention to 1-based for partial eigenspectrum. (#1248)
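Callers that track eigenvalue ranges with 0-based indices need to shift them before calling the ScaLAPACK-style interface. The sketch below shows one such conversion; `OneBasedRange` and `to_one_based` are hypothetical names, not DLA-Future API. The notes only state that the convention is 1-based, so treating the upper bound as inclusive (as ScaLAPACK does for its `IL`/`IU` arguments) is an assumption here.

```cpp
#include <cassert>
#include <cstddef>

// 1-based index range with an inclusive upper bound.
// Assumption for this sketch: the upper bound is inclusive.
struct OneBasedRange {
  std::size_t first;
  std::size_t last;
};

// Hypothetical helper (not a DLA-Future function): converts a 0-based,
// half-open range [begin, end) into a 1-based, inclusive range.
inline OneBasedRange to_one_based(std::size_t begin, std::size_t end) {
  assert(begin < end && "the range must contain at least one eigenvalue");
  // The first index shifts by one. The exclusive 0-based end already equals
  // the inclusive 1-based last index.
  return OneBasedRange{begin + 1, end};
}
```

Under this assumption, the first ten eigenvalues, written as `[0, 10)` in 0-based half-open form, map to the pair `(1, 10)`.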
Published by RMeli about 1 year ago
dla-future - DLA-Future 0.7.1
Bug fixes
- Fixed a compilation issue with ROCm. (#1241)
- Fixed missing includes of `<complex>`. (#1243)
Published by msimberg about 1 year ago
dla-future - DLA-Future 0.7.0
Changes
- Added (generalized) eigensolver which computes only a part of the eigenspectrum. (#1194)
- `Norm` is now fully asynchronous. (#1221)
Performance improvements
- Refactored communication to use pika's `transform_mpi` and polling support. (#1125)
- Use custom coalescing heuristic for memory pools. (#1183)
- Added configuration option for number of CUDA streams and cuBLAS/SOLVER handles. (#1222, #1182)
- Some algorithmic clean-ups and improvements. (#1213, #1219, #1232)
Bug fixes
- Fixed builds with CUDA and stdexec enabled. (#1188)
- Work around buggy HIP complex operator overloads. (#1195)
- Namespace (SCA)LAPACK CMake modules with DLAF to avoid conflicts with other packages. (#1178)
Published by rasolca about 1 year ago
dla-future - DLA-Future 0.6.0
Changes
- Renamed ScaLAPACK-like generalized eigensolvers `pXsygvx`/`pXhegvx` to `pXsygvd`/`pXhegvd`. (#1168)
- Introduced generalized eigensolver where the matrix B is already factorized. (#1167)
Performance improvements
- Local eigenvector permutations in the distributed tridiagonal eigensolver are executed directly in GPU memory. (#1118)
Bug fixes
- Fixed ScaLAPACK detection in CMake for specific uenv cases. (#1159)
Published by rasolca over 1 year ago
dla-future - DLA-Future 0.5.0
Changes
- Introduced an option (*) for forcing contiguous GPU communication buffers. (#1096)
- Introduced an option (*) for enabling GPU aware MPI communication. (#1102)
- Removed special handling of Intel MKL, as it could lead to broken installations. (#1149)
  - Spack installations: spack will set the correct variables.
  - Manual installations: the user is responsible for setting the variables correctly (see BUILD.md).
(*) These options are available as spack variants.
Performance improvements
- Don't communicate in algorithms when using single rank communicators. (#1097)
- Fixed slow performance of local version of `bt_band_to_tridiagonal`. (#1144)
Bug fixes
- Implemented a workaround for `hipMemcpyDefault` 2D memcpys, due to bugs in HIP. (#1106)
- Miniapps initialize HIP before MPI, as on older Cray MPICH versions initializing HIP after MPI leads to HIP not seeing any devices. (#1090)
Published by msimberg over 1 year ago
dla-future - DLA-Future 0.4.1
Bug fixes
- Update project version and export it in CMake. (#1121)
Published by msimberg over 1 year ago
dla-future - DLA-Future 0.4.0
Changes:
- Modified `CommunicatorGrid` to avoid blocking calls to `MPI_Comm_dup`. It now returns communicator pipelines. (#993)
- Added support for Intel oneMKL and the `intel-oneapi-mkl` spack package. (#1073) (*)
Performance improvements:
- Reduced the size of the matrix-matrix multiplications in the tridiagonal eigensolver to cover only the non-deflated part of the eigenvectors. (#951, #967, #996, #997, #998)
- Introduced stackless threads where appropriate. (#1037)
Bug fixes:
- Use `drop_operation_state` to avoid stack overflows. (#1004)
Notes:
(*) At the time of the release the spack spec `blaspp~openmp ^intel-oneapi-mkl threads=openmp` doesn't build. If you rely on multithreaded BLAS, we suggest using `blaspp+openmp ^intel-oneapi-mkl threads=openmp` until https://github.com/spack/spack/pull/42087 gets merged.
Published by rasolca almost 2 years ago
dla-future - DLA-Future 0.3.1
Bugfix:
- Fixed compilation with gcc 9.3
- Fixed compilation with CUDA 11.2
- Improved eigensolver tests
Published by rasolca about 2 years ago
dla-future - DLA-Future 0.3.0
Changes:
- added C and ScaLAPACK API (generalized eigensolver) (#992)
- removed pika-algorithm dependency (#945)
Performance improvements:
- Fixed Cholesky priorities (#999)
Published by rasolca about 2 years ago
dla-future - DLA-Future 0.2.1
Bugfix:
- Fixed a problem in `reduction_to_band` that could have produced results filled with NaNs for certain corner cases (e.g. an input matrix with all off-band elements set to 0).
Published by rasolca over 2 years ago
dla-future - DLA-Future 0.2.0
Changes:
- renamed algorithms using snake case (#942)
- added C and ScaLAPACK API (cholesky and eigensolver) (#886)
- `Matrix` API:
  - initial support for matrices with different tile/block-size (#909)
  - initial support for matrix subpipelines (#898)
  - initial support for submatrices (#934)
  - initial support for matrix redistribution (#933)
Bugfixes:
- fixed a problem in `tridiagonal_eigensolver` which produced wrong results for some classes of matrices (#960)
Performance improvements:
- introduced busy barriers in `reduction_to_band` (#864)
- new `band_to_tridiagonal` algorithm implementation (#938, #946)
- improved the rank1 problem solution in `tridiagonal_eigensolver` (#904, #936)
Published by rasolca over 2 years ago
dla-future - DLA-Future 0.1.0
The first release of DLA-Future.
Published by rasolca over 2 years ago