Recent Releases of elpa

elpa - ELPA 2024.05.001 release

- Fortran
Published by marekandreas over 1 year ago

elpa - ELPA 2023.11.001 release

Enable GPU streams by default for NVIDIA and AMD GPUs
Updated / improved documentation and man pages
Fixed a compilation error on AMD GPUs
Fixed SVE 256 compute kernels
Allow the use of NVIDIA NCCL for device-to-device communication (currently only in parts of ELPA)
Speed-up of the GPU version of hermitian_multiply by up to a factor of 4
Significantly faster full-to-tridiagonal step in ELPA 1stage GPU
Significantly faster ELPA 2stage solver on Intel GPUs
Consistent enabling/disabling of SKEW_SYMMETRIC in header files
New setup_gpu API function

- Fortran
Published by marekandreas about 2 years ago

elpa - ELPA 2023.05.001

added CITATION.cff file
allow test programs to be run with 1 MPI task
correct a memory leak in the GPU stream setup
better handling of GPU BLAS handles
implement the execution of the AMD HIP code path on NVIDIA GPUs
implement the execution of the SYCL GPU code path on CPUs (debugging)
port generalized routines to SYCL GPU
proof of concept for using NVIDIA NCCL instead of MPI (not production ready)
some cleanup of the documentation

- Fortran
Published by marekandreas over 2 years ago

elpa - ELPA 2023.05.001.rc1

added CITATION.cff file
allow test programs to be run with 1 MPI task
correct a memory leak in the GPU stream setup
better handling of GPU BLAS handles
implement the execution of the AMD HIP code path on NVIDIA GPUs
implement the execution of the SYCL GPU code path on CPUs (debugging)
port generalized routines to SYCL GPU
proof of concept for using NVIDIA NCCL instead of MPI (not production ready)
some cleanup of the documentation

- Fortran
Published by marekandreas almost 3 years ago

elpa - ELPA 2022.11.001

- Fortran
Published by marekandreas about 3 years ago

elpa - ELPA_2015.11_release

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2016.05_release

fix a problem with the generated *.sh check scripts
name the library differently if built without MPI support
install only public modules
support building without MPI for one node usage
doxygen and man pages documentation for ELPA
cleanup of documentation
introduction of SSE gcc intrinsic kernels
Remove errors due to unaligned memory
removal of Fortran "contains functions"
Fortran interfaces for assembly and C kernels

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2016.05.003_release

fix a problem with the build of SSE kernels
make some (internal) functions public, such that they can be used outside of ELPA
add documentation and interfaces for new public functions
shorten file names and directory names of test programs in order to bypass the "make: argument list too long" error

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2016.05.004_release

fix a problem with the private state of module precision
distribute test_project with dist tarball
generic driver routine for ELPA 1stage and 2stage
test case for elpa_mult_at_b_real
test case for elpa_mult_ah_b_complex
test case for elpa_cholesky_real
test case for elpa_cholesky_complex
test case for elpa_invert_trm_real
test case for elpa_invert_trm_complex
fix building of static library
better choice of AVX, AVX2, AVX512 kernels
make assumed-size Fortran arrays the default

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2017.05.001_release

faster GPU implementation, especially for ELPA 1stage
the restriction to a block-cyclic distribution block size of 128 in the GPU case has been relaxed
Faster CPU implementation due to better blocking
support of already banded matrices (new API only!)
improved KNL support
add missing script "manual_cpp"
cleanup of code

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2017.05.002_release

Mainly bugfixes for ELPA 2017.05.001:
- fix a memory leak of MPI communicators
- tests for hermitian_multiply, Cholesky decomposition and
- deal with a problem on Debian (mawk)

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2017.05.003_release

remove a bug in invert_triangular that had been introduced in ELPA 2017.05.002

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2017.11.001_release

significant performance improvement of the GPU version
added new compute kernels for IBM Power8 and Fujitsu SPARC64 processors
a first implementation of autotuning capability
correct some type statements in Fortran
correct detection of PAPI in configure step

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2018.05.001_release

significantly improved performance on the K computer
added interface for the generalized eigenvalue problem
extended autotuning functionality
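As background for the generalized eigenvalue interface: the textbook approach, assuming $A$ Hermitian and $B$ Hermitian positive definite, reduces $Ax = \lambda Bx$ to a standard problem via a Cholesky factorization of $B$. This is shown as the general technique under those assumptions, not as a description of ELPA's exact internal sequence of calls:

$$
\begin{aligned}
A x &= \lambda B x, \qquad B = U^{H} U \quad (\text{Cholesky factorization, } U \text{ upper triangular}) \\
\tilde{A} &= U^{-H} A \, U^{-1} \\
\tilde{A} y &= \lambda y, \qquad x = U^{-1} y
\end{aligned}
$$

Since $\tilde{A}^{H} = U^{-H} A^{H} U^{-1} = \tilde{A}$, the transformed matrix is Hermitian, so a standard Hermitian eigensolver yields the same eigenvalues $\lambda$, and the eigenvectors are recovered with one triangular back-transformation.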

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2018.11.001_release

improved autotuning
improved performance of the generalized problem via Cannon's algorithm
checkpointing functionality for ELPA objects
store/read/resume of autotuning
Python interface for ELPA
more ELPA functions have an optional error argument (Fortran) or required error argument (C) => ABI and API change

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2019.05.002_release

repackaging of the sources, since the legacy interface was missing from the 2019.05.001 release
elpa_print_kernels supports GPU usage
fix an error if PAPI measurements are activated
new simple real kernels: block4 and block6
C functions can be built with optional arguments if the compiler supports it (configure option)
allow measurements with the likwid tool
users can define the default kernel at build time
the ELPA version number is provided in the C header files
as announced a year ago, the following deprecated routines have finally been removed; see DEPRECATED_FEATURES for the replacement routines, which were introduced a year ago. Removed routines:
- mult_at_b_real
- mult_ah_b_complex
- invert_trm_real
- invert_trm_complex
- cholesky_real
- cholesky_complex
- solve_tridi
new kernels for ARM aarch64 added
fix an out-of-bounds error in elpa2

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2019.11.001_release

solve a bug when using parallel make builds
check the cpuid set during build time
add experimental feature "heterogenous-cluster-support"
add experimental feature for 64-bit integer BLAS/LAPACK/ScaLAPACK support
add experimental feature for 64bit integer MPI support
support of ELPA for real valued skew-symmetric matrices, please cite: https://arxiv.org/abs/1912.04062
cleanup of the GPU version
bugfix in the OpenMP version
bugfix on the Power8/9 kernels
bugfix on ARM aarch64 FMA kernels
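As background for the skew-symmetric support above: a real matrix $A$ is skew-symmetric if $A^{T} = -A$. The standard argument below (a textbook fact, not specific to ELPA) shows why its eigenvalues are purely imaginary:

$$
A^{T} = -A,\; A \in \mathbb{R}^{n \times n} \;\Rightarrow\; (iA)^{H} = -i\,A^{T} = iA
$$

So $iA$ is Hermitian with real eigenvalues $\mu$; from $iAv = \mu v$ it follows that $Av = -i\mu\, v$, i.e. every eigenvalue of $A$ is purely imaginary, and because $A$ is real they occur in conjugate pairs $\pm i\mu$.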

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2021.05.002_release

no feature changes
correct the SO version which was wrong in ELPA 2021.05.001
allow the user to set the mapping of MPI tasks to GPU id per set/get
experimental feature: port to AMD GPUs; works correctly, performance yet unclear; only tested --with-mpi=0
on request, ELPA can print the pinning of MPI tasks and OpenMP threads
support for FUGAKU: some minor issues still have to be fixed due to compiler issues
BUG FIX: if the matrix is already banded, check whether bandwidth >= 2. DO NOT ALLOW a bandwidth of 1, since this would imply that the input matrix is already diagonal, which the ELPA algorithms do not support
BUG FIX in internal test programs: do not consider a residual of 0.0 to be an error
support for skew-symmetric matrices now enabled by default
BUG FIX in generalized case: in setups like "mpiexec -np 4 ./validate_real_double_generalized_1stage_random 90 90 45"
the ELPA setup now checks (in MPI runs) whether the user-provided BLACS grid is reasonable, i.e. ELPA no longer relies on the user checking the BLACS grid before calling ELPA; if this check fails, ELPA returns with an error
limit the number of OpenMP threads to one if the MPI thread level is not at least MPI_THREAD_SERIALIZED
allow checking of the supported threading level of the MPI library at build time

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2020.05.001_release

Enable compilation with gcc v10
Fix a bug in elpa_multiply_a_b (GPU)
improved documentation, including fixing of typos and errors in markdown
Fix a bug in the calling of Cannon's algorithm which might lead to crashes for a square process grid
improvements and bugfixes of the ELPA 2stage GPU version, see https://arxiv.org/abs/2002.10991
bugfix for the build of AVX-512 KNL kernels
clean separation of SIMD instructions for AVX and AVX2 kernels
better error checking for allocations / deallocations of CPU and GPU memory
experimental feature of matrix redistribution
bugfix in the cpuid tests
bugfix in elpa2_print_kernels
bugfix when configuring --with-gpu-support-only

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2022.05.001_release

implement OpenMP offloading to Intel GPUs for ELPA 1stage and 2stage (except for the step "tridi_to_band")
implement SYCL offloading to Intel GPUs for ELPA 1stage and 2stage
AMD GPU offload has been tested on MI200 (also with MPI)
can use ELPA with one individual "gpu stream" per MPI task (Nvidia and AMD only)
allow steps "cholesky", "invert_trm", and "multiply_a_b" to be called directly with GPU device pointers
on error, ELPA returns rather than aborts, giving control back to the calling application and allowing error recovery and/or a graceful abort
allow ELPA to be built with both OpenMP and GPU support
fix an FPE with the Intel compiler and AVX-512 instructions and optimization level > -O2
better checking of user defined options in configure

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2021.11.002_release

fix an error when choosing the Nvidia GPU kernel (a fallback to the CPU might have been selected)
support of the Nvidia cuSOLVER library to accelerate some routines (needs CUDA >= 11.4)
experimental Nvidia GPU versions of "elpa_invert_trm" and "elpa_cholesky" can be tested by setting elpa_set("gpu_invert_trm",1) and elpa_set("gpu_cholesky",1); they are not used otherwise
BUGFIX: error in resort_ev (also backported to 2021.05.002 and 2020.11.001)
allow calling ELPA eigenvectors and eigenvalues also with GPU device pointers for the input matrix, the vector of eigenvalues, and the output matrix of eigenvectors
BUGFIX: error in resort_ev
EXPERIMENTAL feature: new real GPU kernel for Nvidia A100 (provided by Nvidia); can show a performance boost if the number of vectors per MPI task is > 20000. The largest benefit is most likely in the non-MPI version
as announced, the legacy interface has been dropped
more autotuning features, for example using non-blocking MPI collectives
new version of autotuning that avoids a combinatorial growth of possibilities (the old autotuning version can still be used by calling elpa%autotune_set_api_version(API_VERSION, error) with API_VERSION < 20211125)

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2020.11.001_release

this release contains mostly bugfixes:
- fix the determination of whether an underscore (_) is needed to link Fortran to C
- fix an error in the real block4 kernel for aarch64 NEON
- add missing test_scalapack_template.F90 to EXTRA_DIST list
- fix an error in the GPU kernel
- do not use MPI_COMM_WORLD but mpi_parent instead
switch from python2 to python3
experimental feature: complex kernels for aarch64 NEON
experimental feature: kernels for ARM SVE

- Fortran
Published by marekandreas about 5 years ago