Recent Releases of elpa

elpa - ELPA 2024.05.001 release

- Fortran
Published by marekandreas over 1 year ago

elpa - ELPA 2023.11.001 release

  • enable GPU streams by default for NVIDIA and AMD GPUs
  • Updated / improved documentation and man pages
  • Fixed compilation error on AMD GPUs
  • Fixed SVE 256 compute kernels
  • Allow the use of NVIDIA NCCL for device-to-device communication (currently only in parts of ELPA)
  • Speed up of the GPU version of hermitian_multiply by up to a factor of 4
  • significantly faster full-to-tridiagonal step in ELPA 1stage GPU
  • significantly faster ELPA 2stage solver on Intel GPUs
  • Consistent enabling/disabling of SKEW_SYMMETRIC in header files
  • new setup_gpu API function

- Fortran
Published by marekandreas about 2 years ago

elpa - ELPA 2023.05.001

  • added CITATION.cff file
  • allow test programs to be run with 1 MPI task
  • correct a memory leak in the gpu stream setup
  • better handling of GPU BLAS handles
  • implement the execution of the AMD HIP code path on NVIDIA GPUs
  • implement the execution of the SYCL GPU code path on CPUs (debugging)
  • port generalized routines to SYCL GPU
  • PoC to use NVIDIA NCCL instead of MPI (not production ready)
  • some cleanup of documentation

- Fortran
Published by marekandreas over 2 years ago

elpa - ELPA 2023.05.001.rc1

  • added CITATION.cff file
  • allow test programs to be run with 1 MPI task
  • correct a memory leak in the gpu stream setup
  • better handling of GPU BLAS handles
  • implement the execution of the AMD HIP code path on NVIDIA GPUs
  • implement the execution of the SYCL GPU code path on CPUs (debugging)
  • port generalized routines to SYCL GPU
  • PoC to use NVIDIA NCCL instead of MPI (not production ready)
  • some cleanup of documentation

- Fortran
Published by marekandreas almost 3 years ago

elpa - ELPA 2022.11.001

- Fortran
Published by marekandreas about 3 years ago

elpa - ELPA_2015.11_release

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2016.05_release

  • fix problem with generated *.sh- check scripts
  • name library differently if built without MPI support
  • install only public modules
  • support building without MPI for one node usage
  • doxygen and man pages documentation for ELPA
  • cleanup of documentation
  • introduction of SSE gcc intrinsic kernels
  • Remove errors due to unaligned memory
  • removal of Fortran "contains functions"
  • Fortran interfaces for assembly and C kernel

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2016.05.003_release

  • fix a problem with the build of SSE kernels
  • make some (internal) functions public, such that they can be used outside of ELPA
  • add documentation and interfaces for new public functions
  • shorten file names and directory names for test programs in order to bypass the "make argument list too long" error

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2016.05.004_release

  • fix a problem with the private state of module precision
  • distribute test_project with dist tarball
  • generic driver routine for ELPA 1stage and 2stage
  • test case for elpa_mult_at_b_real
  • test case for elpa_mult_ah_b_complex
  • test case for elpa_cholesky_real
  • test case for elpa_cholesky_complex
  • test case for elpa_invert_trm_real
  • test case for elpa_invert_trm_complex
  • fix building of static library
  • better choice of AVX, AVX2, AVX512 kernels
  • make assumed size Fortran arrays default

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2017.05.001_release

  • faster GPU implementation, especially for ELPA 1stage
  • the restriction of the block-cyclic distribution blocksize = 128 in the GPU case is relaxed
  • Faster CPU implementation due to better blocking
  • support of already banded matrices (new API only!)
  • improved KNL support
  • add missing script "manual_cpp"
  • cleanup of code

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2017.05.002_release

Mainly bugfixes for ELPA 2017.05.001:

  • fix memory leak of MPI communicators
  • tests for hermitian_multiply, cholesky decomposition and
  • deal with a problem on Debian (mawk)

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2017.05.003_release

  • remove bug in invert_triangular, which had been introduced in ELPA 2017.05.002

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2017.11.001_release

  • significant improvement of performance of GPU version
  • added new compute kernels for IBM Power8 and Fujitsu Sparc64 processors
  • a first implementation of autotuning capability
  • correct some type statements in Fortran
  • correct detection of PAPI in configure step

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2018.05.001_release

  • significantly improved performance on the K computer
  • added interface for the generalized eigenvalue problem
  • extended autotuning functionality

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2018.11.001_release

  • improved autotuning
  • improved performance of generalized problem via Cannon's algorithm
  • checkpointing functionality for elpa objects
  • store/read/resume of autotuning
  • Python interface for ELPA
  • more ELPA functions have an optional error argument (Fortran) or required error argument (C) => ABI and API change

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2019.05.002_release

  • repacking of the src since the legacy interface has been forgotten in the 2019.05.001 release
  • elpa_print_kernels supports GPU usage
  • fix an error if PAPI measurements are activated
  • new simple real kernels: block4 and block6
  • C functions can be built with optional arguments if the compiler supports it (configure option)
  • allow measurements with the likwid tool
  • users can define the default-kernel at build time
  • ELPA versioning number is provided in the C header files
  • as announced a year ago, the following deprecated routines have finally been removed; see DEPRECATED_FEATURES for the replacement routines, which were introduced a year ago. Removed routines:
    • mult_at_b_real
    • mult_ah_b_complex
    • invert_trm_real
    • invert_trm_complex
    • cholesky_real
    • cholesky_complex
    • solve_tridi
  • new kernels for ARM aarch64 added
  • fix an out-of-bound-error in elpa2

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2019.11.001_release

  • fix a bug when using parallel make builds
  • check the cpuid set during build time
  • add experimental feature "heterogenous-cluster-support"
  • add experimental feature for 64bit integer BLAS/LAPACK/SCALAPACK support
  • add experimental feature for 64bit integer MPI support
  • support of ELPA for real valued skew-symmetric matrices, please cite: https://arxiv.org/abs/1912.04062
  • cleanup of the GPU version
  • bugfix in the OpenMP version
  • bugfix on the Power8/9 kernels
  • bugfix on ARM aarch64 FMA kernels

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2021.05.002_release

  • no feature changes
  • correct the SO version which was wrong in ELPA 2021.05.001
  • allow the user to set the mapping of MPI tasks to GPU id via set/get
  • experimental feature: port to AMD GPUs; works correctly, performance not yet clear; only tested with --with-mpi=0
  • On request, ELPA can print the pinning of MPI tasks and OpenMP threads
  • support for FUGAKU: some minor issues still have to be fixed due to compiler issues
  • BUG FIX: if matrix is already banded, check whether bandwidth >= 2. DO NOT ALLOW a bandwidth = 1, since this would imply that the input matrix is already diagonal which the ELPA algorithms do not support
  • BUG FIX in internal test programs: do not consider a residual of 0.0 to be an error
  • support for skew-symmetric matrices now enabled by default
  • BUG FIX in generalized case: in setups like "mpiexec -np 4 ./validate_real_double_generalized_1stage_random 90 90 45"
  • ELPA_SETUPS now checks (in case of MPI runs) whether the user-provided BLACSGRID is reasonable (i.e. ELPA no longer relies on the user checking, prior to calling ELPA, whether the BLACSGRID is ok); if this check fails, ELPA returns with an error
  • limit number of OpenMP threads to one, if MPI thread level is not at least MPI_THREAD_SERIALIZED
  • allow checking of the supported threading level of the MPI library at build time
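
The thread-limiting rule and the bandwidth check from the list above can be illustrated with a small sketch. This is not ELPA code: the names MpiThreadLevel, effective_omp_threads and banded_input_ok are hypothetical, and the numeric enum values are illustrative. Only the ordering SINGLE < FUNNELED < SERIALIZED < MULTIPLE is taken from the MPI standard.

```python
from enum import IntEnum


class MpiThreadLevel(IntEnum):
    """MPI thread support levels in the order the MPI standard defines.

    The numeric values are illustrative; only the ordering matters here.
    """
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def effective_omp_threads(requested: int, provided: MpiThreadLevel) -> int:
    """Hypothetical helper: fall back to one OpenMP thread unless the MPI
    library provides at least the SERIALIZED thread level."""
    if requested < 1:
        raise ValueError("requested thread count must be positive")
    if provided < MpiThreadLevel.SERIALIZED:
        return 1
    return requested


def banded_input_ok(bandwidth: int) -> bool:
    """Hypothetical helper: accept an already banded input matrix only if
    its bandwidth is at least 2, as described in the release note."""
    return bandwidth >= 2


if __name__ == "__main__":
    print(effective_omp_threads(8, MpiThreadLevel.FUNNELED))
    print(effective_omp_threads(8, MpiThreadLevel.MULTIPLE))
    print(banded_input_ok(1))
```

In this sketch a request for 8 threads is reduced to 1 when the library only provides FUNNELED, and is left unchanged for SERIALIZED or MULTIPLE.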

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2020.05.001_release

  • Enable compilation with gcc v10
  • Fix a bug in elpa_multiply_a_b (GPU)
  • improved documentation, including fixing of typos and errors in markdown
  • Fix a bug in the calling of Cannon's algorithm which might lead to crashes for a square process grid
  • improvements and bugfixes of the ELPA 2stage GPU version, see https://arxiv.org/abs/2002.10991
  • bugfix for the build of AVX-512 KNL kernels
  • clean separation of SIMD instructions for AVX and AVX2 kernels
  • better error checking for allocations / deallocations of CPU and GPU memory
  • experimental feature of matrix redistribution
  • bugfix in the cpuid tests
  • bugfix in elpa2_print_kernels
  • bugfix when configuring --with-gpu-support-only

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2022.05.001_release

  • implement OpenMP offloading to Intel GPUs for ELPA 1stage and 2stage (except for "step tridi_to_band")
  • implement SYCL offloading to Intel GPUs for ELPA 1 and 2 stage
  • AMD GPU offload has been tested on Mi200 (also with MPI)
  • can use ELPA with one individual "gpu stream" per MPI task (Nvidia and AMD only)
  • allow steps "cholesky", "inverttrm", and "multiplyab" to be called directly with GPU device pointers
  • on error, ELPA returns rather than aborting, to give control to the calling application and to allow for error recovery and/or graceful abort
  • allow ELPA to build with OpenMP and GPU
  • fix an FPE with the Intel compiler and AVX-512 instructions and optimization level > -O2
  • better checking of user-defined options in configure

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2021.11.002_release

  • fix an error when choosing the Nvidia GPU kernel (fallback to CPU might have been selected)
  • support of Nvidia cusolver library to accelerate some routines (needs CUDA >= 11.4)
  • experimental Nvidia GPU versions for "elpa_invert_trm" and "elpa_cholesky" can be tested by setting elpa_set("gpu_invert_trm",1) and elpa_set("gpu_cholesky",1). They are not used otherwise
  • BUGFIX: error in resort_ev (also backported to 2021.05.002 and 2020.11.001)
  • allow the ELPA eigenvectors and eigenvalues routines to be called with GPU device pointers for the input matrix, the vector of eigenvalues, and the output matrix of eigenvectors
  • EXPERIMENTAL feature: new real GPU kernel for Nvidia A100 (provided by Nvidia): can show a performance boost if the number of vectors per MPI task is > 20000. Most likely of most benefit in the non-MPI version
  • as announced, dropping the legacy interface
  • more autotuning features, for example using non-blocking MPI collectives
  • new version of autotuning avoiding a combinatorial growth of possibilities (the old autotune version can still be used if elpa%autotune_set_api_version(API_VERSION, error) is set to API_VERSION < 20211125)

- Fortran
Published by marekandreas over 3 years ago

elpa - ELPA_2020.11.001_release

  • this release contains mostly bugfixes:
    • fix determination whether a _ is needed to link Fortran to C
    • fix an error in the real block4 kernel for aarch64 NEON
    • add missing test_scalapack_template.F90 to EXTRA_DIST list
    • fix error in the GPU kernel
    • do not use MPI_COMM_WORLD but mpi_parent instead
  • switch from Python 2 to Python 3
  • experimental feature: complex kernels for aarch64 NEON
  • experimental feature: kernels for ARM SVE

- Fortran
Published by marekandreas about 5 years ago