Recent Releases of amd-fftw

amd-fftw - AOCL-FFTW 5.1

  • Minor build issue fixes

- C
Published by BiplabRaut about 1 year ago

amd-fftw - AOCL-FFTW 5.0

Highlights of this release

  • Support added for using the wisdom feature by default under the --enable-amd-app-opt option
  • Minor bug fixes

- C
Published by BiplabRaut over 1 year ago

amd-fftw - AOCL-FFTW 4.2

- C
Published by BiplabRaut over 2 years ago

amd-fftw - AOCL-FFTW 4.1

Highlights of this release

  • Dynamic dispatch support added for AOCC build of the library on Linux
  • Minor bug fixes

- C
Published by BiplabRaut almost 3 years ago

amd-fftw - AOCL FFTW version 4.0

Highlights of improvements on AMD EPYC™ processor family CPUs

  • AVX-512 enablement of DFT kernels
  • AVX-512 optimization of copy and transpose routines

- C
Published by BiplabRaut over 3 years ago

amd-fftw - AOCL FFTW version 3.2

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Dynamic dispatcher for AOCL-FFTW
  • Upgraded AOCL-FFTW to align with the reference FFTW 3.3.10 from MIT
  • Windows FFTW features aligned with Linux FFTW

- C
Published by BiplabRaut almost 4 years ago

amd-fftw - AMD Optimized FFTW version 3.1

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Feature ‘AMD application optimization layer’ that uplifts the performance of HPC and scientific applications
  • Feature ‘Fast MPI transpose algorithm’ to speed up the distributed MPI FFT computations
  • Feature ‘Top N planner’ that minimizes single-threaded run-to-run variations
  • Support for building AMD FFTW library on Windows
  • GCC compilation support for AMD processors based on the AMD “Zen3” core architecture

- C
Published by BiplabRaut over 4 years ago

amd-fftw - AMD Optimized FFTW version 3.0.1

Highlights of improvements on AMD EPYC™ processor family CPUs

  • A new planner feature called Top N planner is introduced that minimizes single-threaded run-to-run variations.
  • New parallel MPI transpose algorithm enabled via configure option "--enable-amd-mpi-vader-limit". When using this configure option, the user needs to set --mca btl_vader_eager_limit appropriately (current preference is 65536) in the mpirun command.
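The MPI transpose note above can be sketched as a pair of commands. This is only an illustration under stated assumptions: the combination with --enable-mpi and --enable-amd-opt, the process count, and the application name ./my_mpi_fft_app are hypothetical and not taken from the release note. Only --enable-amd-mpi-vader-limit and the eager limit of 65536 come from the text. The script prints the commands instead of running them, so it is safe to execute outside a source tree.

```shell
#!/bin/sh
# Configure flags: --enable-amd-mpi-vader-limit is named in the release note;
# pairing it with --enable-mpi and --enable-amd-opt is an assumption.
CONFIGURE_FLAGS="--enable-mpi --enable-amd-opt --enable-amd-mpi-vader-limit"

# Eager limit value quoted in the release note as the current preference.
EAGER_LIMIT=65536

# Hypothetical application binary and process count, for illustration only.
APP=./my_mpi_fft_app
NPROCS=4

CONFIGURE_CMD="./configure $CONFIGURE_FLAGS"
MPIRUN_CMD="mpirun --mca btl_vader_eager_limit $EAGER_LIMIT -np $NPROCS $APP"

# Print the commands rather than executing them.
echo "$CONFIGURE_CMD"
echo "$MPIRUN_CMD"
```

The eager limit is kept in a variable because the note describes 65536 as a current preference rather than a fixed requirement.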

- C
Published by pradeeptrgit almost 5 years ago

amd-fftw - AMD Optimized FFTW version 3.0

Highlights of improvements on AMD EPYC™ processor family CPUs

  • New fast planner that reduces planning time for the various planning modes in general and OPATIENT mode in particular. It can be enabled through the configure option “--enable-amd-fast-planner”
  • Support for the configure option “AMD_ARCH” to help cross compilation. It accepts values such as auto, znver1, znver2, or znver3 for AMD EPYC processors
  • Quad precision support is now included for the AOCC clang compiler from version 10 onwards
  • Improved handling of the --enable-debug and “CC” options by ‘configure’ when --enable-amd-opt is used
  • Fixed incorrect behavior of the OWISDOM feature when no wisdom file is present
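As a rough sketch of how the options above might be combined, the script below assembles a configure command line. The release note does not show the exact syntax, so passing AMD_ARCH as an autoconf-style VAR=value argument and pairing the fast planner with --enable-amd-opt are both assumptions. The script prints the command instead of running it.

```shell
#!/bin/sh
# Target architecture: one of the values listed in the release note
# (auto, znver1, znver2, znver3).
AMD_ARCH_VALUE=znver3

# Assumption: AMD_ARCH is passed as VAR=value on the configure command line,
# and the fast planner is used together with --enable-amd-opt.
CONFIGURE_CMD="./configure --enable-amd-opt --enable-amd-fast-planner AMD_ARCH=$AMD_ARCH_VALUE"

# Print the command rather than executing it.
echo "$CONFIGURE_CMD"
```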

- C
Published by pradeeptrgit about 5 years ago

amd-fftw - AMD Optimized FFTW version 2.2

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance of in-place MPI FFT by employing a faster in-place MPI transpose routine.
  • Improved performance of copy function cpy2d_pair used for rank-0 transform and buffering plans.
  • Added DFT kernels of higher radix sizes for q1fv, t1fv and q1fv FFT codelets.

- C
Published by pradeeptrgit almost 6 years ago

amd-fftw - AMD Optimized FFTW Version 2.1

Highlights of improvements on AMD EPYC™ processor family CPUs

  • Improved performance of the FFT kernels for AVX and AVX2
  • Improved performance of copy function used in rank-0 transform and buffering plans.
  • Several build configuration updates that work with the --enable-amd-opt option, including long double and quad precision support, CFLAGS, and AOCC/clang compiler support

- C
Published by pradeeptrgit over 6 years ago

amd-fftw - AMD FFTW 2.0

  • AMD optimizations are enabled through the configure option "--enable-amd-opt"
  • Improved performance of the cpy2d routine for in-place transform of FFTW
  • Enabled selection of 256-bit SIMD kernels over 128-bit SIMD kernels on AMD CPUs when the processor has 256-bit FPU and SIMD support
  • New improved in-place transpose method targeted at very large FFT sizes. This is an optional feature for single-core execution that can be enabled with the configure option "--enable-amd-trans"
  • The FFTW wisdom file feature (reading and writing) is extended to support multiple wisdom files corresponding to different FFT problems. This avoids overwriting the same wisdom file for different FFT problems

- C
Published by pradeeptrgit almost 7 years ago