Releases | Open Source Science

Feature ‘AMD application optimization layer’ that uplifts the performance of HPC and scientific applications
Feature ‘Fast MPI transpose algorithm’ to speed up the distributed MPI FFT computations
Feature ‘Top N planner’ that minimizes single-threaded run-to-run variations
Support for building AMD FFTW library on Windows
GCC compilation support for AMD processors based on the AMD “Zen3” core architecture

- C
Published by BiplabRaut over 4 years ago

amd-fftw - AMD Optimized FFTW version 3.0.1

AMD Optimized FFTW version 3.0.1

Highlights of improvements on AMD EPYC™ processor family CPUs

A new planner feature called Top N planner is introduced that minimizes single-threaded run-to-run variations.
New parallel MPI transpose algorithm enabled via configure option "--enable-amd-mpi-vader-limit". When using this configure option, the user needs to set --mca btl_vader_eager_limit appropriately (current preference is 65536) in the MPIRUN command.

- C
Published by pradeeptrgit almost 5 years ago

amd-fftw - AMD Optimized FFTW version 3.0

AMD Optimized FFTW version 3.0

Highlights of improvements on AMD EPYC™ processor family CPUs

New fast planner that improves the time of various planning modes in general and OPATIENT mode in particular. It can be enabled through configure option “--enable-amd-fast-planner”
Support for configure option “AMD_ARCH” to help cross compilation. It accepts values such as auto, znver1, znver2 and znver3 for AMD EPYC processors
Quad precision support is now included for AOCC clang compiler from version 10 onwards
Improved handling of the --enable-debug and “CC” options by ‘configure’ when --enable-amd-opt is used
Fixed incorrect behavior of the OWISDOM feature when no wisdom file is present

- C
Published by pradeeptrgit about 5 years ago

amd-fftw - AMD Optimized FFTW version 2.2

AMD Optimized FFTW version 2.2

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance of in-place MPI FFT by employing a faster in-place MPI transpose routine.
Improved performance of copy function cpy2d_pair used for rank-0 transform and buffering plans.
Added DFT kernels of higher radix sizes for q1fv, t1fv and q1fv FFT codelets.
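
For readers unfamiliar with the copy function mentioned in this release: a rank-0 transform is a pure data copy, and buffering plans move data in and out of a contiguous buffer. The following is a minimal sketch of a strided 2D copy over a pair of arrays (for example real and imaginary parts). The name copy2d_pair_sketch and its parameter layout are assumptions made for illustration; this is not the amd-fftw implementation of cpy2d_pair and it contains none of AMD's optimizations.

```c
#include <stddef.h>

/* Hypothetical helper, not the amd-fftw routine: copies an n0 x n1 block
 * from two input arrays (e.g. real and imaginary parts) to two output
 * arrays. is0/is1 are the input strides and os0/os1 the output strides,
 * all counted in elements. */
void copy2d_pair_sketch(const double *in0, const double *in1,
                        double *out0, double *out1,
                        size_t n0, ptrdiff_t is0, ptrdiff_t os0,
                        size_t n1, ptrdiff_t is1, ptrdiff_t os1)
{
    for (size_t i0 = 0; i0 < n0; ++i0) {
        for (size_t i1 = 0; i1 < n1; ++i1) {
            /* Element offsets of (i0, i1) in the input and output layouts */
            ptrdiff_t src = (ptrdiff_t)i0 * is0 + (ptrdiff_t)i1 * is1;
            ptrdiff_t dst = (ptrdiff_t)i0 * os0 + (ptrdiff_t)i1 * os1;
            out0[dst] = in0[src];
            out1[dst] = in1[src];
        }
    }
}
```

With a 2 x 3 row-major input (is0 = 3, is1 = 1), calling the function with os0 = 1 and os1 = 2 writes the data out as its 3 x 2 transpose, which shows how the same loop covers both plain copies and re-strided copies.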

- C
Published by pradeeptrgit almost 6 years ago

amd-fftw - AMD Optimized FFTW Version 2.1

AMD Optimized FFTW version 2.1

Highlights of improvements on AMD EPYC™ processor family CPUs

Improved performance of the FFT kernels for AVX and AVX2
Improved performance of copy function used in rank-0 transform and buffering plans.
Several build configuration updates that work with the --enable-amd-opt option, including long double and quad precision support, CFLAGS, and AOCC/clang compiler support

- C
Published by pradeeptrgit over 6 years ago

amd-fftw - AMD FFTW 2.0

AMD FFTW 2.0

AMD optimizations are enabled through the configure option "--enable-amd-opt"
Improved performance of the cpy2d routine for in-place transforms of FFTW
Enabled selection of 256-bit SIMD kernels over 128-bit SIMD kernels for AMD CPUs when the processor has 256-bit FPU and SIMD support
New improved in-place transpose method targeted at very large FFT sizes. This is an optional feature for single-core execution that can be enabled by the configure option "--enable-amd-trans"
The FFTW wisdom file feature (reading and writing) is extended to support multiple wisdom files corresponding to different FFT problems. This avoids overwriting the same wisdom file for different FFT problems
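
The idea behind multiple wisdom files can be illustrated by deriving one file name per FFT problem, so that wisdom saved for one problem never lands in the file of another. The sketch below is an assumption made for illustration only: the function wisdom_filename_sketch and its naming scheme are hypothetical, and the release notes do not say how amd-fftw actually names its wisdom files.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical naming scheme, not taken from amd-fftw: the precision, the
 * in-place/out-of-place flag and the dimensions are encoded in the name,
 * e.g. "wisdom_double_ip_1024x512.dat".
 * Returns 0 on success, -1 if buf is too small or formatting fails. */
int wisdom_filename_sketch(char *buf, size_t buflen, const char *precision,
                           int in_place, const int *dims, int rank)
{
    int n = snprintf(buf, buflen, "wisdom_%s_%s",
                     precision, in_place ? "ip" : "oop");
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    size_t used = (size_t)n;

    for (int i = 0; i < rank; ++i) {
        /* '_' before the first dimension, 'x' between dimensions */
        n = snprintf(buf + used, buflen - used, "%c%d",
                     i == 0 ? '_' : 'x', dims[i]);
        if (n < 0 || (size_t)n >= buflen - used)
            return -1;
        used += (size_t)n;
    }

    n = snprintf(buf + used, buflen - used, ".dat");
    if (n < 0 || (size_t)n >= buflen - used)
        return -1;
    return 0;
}
```

A name built this way could then be passed to the standard FFTW calls fftw_import_wisdom_from_filename and fftw_export_wisdom_to_filename, giving each problem size its own file.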

- C
Published by pradeeptrgit almost 7 years ago

Recent Releases of amd-fftw

amd-fftw - AOCL-FFTW 5.1

amd-fftw - AOCL-FFTW 5.0

amd-fftw - AOCL-FFTW 4.2

amd-fftw - AOCL-FFTW 4.1

amd-fftw - AOCL FFTW version 4.0

amd-fftw - AOCL FFTW version 3.2

amd-fftw - AMD Optimized FFTW version 3.1

amd-fftw - AMD Optimized FFTW version 3.0.1

amd-fftw - AMD Optimized FFTW version 3.0

amd-fftw - AMD Optimized FFTW version 2.2

amd-fftw - AMD Optimized FFTW Version 2.1

amd-fftw - AMD FFTW 2.0