Recent Releases of amd-fftw
amd-fftw - AOCL-FFTW 5.0
Highlights of this release
- Support added for using the wisdom feature by default under the --enable-amd-app-opt option
- Minor bug fixes
Published by BiplabRaut over 1 year ago
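A minimal sketch of how the option named above might be passed at build time. Only --enable-amd-app-opt is taken from these notes for this release. Pairing it with --enable-amd-opt, the standard FFTW SIMD and shared-library flags, and the install prefix are assumptions for illustration.

```shell
# Hypothetical install location; adjust for your system.
PREFIX="/opt/amd-fftw"

# --enable-amd-app-opt is the option named in the 5.0 notes.
# The remaining flags are assumptions (standard FFTW configure options
# plus --enable-amd-opt from the earlier releases).
CONFIGURE_FLAGS="--enable-sse2 --enable-avx --enable-avx2 --enable-shared --enable-amd-opt --enable-amd-app-opt"

if [ -x ./configure ]; then
    # CONFIGURE_FLAGS is unquoted on purpose so it splits into separate arguments.
    ./configure $CONFIGURE_FLAGS --prefix="$PREFIX" && make
else
    echo "Run from the amd-fftw source root: ./configure $CONFIGURE_FLAGS --prefix=$PREFIX"
fi
```

Outside the source tree the script only prints the command it would run, which makes it safe to try before committing to a build.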
amd-fftw - AOCL-FFTW 4.1
Highlights of this release
- Dynamic dispatch support added for AOCC build of the library on Linux
- Minor bug fixes
Published by BiplabRaut almost 3 years ago
amd-fftw - AOCL FFTW version 4.0
Highlights of improvements on AMD EPYC™ processor family CPUs
- AVX-512 enablement of DFT kernels
- AVX-512 optimization of copy and transpose routines
Published by BiplabRaut over 3 years ago
amd-fftw - AOCL FFTW version 3.2
Highlights of improvements on AMD EPYC™ processor family CPUs
- Dynamic dispatcher for AOCL-FFTW
- Upgraded AOCL-FFTW to align with the reference FFTW 3.3.10 from MIT
- Windows FFTW features aligned with Linux FFTW
Published by BiplabRaut almost 4 years ago
amd-fftw - AMD Optimized FFTW version 3.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- Feature ‘AMD application optimization layer’ that uplifts the performance of HPC and scientific applications
- Feature ‘Fast MPI transpose algorithm’ to speed up the distributed MPI FFT computations
- Feature ‘Top N planner’ that minimizes single-threaded run-to-run variations
- Support for building AMD FFTW library on Windows
- GCC compilation support for AMD processors based on the AMD “Zen3” core architecture
Published by BiplabRaut over 4 years ago
amd-fftw - AMD Optimized FFTW version 3.0.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- A new planner feature called Top N planner is introduced that minimizes single-threaded run-to-run variations.
- New parallel MPI transpose algorithm, enabled via the configure option "--enable-amd-mpi-vader-limit". When using this configure option, the user needs to set --mca btl_vader_eager_limit appropriately (current preference is 65536) in the mpirun command.
Published by pradeeptrgit almost 5 years ago
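The build and run steps described for the MPI transpose option can be sketched as follows. The configure option and the value 65536 come from the notes above, and btl_vader_eager_limit is the Open MPI MCA parameter for the vader BTL. The rank count, the application name, and the extra --enable-mpi and --enable-amd-opt flags are assumptions for illustration.

```shell
# Value given in the release notes as the current preference.
EAGER_LIMIT=65536
# Hypothetical rank count and application binary.
NP=4
APP="./my_mpi_fft_app"

# --enable-amd-mpi-vader-limit is from the notes; the other two flags are assumptions.
CONFIGURE_FLAGS="--enable-mpi --enable-amd-opt --enable-amd-mpi-vader-limit"

if [ -x ./configure ]; then
    # Unquoted on purpose so the flags split into separate arguments.
    ./configure $CONFIGURE_FLAGS && make
else
    echo "Build step (from the source root): ./configure $CONFIGURE_FLAGS"
fi

# Pass the eager limit to Open MPI as an MCA parameter at launch time.
MPIRUN_CMD="mpirun --mca btl_vader_eager_limit $EAGER_LIMIT -np $NP $APP"
if command -v mpirun >/dev/null 2>&1 && [ -x "$APP" ]; then
    $MPIRUN_CMD
else
    echo "Run step: $MPIRUN_CMD"
fi
```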
amd-fftw - AMD Optimized FFTW version 3.0
Highlights of improvements on AMD EPYC™ processor family CPUs
- New fast planner that improves the time of various planning modes in general and OPATIENT mode in particular. It can be enabled through the configure option “--enable-amd-fast-planner”
- Support for the configure option “AMD_ARCH” to help cross compilation. It accepts values such as auto, znver1, znver2, or znver3 for AMD EPYC processors
- Quad precision support is now included for the AOCC clang compiler from version 10 onwards
- Improved handling of the --enable-debug and “CC” options by ‘configure’ when --enable-amd-opt is used
- Fixed incorrect behavior of the OWISDOM feature when no wisdom file is present
Published by pradeeptrgit about 5 years ago
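As an illustration of the fast planner and AMD_ARCH options from this release, the sketch below checks the architecture value against the ones listed in the notes before configuring. Passing AMD_ARCH as a VAR=value argument on the configure command line is an assumption (it is the usual autoconf convention), as are the SIMD flags.

```shell
# Target architecture; the notes list auto, znver1, znver2 and znver3.
AMD_ARCH_VALUE="znver2"

# --enable-amd-opt and --enable-amd-fast-planner are from the notes;
# the SIMD flags are standard FFTW configure options added as an assumption.
CONFIGURE_FLAGS="--enable-sse2 --enable-avx --enable-avx2 --enable-amd-opt --enable-amd-fast-planner"

# Accept only the values named in the release notes.
case "$AMD_ARCH_VALUE" in
    auto|znver1|znver2|znver3) ARCH_OK=yes ;;
    *) ARCH_OK=no ;;
esac

if [ "$ARCH_OK" != yes ]; then
    echo "Unsupported AMD_ARCH value: $AMD_ARCH_VALUE" >&2
elif [ -x ./configure ]; then
    # Assumption: AMD_ARCH is passed as a configure variable.
    ./configure $CONFIGURE_FLAGS AMD_ARCH="$AMD_ARCH_VALUE" && make
else
    echo "Run from the source root: ./configure $CONFIGURE_FLAGS AMD_ARCH=$AMD_ARCH_VALUE"
fi
```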
amd-fftw - AMD Optimized FFTW version 2.2
Highlights of improvements on AMD EPYC™ processor family CPUs
- Improved performance of in-place MPI FFT by employing a faster in-place MPI transpose routine.
- Improved performance of copy function cpy2d_pair used for rank-0 transform and buffering plans.
- Added DFT kernels of higher radix sizes for q1fv, t1fv and q1fv FFT codelets.
Published by pradeeptrgit almost 6 years ago
amd-fftw - AMD Optimized FFTW Version 2.1
Highlights of improvements on AMD EPYC™ processor family CPUs
- Improved performance of the FFT kernels for AVX and AVX2
- Improved performance of copy function used in rank-0 transform and buffering plans.
- Several build configuration updates that work with the --enable-amd-opt option, including long double and quad precision support, CFLAGS, and AOCC/clang compiler support
Published by pradeeptrgit over 6 years ago
amd-fftw - AMD FFTW 2.0
- AMD optimizations are enabled through the configure option "--enable-amd-opt"
- Improved performance of the cpy2d routine for in-place transforms
- 256-bit SIMD kernels are selected over 128-bit SIMD kernels on AMD CPUs when the processor has 256-bit FPU and SIMD support
- New, improved in-place transpose method targeted at very large FFTs. This is an optional feature for single-core execution that can be enabled with the configure option "--enable-amd-trans"
- The FFTW wisdom file feature (reading and writing) is extended to support multiple wisdom files corresponding to different FFT problems. This avoids overwriting the same wisdom file for different FFT problems
Published by pradeeptrgit almost 7 years ago
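A short sketch of a build that turns on the two configure options introduced in this release. Since the notes describe the transpose feature as intended for single-core execution, no threading or MPI flags are added here. The SIMD flags are standard FFTW options included as an assumption.

```shell
# --enable-amd-opt and --enable-amd-trans are from the 2.0 notes;
# the SIMD flags are assumptions. No threading flags are set because
# the in-place transpose feature targets single-core execution.
CONFIGURE_FLAGS="--enable-sse2 --enable-avx --enable-avx2 --enable-amd-opt --enable-amd-trans"

if [ -x ./configure ]; then
    # Unquoted on purpose so the flags split into separate arguments.
    ./configure $CONFIGURE_FLAGS && make
else
    echo "Run from the source root: ./configure $CONFIGURE_FLAGS"
fi
```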