fluidx3d
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 36 DOI reference(s) in README
- ✓ Academic publication links: links to researchgate.net
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.3%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: ProjectPhysX
- License: other
- Language: C++
- Default Branch: master
- Homepage: https://youtube.com/@ProjectPhysX
- Size: 20.9 MB
Statistics
- Stars: 4,554
- Watchers: 63
- Forks: 401
- Open Issues: 32
- Releases: 30
Topics
Metadata Files
README.md
FluidX3D
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
Update History
- [v1.0](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.0) (04.08.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/commit/768073501af725e392a4b85885009e2fa6400e48) (public release)
  - public release
- [v1.1](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.1) (29.09.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.0...v1.1) (GPU voxelization)
  - added solid voxelization on GPU (slow algorithm)
  - added tool to print current camera position (key G)
  - minor bug fix (workaround for Intel iGPU driver bug with triangle rendering)
- [v1.2](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.2) (24.10.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.1...v1.2) (force/torque computation)
  - added functions to compute force/torque on objects
  - added function to translate Mesh
  - added Stokes drag validation setup
- [v1.3](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.3) (10.11.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.2...v1.3) (minor bug fixes)
  - added unit conversion functions for torque
  - `FORCE_FIELD` and `VOLUME_FORCE` can now be used independently
  - minor bug fix (workaround for AMD legacy driver bug with binary number literals)
- [v1.4](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v1.4) (14.12.2022) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.3...v1.4) (Linux graphics)
  - complete rewrite of C++ graphics library to minimize API dependencies
  - added interactive graphics mode on Linux with X11
  - fixed streamline visualization bug in 2D
- [v2.0](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.0) (09.01.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v1.4...v2.0) (multi-GPU upgrade)
  - added (cross-vendor) multi-GPU support on a single node (PC/laptop/server)
- [v2.1](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.1) (15.01.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.0...v2.1) (fast voxelization)
  - made solid voxelization on GPU lightning fast (new algorithm, from minutes to milliseconds)
- [v2.2](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.2) (20.01.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.1...v2.2) (velocity voxelization)
  - added option to voxelize moving/rotating geometry on GPU, with automatic velocity initialization for each grid point based on center of rotation, linear velocity and rotational velocity
  - cells that are converted from solid->fluid during re-voxelization now have their DDFs properly initialized
  - added option to not auto-scale mesh during `read_stl(...)`, with negative `size` parameter
  - added kernel for solid boundary rendering with marching-cubes
- [v2.3](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.3) (30.01.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.2...v2.3) (particles)
  - added particles with immersed-boundary method (either passive or 2-way-coupled, only supported with single-GPU)
  - minor optimization to GPU voxelization algorithm (workgroup threads outside mesh bounding-box return after ray-mesh intersections have been found)
  - displayed GPU memory allocation size is now fully accurate
  - fixed bug in `write_line()` function in `src/utilities.hpp`
  - removed `.exe` file extension for Linux/macOS
- [v2.4](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.4) (11.03.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.3...v2.4) (UI improvements)
  - added a help menu with key H that shows keyboard/mouse controls, visualization settings and simulation stats
  - improvements to keyboard/mouse control (+/- for zoom, mouseclick frees/locks cursor)
  - added suggestion of largest possible grid resolution if resolution is set larger than memory allows
  - minor optimizations in multi-GPU communication (insignificant performance difference)
  - fixed bug in temperature equilibrium function for temperature extension
  - fixed erroneous double literal for Intel iGPUs in skybox color functions
  - fixed bug in make.sh where multi-GPU device IDs would not get forwarded to the executable
  - minor bug fixes in graphics engine (free cursor not centered during rotation, labels in VR mode)
  - fixed bug in `LBM::voxelize_stl()` size parameter standard initialization
- [v2.5](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.5) (11.04.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.4...v2.5) (raytracing overhaul)
  - implemented light absorption in fluid for raytracing graphics (no performance impact)
  - improved raytracing framerate when camera is inside fluid
  - fixed skybox pole flickering artifacts
  - fixed bug where moving objects during re-voxelization would leave an erroneous trail of solid grid cells behind
- [v2.6](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.6) (16.04.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.5...v2.6) (Intel Arc patch)
  - patched OpenCL issues of Intel Arc GPUs: now VRAM allocations >4GB are possible and correct VRAM capacity is reported
- [v2.7](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.7) (29.05.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.6...v2.7) (visualization upgrade)
  - added slice visualization (key 2 / key 3 modes, then switch through slice modes with key T, move slice with keys Q/E)
  - made flag wireframe / solid surface visualization kernels toggleable with key 1
  - added surface pressure visualization (key 1 when `FORCE_FIELD` is enabled and `lbm.calculate_force_on_boundaries();` is called)
  - added binary `.vtk` export function for meshes with `lbm.write_mesh_to_vtk(Mesh* mesh);`
  - added `time_step_multiplicator` for `integrate_particles()` function in PARTICLES extension
  - made correction of wrong memory reporting on Intel Arc more robust
  - fixed bug in `write_file()` template functions
  - reverted back to separate `cl::Context` for each OpenCL device, as the shared Context otherwise would allocate extra VRAM on all other unused Nvidia GPUs
  - removed Debug and x86 configurations from Visual Studio solution file (one less complication for compiling)
  - fixed bug that particles could get too close to walls and get stuck, or leave the fluid phase (added boundary force)
- [v2.8](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.8) (24.06.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.7...v2.8) (documentation + polish)
  - finally added more [documentation](DOCUMENTATION.md)
  - cleaned up all sample setups in `setup.cpp` for more beginner-friendliness, and added required extensions in `defines.hpp` as comments to all setups
  - improved loading of composite `.stl` geometries, by adding an option to omit automatic mesh repositioning, added more functionality to `Mesh` struct in `utilities.hpp`
  - added `uint3 resolution(float3 box_aspect_ratio, uint memory)` function to compute simulation box resolution based on box aspect ratio and VRAM occupation in MB
  - added `bool lbm.graphics.next_frame(...)` function to export images for a specified video length in the `main_setup` compute loop
  - added `VIS_...` macros to ease setting visualization modes in headless graphics mode in `lbm.graphics.visualization_modes`
  - simulation box dimensions are now automatically made equally divisible by domains for multi-GPU simulations
  - fixed Info/Warning/Error message formatting for loading files and made Info/Warning/Error message labels colored
  - added Ahmed body setup as an example on how body forces and drag coefficient are computed
  - added Cessna 172 and Bell 222 setups to showcase loading composite .stl geometries and revoxelization of moving parts
  - added optional semi-transparent rendering mode (`#define GRAPHICS_TRANSPARENCY 0.7f` in `defines.hpp`)
  - fixed flickering of streamline visualization in interactive graphics
  - improved smooth positioning of streamlines in slice mode
  - fixed bug where `mass` and `massex` in `SURFACE` extension were also allocated in CPU RAM (not required)
  - fixed bug in Q-criterion rendering of halo data in multi-GPU mode, reduced gap width between domains
  - removed shared memory optimization from mesh voxelization kernel, as it crashes on Nvidia GPUs with new GPU drivers and is incompatible with old OpenCL 1.0 GPUs
  - fixed raytracing attenuation color when no surface is at the simulation box walls with periodic boundaries
- [v2.9](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.9) (31.07.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.8...v2.9) (multithreading)
  - added cross-platform `parallel_for` implementation in `utilities.hpp` using `std::threads`
  - significantly (>4x) faster simulation startup with multithreaded geometry initialization and sanity checks
  - faster `calculate_force_on_object()` and `calculate_torque_on_object()` functions with multithreading
  - added total runtime and LBM runtime to `lbm.write_status()`
  - fixed bug in voxelization ray direction for re-voxelizing rotating objects
  - fixed bug in `Mesh::get_bounding_box_size()`
  - fixed bug in `print_message()` function in `utilities.hpp`
- [v2.10](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.10) (05.11.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.9...v2.10) (frustum culling)
  - improved rasterization performance via frustum culling when only part of the simulation box is visible
  - improved switching between centered/free camera mode
  - refactored OpenCL rendering library
  - unit conversion factors are now automatically printed in console when `units.set_m_kg_s(...)` is used
  - faster startup time for FluidX3D benchmark
  - minor bug fix in `voxelize_mesh(...)` kernel
  - fixed bug in `shading(...)`
  - replaced slow (in multithreading) `std::rand()` function with standard C99 LCG
  - more robust correction of wrong VRAM capacity reporting on Intel Arc GPUs
  - fixed some minor compiler warnings
- [v2.11](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.11) (07.12.2023) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.10...v2.11) (improved Linux graphics)
  - interactive graphics on Linux are now in fullscreen mode too, fully matching Windows
  - made CPU/GPU buffer initialization significantly faster with `std::fill` and `enqueueFillBuffer` (overall ~8% faster simulation startup)
  - added operating system info to OpenCL device driver version printout
  - fixed flickering with frustum culling at very small field of view
  - fixed bug where rendered/exported frame was not updated when `visualization_modes` changed
- [v2.12](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.12) (18.01.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.11...v2.12) (faster startup)
  - ~3x faster source code compiling on Linux using multiple CPU cores if [`make`](https://www.gnu.org/software/make/) is installed
  - significantly faster simulation initialization (~40% single-GPU, ~15% multi-GPU)
  - minor bug fix in `Memory_Container::reset()` function
- [v2.13](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.13) (11.02.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.12...v2.13) (improved .vtk export)
  - data in exported `.vtk` files is now automatically converted to SI units
  - ~2x faster `.vtk` export with multithreading
  - added unit conversion functions for `TEMPERATURE` extension
  - fixed graphical artifacts with axis-aligned camera in raytracing
  - fixed `get_exe_path()` for macOS
  - fixed X11 multi-monitor issues on Linux
  - workaround for Nvidia driver bug: `enqueueFillBuffer` is broken for large buffers on Nvidia GPUs
  - fixed slow numeric drift issues caused by `-cl-fast-relaxed-math`
  - fixed wrong Maximum Allocation Size reporting in `LBM::write_status()`
  - fixed missing scaling of coordinates to SI units in `LBM::write_mesh_to_vtk()`
- [v2.14](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.14) (03.03.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.13...v2.14) (visualization upgrade)
  - coloring can now be switched between velocity/density/temperature with key Z
  - uniform improved color palettes for velocity/density/temperature visualization
  - color scale with automatic unit conversion can now be shown with key H
  - slice mode for field visualization now draws fully filled-in slices instead of only lines for velocity vectors
  - shading in `VIS_FLAG_SURFACE` and `VIS_PHI_RASTERIZE` modes is smoother now
  - `make.sh` now automatically detects operating system and X11 support on Linux and only runs FluidX3D if last compilation was successful
  - fixed compiler warnings on Android
  - fixed `make.sh` failing on some systems due to nonstandard interpreter path
  - fixed that `make` would not compile with multiple cores on some systems
- [v2.15](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.15) (09.04.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.14...v2.15) (framerate boost)
  - eliminated one frame memory copy and one clear frame operation in rendering chain, for 20-70% higher framerate on both Windows and Linux
  - enabled `g++` compiler optimizations for faster startup and higher rendering framerate
  - fixed bug in multithreaded sanity checks
  - fixed wrong unit conversion for thermal expansion coefficient
  - fixed density to pressure conversion in LBM units
  - fixed bug that raytracing kernel could lock up simulation
  - fixed minor visual artifacts with raytracing
  - fixed that console sometimes was not cleared before `INTERACTIVE_GRAPHICS_ASCII` rendering starts
- [v2.16](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.16) (02.05.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.15...v2.16) (bug fixes)
  - simplified 10% faster marching-cubes implementation with 1D interpolation on edges instead of 3D interpolation, allowing to get rid of edge table
  - added faster, simplified marching-cubes variant for solid surface rendering where edges are always halfway between grid cells
  - refactoring in OpenCL rendering kernels
  - fixed that voxelization failed in Intel OpenCL CPU Runtime due to array out-of-bounds access
  - fixed that voxelization did not always produce binary identical results in multi-GPU compared to single-GPU
  - fixed that velocity voxelization failed for free surface simulations
  - fixed terrible performance on ARM GPUs by macro-replacing fused-multiply-add (`fma`) with `a*b+c`
  - fixed that Y/Z keys were incorrect for `QWERTY` keyboard layout in Linux
  - fixed that free camera movement speed in help overlay was not updated in stationary image when scrolling
  - fixed that cursor would sometimes flicker when scrolling on trackpads with Linux-X11 interactive graphics
  - fixed flickering of interactive rendering with multi-GPU when camera is not moved
  - fixed missing `XInitThreads()` call that could crash Linux interactive graphics on some systems
  - fixed z-fighting between `graphics_rasterize_phi()` and `graphics_flags_mc()` kernels
- [v2.17](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.17) (05.06.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.16...v2.17) (unlimited domain resolution)
  - domains are no longer limited to 4.29 billion (2³², 1624³) grid cells or 225 GB memory; if more are used, the OpenCL code will automatically compile with 64-bit indexing
  - new, faster raytracing-based field visualization for single-GPU simulations
  - added [GPU Driver and OpenCL Runtime installation instructions](DOCUMENTATION.md#0-install-gpu-drivers-and-opencl-runtime) to documentation
  - refactored `INTERACTIVE_GRAPHICS_ASCII`
  - fixed memory leak in destructors of `floatN`, `floatNxN`, `doubleN`, `doubleNxN` (all unused)
  - made camera movement/rotation/zoom behavior independent of framerate
  - fixed that `smart_device_selection()` would print a wrong warning if device reports 0 MHz clock speed
- [v2.18](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.18) (21.07.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.17...v2.18) (more bug fixes)
  - added support for high refresh rate monitors on Linux
  - more compact OpenCL Runtime installation scripts in Documentation
  - driver/runtime installation instructions will now be printed to console if no OpenCL devices are available
  - added domain information to `LBM::write_status()`
  - added `LBM::index` function for `uint3` input parameter
  - fixed that very large simulations sometimes wouldn't render properly by increasing maximum render distance from 10k to 2.1M
  - fixed mouse input stuttering at high screen refresh rate on Linux
  - fixed graphical artifacts in free surface raytracing on Intel CPU Runtime for OpenCL
  - fixed runtime estimation printed in console for setups with multiple `lbm.run(...)` calls
  - fixed density oscillations in sample setups (too large `lbm_u`)
  - fixed minor graphical artifacts in `raytrace_phi()`
  - fixed minor graphical artifacts in `ray_grid_traverse_sum()`
  - fixed wrong printed time step count on raindrop sample setup
- [v2.19](https://github.com/ProjectPhysX/FluidX3D/releases/tag/v2.19) (07.09.2024) [changes](https://github.com/ProjectPhysX/FluidX3D/compare/v2.18...v2.19) (camera splines)
  - the camera can now fly along a smooth path through a list of provided keyframe camera placements, [using Catmull-Rom splines](https://github.com/ProjectPhysX/FluidX3D/blob/master/DOCUMENTATION.md#video-rendering)
  - more accurate remaining runtime estimation that includes time spent on rendering
  - enabled FP16S memory compression by default
  - printed camera placement using key G is now formatted for easier copy/paste
  - added benchmark chart in Readme using mermaid gantt chart
  - placed memory allocation info during simulation startup at better location
  - fixed threading conflict between `INTERACTIVE_GRAPHICS` and `lbm.graphics.write_frame();`
  - fixed maximum buffer allocation size limit for AMD GPUs and in Intel CPU Runtime for OpenCL
  - fixed wrong `Re…`

How to get started?
Read the FluidX3D Documentation!
Compute Features - Getting the Memory Problem under Control
- streaming (part 2/2):

  f_0^temp(x,t) = f_0(x,t)
  f_i^temp(x,t) = f_(t%2 ? i : (i%2 ? i+1 : i-1))(i%2 ? x : x-e_i, t)   for i ∈ [1, q-1]

- collision:

  ρ(x,t) = (Σ_i f_i^temp(x,t)) + 1
  u(x,t) = 1/ρ(x,t) Σ_i c_i f_i^temp(x,t)
  f_i^eq-shifted(x,t) = w_i ρ ((u∘c_i)²/(2c⁴) - (u∘u)/(2c²) + (u∘c_i)/c²) + w_i (ρ-1)
  f_i^temp(x, t+Δt) = f_i^temp(x,t) + Ω_i(f_i^temp(x,t), f_i^eq-shifted(x,t), τ)

- streaming (part 1/2):

  f_0(x, t+Δt) = f_0^temp(x, t+Δt)
  f_(t%2 ? (i%2 ? i+1 : i-1) : i)(i%2 ? x+e_i : x, t+Δt) = f_i^temp(x, t+Δt)   for i ∈ [1, q-1]

variables and notation
| variable | SI units | defining equation | description |
| :------: | :------: | :---------------: | :---------- |
| x | m | x = (x,y,z)ᵀ | 3D position in Cartesian coordinates |
| t | s | - | time |
| ρ | kg/m³ | ρ = (Σ_i f_i)+1 | mass density of fluid |
| p | kg/(m s²) | p = c² ρ | pressure of fluid |
| u | m/s | u = 1/ρ Σ_i c_i f_i | velocity of fluid |
| ν | m²/s | ν = μ/ρ | kinematic shear viscosity of fluid |
| μ | kg/(m s) | μ = ρ ν | dynamic viscosity of fluid |
| f_i | kg/m³ | - | shifted density distribution functions (DDFs) |
| Δx | m | Δx = 1 | lattice constant (in LBM units) |
| Δt | s | Δt = 1 | simulation time step (in LBM units) |
| c | m/s | c = 1/√3 Δx/Δt | lattice speed of sound (in LBM units) |
| i | 1 | 0 ≤ i < q | LBM streaming direction index |
| q | 1 | q ∈ { 9,15,19,27 } | number of LBM streaming directions |
| e_i | m | D2Q9 / D3Q15/19/27 | LBM streaming directions |
| c_i | m/s | c_i = e_i/Δt | LBM streaming velocities |
| w_i | 1 | Σ_i w_i = 1 | LBM velocity set weights |
| Ω_i | kg/m³ | SRT or TRT | LBM collision operator |
| τ | s | τ = ν/c² + Δt/2 | LBM relaxation time |
- velocity sets: D2Q9, D3Q15, D3Q19 (default), D3Q27 - collision operators: single-relaxation-time (SRT/BGK) (default), two-relaxation-time (TRT) - DDF-shifting and other algebraic optimization to minimize round-off error
- traditional LBM (D3Q19) with FP64 requires ~344 Bytes/cell
- 🟧🟧🟧🟧🟧🟧🟧🟧🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟨🟨🟨🟨🟨🟨🟨🟨🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥🟥
(density 🟧, velocity 🟦, flags 🟨, 2 copies of DDFs 🟩/🟥; each square = 1 Byte) - allows for 3 Million cells per 1 GB VRAM
- FluidX3D (D3Q19) requires only 55 Bytes/cell with Esoteric-Pull+FP16
- 🟧🟧🟧🟧🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟦🟨🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩🟩
(density 🟧, velocity 🟦, flags 🟨, DDFs 🟩; each square = 1 Byte) - allows for 19 Million cells per 1 GB VRAM
- in-place streaming with Esoteric-Pull: eliminates redundant copy of density distribution functions (DDFs) in memory; almost cuts memory demand in half and slightly increases performance due to implicit bounce-back boundaries; offers optimal memory access patterns for single-cell in-place streaming
- decoupled arithmetic precision (FP32) and memory precision (FP32 or FP16S or FP16C): all arithmetic is done in FP32 for compatibility on all hardware, but DDFs in memory can be compressed to FP16S or FP16C: almost cuts memory demand in half again and almost doubles performance, without impacting overall accuracy for most setups
- only 8 flag bits per lattice point (can be used independently / at the same time):

| flag | meaning |
| :--: | :------ |
| `TYPE_S` | (stationary or moving) solid boundaries |
| `TYPE_E` | equilibrium boundaries (inflow/outflow) |
| `TYPE_T` | temperature boundaries |
| `TYPE_F` | free surface (fluid) |
| `TYPE_I` | free surface (interface) |
| `TYPE_G` | free surface (gas) |
| `TYPE_X` | remaining for custom use or further extensions |
| `TYPE_Y` | remaining for custom use or further extensions |
- large cost saving: comparison of maximum single-GPU grid resolution for D3Q19 LBM
| GPU VRAM capacity | 1 GB | 2 GB | 3 GB | 4 GB | 6 GB | 8 GB | 10 GB | 11 GB | 12 GB | 16 GB | 20 GB | 24 GB | 32 GB | 40 GB | 48 GB | 64 GB | 80 GB | 94 GB | 128 GB | 180 GB | 192 GB | 256 GB | 288 GB | 6 TB |
| :---------------- | ---: | ---: | ---: | ---: | ---: | ---: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | ----: | -----: | -----: | -----: | -----: | -----: | ---: |
| approximate GPU price | $25<br>GT 210 | $25<br>GTX 950 | $12<br>GTX 1060 | $50<br>GT 730 | $35<br>GTX 1060 | $70<br>RX 470 | $500<br>RTX 3080 | $240<br>GTX 1080 Ti | $75<br>Tesla M40 | $75<br>Instinct MI25 | $900<br>RX 7900 XT | $205<br>Tesla P40 | $600<br>Instinct MI60 | $5500<br>A100 | $2400<br>RTX 8000 | $10k<br>Instinct MI210 | $11k<br>A100 | >$40k<br>H100 NVL | ?<br>GPU Max 1550 | ~$80k<br>B200 | ~$15k<br>MI300X | ~$25k<br>MI325X | ~$25k<br>MI350X | ~$80k<br>2x Xeon 6960P |
| traditional LBM (FP64) | 144³ | 182³ | 208³ | 230³ | 262³ | 288³ | 312³ | 322³ | 330³ | 364³ | 392³ | 418³ | 460³ | 494³ | 526³ | 578³ | 624³ | 658³ | 730³ | 818³ | 836³ | 920³ | 956³ | 2654³ |
| FluidX3D (FP32/FP32) | 224³ | 282³ | 322³ | 354³ | 406³ | 448³ | 482³ | 498³ | 512³ | 564³ | 608³ | 646³ | 710³ | 766³ | 814³ | 896³ | 966³ | 1018³ | 1130³ | 1266³ | 1292³ | 1422³ | 1480³ | 4106³ |
| FluidX3D (FP32/FP16) | 266³ | 336³ | 384³ | 424³ | 484³ | 534³ | 574³ | 594³ | 610³ | 672³ | 724³ | 770³ | 848³ | 912³ | 970³ | 1068³ | 1150³ | 1214³ | 1346³ | 1508³ | 1540³ | 1696³ | 1764³ | 4892³ |
- domain decomposition allows pooling VRAM from multiple GPUs for much larger grid resolution
- GPUs don't have to be identical, not even from the same vendor - any combination of AMD+Intel+Nvidia GPUs will work - but similar VRAM capacity/bandwidth is recommended
- domain communication architecture (simplified)
```diff
++ .-----------------------------------------------------------------. ++
++ |                              GPU 0                              | ++
++ |                          LBM Domain 0                           | ++
++ '-----------------------------------------------------------------' ++
++       |  selective              /|\                                  ++
++      \|/ in-VRAM copy            |                                   ++
++      .-------------------------------------------------------.      ++
++      |               GPU 0 - Transfer Buffer 0               |      ++
++      '-------------------------------------------------------'      ++
!!       |  PCIe                   /|\                                  !!
!!      \|/ copy                    |                                   !!
@@      .-------------------------.     .-------------------------.    @@
@@      | CPU - Transfer Buffer 0 |     | CPU - Transfer Buffer 1 |    @@
@@      '-------------------------'\   /'-------------------------'    @@
@@                          pointer  X  swap                           @@
@@      .-------------------------./   \.-------------------------.    @@
@@      | CPU - Transfer Buffer 1 |     | CPU - Transfer Buffer 0 |    @@
@@      '-------------------------'     '-------------------------'    @@
!!      /|\ PCIe                    |                                   !!
!!       |  copy                   \|/                                  !!
++      .-------------------------------------------------------.      ++
++      |               GPU 1 - Transfer Buffer 1               |      ++
++      '-------------------------------------------------------'      ++
++      /|\ selective               |                                   ++
++       |  in-VRAM copy           \|/                                  ++
++ .-----------------------------------------------------------------. ++
++ |                              GPU 1                              | ++
++ |                          LBM Domain 1                           | ++
++ '-----------------------------------------------------------------' ++
##                                  |                                   ##
##                   domain synchronization barrier                     ##
##                                  |                                   ##
|| -------------------------------------------------------------> time ||
```

- domain communication architecture (detailed)

```diff
++ .-----------------------------------------------------------------. ++
++ |                              GPU 0                              | ++
++ |                          LBM Domain 0                           | ++
++ '-----------------------------------------------------------------' ++
++   | selective in- /|\   | selective in- /|\   | selective in- /|\  ++
++  \|/ VRAM copy (X) |   \|/ VRAM copy (Y) |   \|/ VRAM copy (Z) |   ++
++ .---------------------.---------------------.---------------------. ++
++ | GPU 0 - TB 0X+      | GPU 0 - TB 0Y+      | GPU 0 - TB 0Z+      | ++
++ | GPU 0 - TB 0X-      | GPU 0 - TB 0Y-      | GPU 0 - TB 0Z-      | ++
++ '---------------------'---------------------'---------------------' ++
!!   | PCIe /|\             | PCIe /|\             | PCIe /|\          !!
!!  \|/ copy |             \|/ copy |             \|/ copy |           !!
@@ .---------. .---------.---------. .---------.---------. .---------. @@
@@ | CPU 0X+ | | CPU 1X- | CPU 0Y+ | | CPU 3Y- | CPU 0Z+ | | CPU 5Z- | @@
@@ | CPU 0X- | | CPU 2X+ | CPU 0Y- | | CPU 4Y+ | CPU 0Z- | | CPU 6Z+ | @@
@@ '---------\ /---------'---------\ /---------'---------\ /---------' @@
@@   pointer  X  swap (X)  pointer  X  swap (Y)  pointer  X  swap (Z)  @@
@@ .---------/ \---------.---------/ \---------.---------/ \---------. @@
@@ | CPU 1X- | | CPU 0X+ | CPU 3Y- | | CPU 0Y+ | CPU 5Z- | | CPU 0Z+ | @@
@@ | CPU 2X+ | | CPU 0X- | CPU 4Y+ | | CPU 0Y- | CPU 6Z+ | | CPU 0Z- | @@
@@ '---------' '---------'---------' '---------'---------' '---------' @@
!!  /|\ PCIe |             /|\ PCIe |             /|\ PCIe |           !!
!!   | copy \|/             | copy \|/             | copy \|/          !!
++ .--------------------..---------------------..--------------------. ++
++ | GPU 1 - TB 1X-     || GPU 3 - TB 3Y-      || GPU 5 - TB 5Z-     | ++
++ :====================::=====================::====================: ++
++ | GPU 2 - TB 2X+     || GPU 4 - TB 4Y+      || GPU 6 - TB 6Z+     | ++
++ '--------------------''---------------------''--------------------' ++
++  /|\ selective in- |    /|\ selective in- |    /|\ selective in- |  ++
++   | VRAM copy (X) \|/    | VRAM copy (Y) \|/    | VRAM copy (Z) \|/ ++
++ .--------------------..---------------------..--------------------. ++
++ | GPU 1              || GPU 3               || GPU 5              | ++
++ | LBM Domain 1       || LBM Domain 3        || LBM Domain 5       | ++
++ :====================::=====================::====================: ++
++ | GPU 2              || GPU 4               || GPU 6              | ++
++ | LBM Domain 2       || LBM Domain 4        || LBM Domain 6       | ++
++ '--------------------''---------------------''--------------------' ++
##            |                     |                     |            ##
##            |    domain synchronization barriers        |            ##
##            |                     |                     |            ##
|| -------------------------------------------------------------> time ||
```
- boundary types
- stationary mid-grid bounce-back boundaries (stationary solid boundaries)
- moving mid-grid bounce-back boundaries (moving solid boundaries)
- equilibrium boundaries (non-reflective inflow/outflow)
- temperature boundaries (fixed temperature)
- global force per volume (Guo forcing), can be modified on-the-fly
- local force per volume (force field)
- optional computation of forces from the fluid on solid boundaries
- state-of-the-art free surface LBM (FSLBM) implementation:
- volume-of-fluid model
- fully analytic PLIC for efficient curvature calculation
- improved mass conservation
- ultra efficient implementation with only 4 kernels additionally to `stream_collide()` kernel
- thermal LBM to simulate thermal convection
- D3Q7 subgrid for thermal DDFs
- in-place streaming with Esoteric-Pull for thermal DDFs
- optional FP16S or FP16C compression for thermal DDFs with DDF-shifting
- Smagorinsky-Lilly subgrid turbulence LES model to keep simulations with very large Reynolds number stable
  Π_αβ = Σ_i e_iα e_iβ (f_i - f_i^eq-shifted)
  Q = Σ_αβ Π_αβ²
  τ = ½ (τ_0 + √(τ_0² + (16√2)/(3π²) √Q/ρ))

- particles with immersed-boundary method (either passive or 2-way-coupled, single-GPU only)
Solving the Visualization Problem
- FluidX3D can do simulations so large that storing the volumetric data for later rendering becomes unmanageable (like 120 GB for a single frame, hundreds of terabytes for a video)
- instead, FluidX3D allows rendering raw simulation data directly in VRAM, so no large volumetric files have to be exported to the hard disk (see my technical talk)
- the rendering is so fast that it works interactively in real time for both rasterization and raytracing
- rasterization and raytracing are done in OpenCL and work on all GPUs, even the ones without RTX/DXR raytracing cores or without any rendering hardware at all (like A100, MI200, ...)
- if no monitor is available (like on a remote Linux server), there is an ASCII rendering mode to interactively visualize the simulation in the terminal (even in WSL and/or through SSH)
- rendering is fully multi-GPU-parallelized via seamless domain decomposition rasterization
- with interactive graphics mode disabled, image resolution can be as large as VRAM allows for (4K/8K/16K and above)
- (interactive) visualization modes:
- flag wireframe / solid surface (and force vectors on solid cells or surface pressure if the extension is used)
- velocity field (with slice mode)
- streamlines (with slice mode)
- velocity-colored Q-criterion isosurface
- rasterized free surface with marching-cubes
- raytraced free surface with fast ray-grid traversal and marching-cubes, either 1-4 rays/pixel or 1-10 rays/pixel
Solving the Compatibility Problem
- FluidX3D is written in OpenCL 1.2, so it runs on all hardware from all vendors (Nvidia, AMD, Intel, ...):
- world's fastest datacenter GPUs: B200, MI300X, H200, H100 (NVL), A100, MI200, MI100, V100(S), GPU Max 1100, ...
- gaming GPUs (desktop/laptop): Nvidia GeForce, AMD Radeon, Intel Arc
- professional/workstation GPUs: Nvidia Quadro, AMD Radeon Pro / FirePro, Intel Arc Pro
- integrated GPUs
- CPUs (requires installation of Intel CPU Runtime for OpenCL)
- Intel Xeon Phi (requires installation of Intel CPU Runtime for OpenCL)
- smartphone ARM GPUs
- native cross-vendor multi-GPU implementation
- uses PCIe communication, so no SLI/Crossfire/NVLink/InfinityFabric required
- single-node parallelization, so no MPI installation required
- GPUs don't even have to be from the same vendor, but similar memory capacity and bandwidth are recommended
- works on Windows and Linux with C++17, with limited support also for macOS and Android
- supports importing and voxelizing triangle meshes from binary `.stl` files, with fast GPU voxelization
- supports exporting volumetric data as binary `.vtk` files
- supports exporting triangle meshes as binary `.vtk` files
- supports exporting rendered images as `.png`/`.qoi`/`.bmp` files; encoding runs in parallel on the CPU while the simulation on GPU can continue without delay
## Single-GPU/CPU Benchmarks
Here are performance benchmarks on various hardware in MLUPs/s, or how many million lattice cells are updated per second. The settings used for the benchmark are D3Q19 SRT with no extensions enabled (only LBM with implicit mid-grid bounce-back boundaries), and the setup consists of an empty cubic box of sufficient size (typically 256³). Without extensions, a single lattice cell requires:
- a memory capacity of 93 (FP32/FP32) or 55 (FP32/FP16) Bytes
- a memory bandwidth of 153 (FP32/FP32) or 77 (FP32/FP16) Bytes per time step
- 363 (FP32/FP32) or 406 (FP32/FP16S) or 1275 (FP32/FP16C) FLOPs per time step (FP32+INT32 operations counted combined)
In consequence, the arithmetic intensity of this implementation is 2.37 (FP32/FP32), 5.27 (FP32/FP16S) or 16.56 (FP32/FP16C) FLOPs/Byte, so performance is limited only by memory bandwidth. The left 3 columns of the table show the hardware specs as found in the data sheets (theoretical peak FP32 compute performance, memory capacity, theoretical peak memory bandwidth); the right 3 columns show the measured FluidX3D performance for the FP32/FP32, FP32/FP16S and FP32/FP16C floating-point precision settings, with the roofline model efficiency in round brackets, indicating what percentage of theoretical peak memory bandwidth is being used.
If your GPU/CPU is not on the list yet, you can report your benchmarks here.
```mermaid
gantt
title FluidX3D Performance [MLUPs/s] - FP32 arithmetic, (fastest of FP32/FP16S/FP16C) memory storage dateFormat X axisFormat %s %%{ init: { "gantt": { 'titleTopMargin': 42, 'topPadding': 70, 'leftPadding': 260, 'rightPadding': 5, 'sectionFontSize': 20, 'fontSize': 20, 'barHeight': 20, 'barGap': 3, 'numberSectionStyles': 2 }, 'theme': 'forest', 'themeVariables': { 'sectionBkgColor': '#99999999', 'altSectionBkgColor': '#00000000', 'titleColor': '#AFAFAF', 'textColor': '#AFAFAF', 'taskTextColor': 'black', 'taskBorderColor': '#487E3A' } } }%%
section MI300X 41327 :crit, 0, 41327 section MI250 (1 GCD) 9030 :crit, 0, 9030 section MI210 9547 :crit, 0, 9547 section MI100 8542 :crit, 0, 8542 section MI60 5111 :crit, 0, 5111 section MI50 32GB 8477 :crit, 0, 8477 section Radeon VII 7778 :crit, 0, 7778 section GPU Max 1100 6303 :done, 0, 6303 section B200 SXM6 180GB 55609 : 0, 55609 section H200 SXM5 141GB 36610 : 0, 36610 section GH200 94GB GPU 34689 : 0, 34689 section H100 NVL 32922 : 0, 32922 section H100 SXM5 80GB HBM3 29561 : 0, 29561 section H100 PCIe 80GB HBM2e 20624 : 0, 20624 section A100 SXM4 80GB 18448 : 0, 18448 section A100 PCIe 80GB 17896 : 0, 17896 section PG506-242/243 15654 : 0, 15654 section A100 SXM4 40GB 16013 : 0, 16013 section A100 PCIe 40GB 16035 : 0, 16035 section CMP 170HX 12392 : 0, 12392 section A30 9721 : 0, 9721 section V100 SXM2 32GB 8947 : 0, 8947 section V100 PCIe 16GB 10325 : 0, 10325 section GV100 6641 : 0, 6641 section Titan V 7253 : 0, 7253 section P100 PCIe 16GB 5950 : 0, 5950 section P100 PCIe 12GB 4141 : 0, 4141 section GTX TITAN 2500 : 0, 2500 section K40m 1868 : 0, 1868 section K80 (1 GPU) 1642 : 0, 1642 section K20c 1507 : 0, 1507
section RX 9070 XT 6688 :crit, 0, 6688 section RX 9070 6019 :crit, 0, 6019 section RX 7900 XTX 7716 :crit, 0, 7716 section PRO W7900 5939 :crit, 0, 5939 section RX 7900 XT 5986 :crit, 0, 5986 section RX 7800 XT 3105 :crit, 0, 3105 section PRO W7800 4426 :crit, 0, 4426 section RX 7900 GRE 4570 :crit, 0, 4570 section PRO W7700 2943 :crit, 0, 2943 section RX 7700 XT 2828 :crit, 0, 2828 section RX 7600 2561 :crit, 0, 2561 section PRO W7600 2287 :crit, 0, 2287 section PRO W7500 1682 :crit, 0, 1682 section RX 6900 XT 4227 :crit, 0, 4227 section RX 6800 XT 4241 :crit, 0, 4241 section PRO W6800 3361 :crit, 0, 3361 section RX 6700 XT 2908 :crit, 0, 2908 section RX 6750 GRE 12GB 2848 :crit, 0, 2848 section RX 6800M 3213 :crit, 0, 3213 section RX 6700M 2429 :crit, 0, 2429 section RX 6600 1839 :crit, 0, 1839 section RX 6500 XT 1030 :crit, 0, 1030 section RX 5700 XT 3253 :crit, 0, 3253 section RX 5700 3167 :crit, 0, 3167 section RX 5600 XT 2214 :crit, 0, 2214 section RX Vega 64 3227 :crit, 0, 3227 section RX 590 1688 :crit, 0, 1688 section RX 580 4GB 1848 :crit, 0, 1848 section RX 580 2048SP 8GB 1622 :crit, 0, 1622 section RX 480 8GB 1908 :crit, 0, 1908 section Pro WX 5100 1604 :crit, 0, 1604 section R9 Fury X 2880 :crit, 0, 2880 section R9 Nano 2761 :crit, 0, 2761 section R9 390X 2217 :crit, 0, 2217 section R9 290X 1699 :crit, 0, 1699 section R9 290 1647 :crit, 0, 1647 section HD 7970 1563 :crit, 0, 1563 section HD 7870 868 :crit, 0, 868 section HD 7850 635 :crit, 0, 635 section HD 5870 257 :crit, 0, 257 section HD 5770 132 :crit, 0, 132 section FirePro V5800 108 :crit, 0, 108 section Arc B580 LE 4979 :done, 0, 4979 section Arc A770 LE 4568 :done, 0, 4568 section Arc A750 LE 4314 :done, 0, 4314 section Arc A580 3889 :done, 0, 3889 section Arc Pro A40 985 :done, 0, 985 section Arc A380 1115 :done, 0, 1115 section RTX PRO 6000 Blackwell 20841 : 0, 20841 section RTX 5090 19141 : 0, 19141 section RTX 5080 10304 : 0, 10304 section RTX 5070 7238 : 0, 7238 section RTX 4090 11496 : 0, 
11496 section RTX 6000 Ada 10293 : 0, 10293 section L40S 7637 : 0, 7637 section L40 7945 : 0, 7945 section RTX 4080 Super 8218 : 0, 8218 section RTX 4080 7933 : 0, 7933 section RTX 4070 Ti Super 7295 : 0, 7295 section RTX 4090M 6901 : 0, 6901 section RTX 4070 Super 5554 : 0, 5554 section RTX 4070 5016 : 0, 5016 section RTX 4080M 5114 : 0, 5114 section RTX 4000 Ada 4221 : 0, 4221 section L4 2857 : 0, 2857 section RTX 4060 3124 : 0, 3124 section RTX 4070M 3092 : 0, 3092 section RTX 2000 Ada 2526 : 0, 2526 section RTX 3090 Ti 10956 : 0, 10956 section RTX 3090 10732 : 0, 10732 section RTX 3080 Ti 9832 : 0, 9832 section RTX 3080 12GB 9657 : 0, 9657 section RTX A6000 8814 : 0, 8814 section RTX A5000 8617 : 0, 8617 section RTX 3080 10GB 8118 : 0, 8118 section RTX A40 6622 : 0, 6622 section RTX 3070 Ti 6807 : 0, 6807 section RTX A10 5741 : 0, 5741 section RTX 3080M Ti 5908 : 0, 5908 section RTX 3070 5096 : 0, 5096 section RTX 3060 Ti 5129 : 0, 5129 section RTX A4000 4945 : 0, 4945 section RTX A5000M 4461 : 0, 4461 section RTX 3060 4070 : 0, 4070 section RTX 3060M 4012 : 0, 4012 section A16 (1 GPU) 2031 : 0, 2031 section A2 2051 : 0, 2051 section RTX 3050M Ti 2341 : 0, 2341 section RTX 3050M 2339 : 0, 2339 section RTX 3050 6GB 1898 : 0, 1898 section Titan RTX 7554 : 0, 7554 section RTX 6000 6879 : 0, 6879 section RTX 8000 Passive 5607 : 0, 5607 section RTX 2080 Ti 6853 : 0, 6853 section RTX 2080 Super 5284 : 0, 5284 section RTX 5000 4773 : 0, 4773 section RTX 2080 4977 : 0, 4977 section RTX 2070 Super 4893 : 0, 4893 section RTX 2070 5017 : 0, 5017 section RTX 2060 Super 5035 : 0, 5035 section RTX 4000 4584 : 0, 4584 section RTX 2060 KO 3376 : 0, 3376 section RTX 2060 3604 : 0, 3604 section GTX 1660 Super 3551 : 0, 3551 section T4 2887 : 0, 2887 section GTX 1660 Ti 3041 : 0, 3041 section GTX 1660 1992 : 0, 1992 section GTX 1650M 896C 1858 : 0, 1858 section GTX 1650M 1024C 1400 : 0, 1400 section T500 665 : 0, 665 section Titan Xp 5495 : 0, 5495 section GTX 1080 Ti 4877 : 0, 
4877 section GTX 1080 3182 : 0, 3182 section GTX 1060 6GB 1925 : 0, 1925 section GTX 1060M 1882 : 0, 1882 section GTX 1050M Ti 1224 : 0, 1224 section P1000 839 : 0, 839 section GTX TITAN X 2665 : 0, 2665 section GTX 980 Ti 2703 : 0, 2703 section GTX 980 1965 : 0, 1965 section GTX 970 1721 : 0, 1721 section M4000 1519 : 0, 1519 section M60 (1 GPU) 1571 : 0, 1571 section GTX 960M 872 : 0, 872 section GTX 780 Ti 2776 : 0, 2776 section GTX 770 1215 : 0, 1215 section GTX 680 4GB 1274 : 0, 1274 section GTX 670 1220 : 0, 1220 section GTX 660 1146 : 0, 1146 section GTX 660 OEM 990 : 0, 990 section K2000 444 : 0, 444 section GT 630 (OEM) 185 : 0, 185 section GTX 580 1481 : 0, 1481 section GTX 560 Ti 895 : 0, 895 section GTX 480 1264 : 0, 1264 section GTX 280 462 : 0, 462 section GTX 260 421 : 0, 421 section FX 5800 296 : 0, 296 section NVS 290 9 : 0, 9 section Arise 1020 6 :active, 0, 6
section M2 Ultra (76-CU, 192GB) 8769 :active, 0, 8769 section M2 Max (38-CU, 32GB) 4641 :active, 0, 4641 section M2 Pro (19-CU, 16GB) 2374 :active, 0, 2374 section M1 Ultra (64-CU, 128GB) 8418 :active, 0, 8418 section M1 Max (24-CU, 32GB) 4496 :active, 0, 4496 section M1 Pro (16-CU, 16GB) 2329 :active, 0, 2329 section M1 (8-CU, 16GB) 759 :active, 0, 759 section Radeon 8060S (Max+ 395) 2563 :crit, 0, 2563 section Radeon 780M (Z1 Extreme) 860 :crit, 0, 860 section Radeon Graphics (7800X3D) 498 :crit, 0, 498 section Vega 8 (4750G) 511 :crit, 0, 511 section Vega 8 (3500U) 288 :crit, 0, 288 section Arc 140V GPU (16GB) 1282 :done, 0, 1282 section Arc Graphics (Ultra 9 185H) 724 :done, 0, 724 section Iris Xe Graphics (i7-1265U) 621 :done, 0, 621 section UHD Xe 32EUs 245 :done, 0, 245 section UHD 770 475 :done, 0, 475 section UHD 630 301 :done, 0, 301 section UHD P630 288 :done, 0, 288 section HD 5500 192 :done, 0, 192 section HD 4600 115 :done, 0, 115 section Orange Pi 5 Mali-G610 MP4 232 :active, 0, 232 section Samsung Mali-G72 MP18 230 :active, 0, 230
section 2x EPYC 9754 5179 :crit, 0, 5179 section 2x EPYC 9654 4092 :crit, 0, 4092 section 2x EPYC 9554 2552 :crit, 0, 2552 section 1x EPYC 9124 772 :crit, 0, 772 section 2x EPYC 7713 1418 :crit, 0, 1418 section 2x EPYC 7352 739 :crit, 0, 739 section 2x EPYC 7313 498 :crit, 0, 498 section 2x EPYC 7302 784 :crit, 0, 784 section 2x Xeon 6980P 7875 :done, 0, 7875 section 2x Xeon 6979P 8135 :done, 0, 8135 section 2x Xeon 6960P 5477 :done, 0, 5477 section 2x Platinum 8592+ 3135 :done, 0, 3135 section 2x Gold 6548N 1811 :done, 0, 1811 section 2x CPU Max 9480 2037 :done, 0, 2037 section 2x Platinum 8480+ 2162 :done, 0, 2162 section 2x Platinum 8470 2068 :done, 0, 2068 section 2x Gold 6438Y+ 1945 :done, 0, 1945 section 2x Platinum 8380 1410 :done, 0, 1410 section 2x Platinum 8358 1285 :done, 0, 1285 section 2x Platinum 8256 396 :done, 0, 396 section 2x Platinum 8153 691 :done, 0, 691 section 2x Gold 6248R 755 :done, 0, 755 section 2x Gold 6128 254 :done, 0, 254 section Phi 7210 415 :done, 0, 415 section 4x E5-4620 v4 460 :done, 0, 460 section 2x E5-2630 v4 264 :done, 0, 264 section 2x E5-2623 v4 125 :done, 0, 125 section 2x E5-2680 v3 304 :done, 0, 304 section GH200 Neoverse-V2 1323 : 0, 1323 section TR PRO 7995WX 1715 :crit, 0, 1715 section TR 3970X 463 :crit, 0, 463 section TR 1950X 273 :crit, 0, 273 section Ryzen 7900X3D 521 :crit, 0, 521 section Ryzen 7800X3D 363 :crit, 0, 363 section Ryzen 5700X3D 229 :crit, 0, 229 section FX-6100 22 :crit, 0, 22 section Athlon X2 QL-65 3 :crit, 0, 3 section Ultra 7 258V 287 :done, 0, 287 section Ultra 9 185H 317 :done, 0, 317 section i9-14900K 490 :done, 0, 490 section i7-13700K 504 :done, 0, 504 section i7-1265U 128 :done, 0, 128 section i9-11900KB 208 :done, 0, 208 section i9-10980XE 286 :done, 0, 286 section E-2288G 198 :done, 0, 198 section i7-9700 103 :done, 0, 103 section i5-9600 147 :done, 0, 147 section i7-8700K 152 :done, 0, 152 section E-2176G 201 :done, 0, 201 section i7-7700HQ 108 :done, 0, 108 section E3-1240 v5 141 
:done, 0, 141 section i5-5300U 37 :done, 0, 37 section i7-4770 104 :done, 0, 104 section i7-4720HQ 80 :done, 0, 80 section N2807 7 :done, 0, 7
```
## Single-GPU/CPU Benchmark Table
Colors: 🔴 AMD, 🔵 Intel, 🟢 Nvidia, ⚪ Apple, 🟡 ARM, 🟤 Glenfly

| Device | FP32<br>[TFlops/s] | Mem<br>[GB] | BW<br>[GB/s] | FP32/FP32<br>[MLUPs/s] | FP32/FP16S<br>[MLUPs/s] | FP32/FP16C<br>[MLUPs/s] |
| :----------------------------------------------- | -----------------: | ----------: | -----------: | ---------------------: | ----------------------: | ----------------------: |
| | | | | | | |
| 🔴 Instinct MI300X | 163.40 | 192 | 5300 | 22867 (66%) | 41327 (60%) | 31670 (46%) |
| 🔴 Instinct MI250 (1 GCD) | 45.26 | 64 | 1638 | 5638 (53%) | 9030 (42%) | 8506 (40%) |
| 🔴 Instinct MI210 | 45.26 | 64 | 1638 | 6517 (61%) | 9547 (45%) | 8829 (41%) |
| 🔴 Instinct MI100 | 46.14 | 32 | 1228 | 5093 (63%) | 8133 (51%) | 8542 (54%) |
| 🔴 Instinct MI60 | 14.75 | 32 | 1024 | 3570 (53%) | 5047 (38%) | 5111 (38%) |
| 🔴 Instinct MI50 32GB | 13.25 | 32 | 1024 | 4446 (66%) | 8477 (64%) | 4406 (33%) |
| 🔴 Radeon VII | 13.83 | 16 | 1024 | 4898 (73%) | 7778 (58%) | 5256 (40%) |
| 🔵 Data Center GPU Max 1100 | 22.22 | 48 | 1229 | 3769 (47%) | 6303 (39%) | 3520 (22%) |
| 🟢 B200 SXM6 180GB | 74.45 | 180 | 8000 | 42152 (81%) | 55609 (54%) | 22695 (22%) |
| 🟢 H200 SXM5 141GB | 66.91 | 141 | 4800 | 23056 (73%) | 36610 (59%) | 20291 (33%) |
| 🟢 GH200 94GB GPU | 66.91 | 94 | 4000 | 20595 (79%) | 34689 (67%) | 19407 (37%) |
| 🟢 H100 NVL | 60.32 | 94 | 3938 | 20303 (79%) | 32922 (64%) | 18424 (36%) |
| 🟢 H100 SXM5 80GB HBM3 | 66.91 | 80 | 3350 | 17602 (80%) | 29561 (68%) | 20227 (46%) |
| 🟢 H100 PCIe 80GB HBM2e | 51.01 | 80 | 2000 | 11128 (85%) | 20624 (79%) | 13862 (53%) |
| 🟢 A100 SXM4 80GB | 19.49 | 80 | 2039 | 10228 (77%) | 18448 (70%) | 11197 (42%) |
| 🟢 A100 PCIe 80GB | 19.49 | 80 | 1935 | 9657 (76%) | 17896 (71%) | 10817 (43%) |
| 🟢 PG506-243 / PG506-242 | 22.14 | 64 | 1638 | 8195 (77%) | 15654 (74%) | 12271 (58%) |
| 🟢 A100 SXM4 40GB | 19.49 | 40 | 1555 | 8522 (84%) | 16013 (79%) | 11251 (56%) |
| 🟢 A100 PCIe 40GB | 19.49 | 40 | 1555 | 8526 (84%) | 16035 (79%) | 11088 (55%) |
| 🟢 CMP 170HX | 6.32 | 8 | 1493 | 7684 (79%) | 12392 (64%) | 6859 (35%) |
| 🟢 A30 | 10.32 | 24 | 933 | 5004 (82%) | 9721 (80%) | 5726 (47%) |
| 🟢 Tesla V100 SXM2 32GB | 15.67 | 32 | 900 | 4471 (76%) | 8947 (77%) | 7217 (62%) |
| 🟢 Tesla V100 PCIe 16GB | 14.13 | 16 | 900 | 5128 (87%) | 10325 (88%) | 7683 (66%) |
| 🟢 Quadro GV100 | 16.66 | 32 | 870 | 3442 (61%) | 6641 (59%) | 5863 (52%) |
| 🟢 Titan V | 14.90 | 12 | 653 | 3601 (84%) | 7253 (86%) | 6957 (82%) |
| 🟢 Tesla P100 16GB | 9.52 | 16 | 732 | 3295 (69%) | 5950 (63%) | 4176 (44%) |
| 🟢 Tesla P100 12GB | 9.52 | 12 | 549 | 2427 (68%) | 4141 (58%) | 3999 (56%) |
| 🟢 GeForce GTX TITAN | 4.71 | 6 | 288 | 1460 (77%) | 2500 (67%) | 1113 (30%) |
| 🟢 Tesla K40m | 4.29 | 12 | 288 | 1131 (60%) | 1868 (50%) | 912 (24%) |
| 🟢 Tesla K80 (1 GPU) | 4.11 | 12 | 240 | 916 (58%) | 1642 (53%) | 943 (30%) |
| 🟢 Tesla K20c | 3.52 | 5 | 208 | 861 (63%) | 1507 (56%) | 720 (27%) |
| | | | | | | |
| 🔴 Radeon RX 9070 XT | 48.66 | 16 | 640 | 3089 (74%) | 6688 (80%) | 6090 (73%) |
| 🔴 Radeon RX 9070 | 36.13 | 16 | 640 | 3007 (72%) | 5746 (69%) | 6019 (72%) |
| 🔴 Radeon RX 7900 XTX | 61.44 | 24 | 960 | 3665 (58%) | 7644 (61%) | 7716 (62%) |
| 🔴 Radeon PRO W7900 | 61.30 | 48 | 864 | 3107 (55%) | 5939 (53%) | 5780 (52%) |
| 🔴 Radeon RX 7900 XT | 51.61 | 20 | 800 | 3013 (58%) | 5856 (56%) | 5986 (58%) |
| 🔴 Radeon RX 7800 XT | 37.32 | 16 | 624 | 1704 (42%) | 3105 (38%) | 3061 (38%) |
| 🔴 Radeon PRO W7800 | 45.20 | 32 | 576 | 1872 (50%) | 4426 (59%) | 4145 (55%) |
| 🔴 Radeon RX 7900 GRE | 42.03 | 16 | 576 | 1996 (53%) | 4570 (61%) | 4463 (60%) |
| 🔴 Radeon PRO W7700 | 28.30 | 16 | 576 | 1547 (41%) | 2943 (39%) | 2899 (39%) |
| 🔴 Radeon RX 7700 XT | 30.77 | 12 | 432 | 1405 (50%) | 2702 (48%) | 2828 (50%) |
| 🔴 Radeon RX 7600 | 21.75 | 8 | 288 | 1250 (66%) | 2561 (68%) | 2512 (67%) |
| 🔴 Radeon PRO W7600 | 20.00 | 8 | 288 | 1179 (63%) | 2263 (61%) | 2287 (61%) |
| 🔴 Radeon PRO W7500 | 12.20 | 8 | 172 | 856 (76%) | 1630 (73%) | 1682 (75%) |
| 🔴 Radeon RX 6900 XT | 23.04 | 16 | 512 | 1968 (59%) | 4227 (64%) | 4207 (63%) |
| 🔴 Radeon RX 6800 XT | 20.74 | 16 | 512 | 2008 (60%) | 4241 (64%) | 4224 (64%) |
| 🔴 Radeon PRO W6800 | 17.83 | 32 | 512 | 1620 (48%) | 3361 (51%) | 3180 (48%) |
| 🔴 Radeon RX 6700 XT | 13.21 | 12 | 384 | 1408 (56%) | 2883 (58%) | 2908 (58%) |
| 🔴 Radeon RX 6750 GRE 12GB | 12.49 | 12 | 384 | 1298 (52%) | 2840 (57%) | 2848 (57%) |
| 🔴 Radeon RX 6800M | 11.78 | 12 | 384 | 1439 (57%) | 3190 (64%) | 3213 (64%) |
| 🔴 Radeon RX 6700M | 10.60 | 10 | 320 | 1194 (57%) | 2388 (57%) | 2429 (58%) |
| 🔴 Radeon RX 6600 | 8.93 | 8 | 224 | 963 (66%) | 1817 (62%) | 1839 (63%) |
| 🔴 Radeon RX 6500 XT | 5.77 | 4 | 144 | 459 (49%) | 1011 (54%) | 1030 (55%) |
| 🔴 Radeon RX 5700 XT | 9.75 | 8 | 448 | 1368 (47%) | 3253 (56%) | 3049 (52%) |
| 🔴 Radeon RX 5700 | 7.72 | 8 | 448 | 1521 (52%) | 3167 (54%) | 2758 (47%) |
| 🔴 Radeon RX 5600 XT | 6.73 | 6 | 288 | 1136 (60%) | 2214 (59%) | 2148 (57%) |
| 🔴 Radeon RX Vega 64 | 13.35 | 8 | 484 | 1875 (59%) | 2878 (46%) | 3227 (51%) |
| 🔴 Radeon RX 590 | 5.53 | 8 | 256 | 1257 (75%) | 1573 (47%) | 1688 (51%) |
| 🔴 Radeon RX 580 4GB | 6.50 | 4 | 256 | 946 (57%) | 1848 (56%) | 1577 (47%) |
| 🔴 Radeon RX 580 2048SP 8GB | 4.94 | 8 | 224 | 868 (59%) | 1622 (56%) | 1240 (43%) |
| 🔴 Radeon RX 480 8GB | 5.83 | 8 | 256 | 1104 (66%) | 1908 (57%) | 1550 (47%) |
| 🔴 Radeon Pro WX 5100 | 3.89 | 8 | 160 | 951 (91%) | 1604 (77%) | 927 (45%) |
| 🔴 Radeon R9 Fury X | 8.60 | 4 | 512 | 1628 (49%) | 2880 (43%) | 2507 (38%) |
| 🔴 Radeon R9 Nano | 8.19 | 4 | 512 | 1575 (47%) | 2761 (42%) | 2224 (33%) |
| 🔴 Radeon R9 390X | 5.91 | 8 | 384 | 1733 (69%) | 2217 (44%) | 1722 (35%) |
| 🔴 Radeon R9 290X | 5.63 | 4 | 320 | 1245 (60%) | 1699 (41%) | 1387 (33%) |
| 🔴 Radeon R9 290 | 4.85 | 4 | 320 | 1136 (54%) | 1647 (40%) | 1272 (31%) |
| 🔴 Radeon HD 7970 | 3.79 | 3 | 264 | 193 (11%) | 1270 (37%) | 1563 (46%) |
| 🔴 Radeon HD 7870 | 2.56 | 2 | 154 | 105 (10%) | 716 (36%) | 868 (44%) |
| 🔴 Radeon HD 7850 | 1.84 | 2 | 154 | 112 (11%) | 120 ( 6%) | 635 (32%) |
| 🔴 Radeon HD 5870 | 2.72 | 1 | 154 | 257 (26%) | 111 ( 6%) | 141 ( 7%) |
| 🔴 Radeon HD 5770 | 1.36 | 1 | 77 | 132 (26%) | 63 ( 6%) | 82 ( 8%) |
| 🔴 FirePro V5800 | 1.10 | 1 | 64 | 108 (26%) | 52 ( 6%) | 66 ( 8%) |
| 🔵 Arc B580 LE | 14.59 | 12 | 456 | 2598 (87%) | 4443 (75%) | 4979 (84%) |
| 🔵 Arc A770 LE | 19.66 | 16 | 560 | 2663 (73%) | 4568 (63%) | 4519 (62%) |
| 🔵 Arc A750 LE | 17.20 | 8 | 512 | 2555 (76%) | 4314 (65%) | 4047 (61%) |
| 🔵 Arc A580 | 12.29 | 8 | 512 | 2534 (76%) | 3889 (58%) | 3488 (52%) |
| 🔵 Arc Pro A40 | 5.02 | 6 | 192 | 594 (47%) | 985 (40%) | 927 (37%) |
| 🔵 Arc A380 | 4.20 | 6 | 186 | 622 (51%) | 1097 (45%) | 1115 (46%) |
| 🟢 RTX PRO 6000 Blackwell | 125.95 | 96 | 1792 | 10349 (88%) | 20511 (88%) | 20841 (90%) |
| 🟢 GeForce RTX 5090 | 104.88 | 32 | 1792 | 9522 (81%) | 18459 (79%) | 19141 (82%) |
| 🟢 GeForce RTX 5080 | 56.34 | 16 | 960 | 5174 (82%) | 10252 (82%) | 10304 (83%) |
| 🟢 GeForce RTX 5070 | 30.84 | 12 | 672 | 3658 (83%) | 7238 (83%) | 7107 (81%) |
| 🟢 GeForce RTX 4090 | 82.58 | 24 | 1008 | 5624 (85%) | 11091 (85%) | 11496 (88%) |
| 🟢 RTX 6000 Ada | 91.10 | 48 | 960 | 4997 (80%) | 10249 (82%) | 10293 (83%) |
| 🟢 L40S | 91.61 | 48 | 864 | 3788 (67%) | 7637 (68%) | 7617 (68%) |
| 🟢 L40 | 90.52 | 48 | 864 | 3870 (69%) | 7778 (69%) | 7945 (71%) |
| 🟢 GeForce RTX 4080 Super | 52.22 | 16 | 736 | 4089 (85%) | 7660 (80%) | 8218 (86%) |
| 🟢 GeForce RTX 4080 | 55.45 | 16 | 717 | 3914 (84%) | 7626 (82%) | 7933 (85%) |
| 🟢 GeForce RTX 4070 Ti Super | 44.10 | 16 | 672 | 3694 (84%) | 6435 (74%) | 7295 (84%) |
| 🟢 GeForce RTX 4090M | 28.31 | 16 | 576 | 3367 (89%) | 6545 (87%) | 6901 (92%) |
| 🟢 GeForce RTX 4070 Super | 35.55 | 12 | 504 | 2751 (83%) | 5149 (79%) | 5554 (85%) |
| 🟢 GeForce RTX 4070 | 29.15 | 12 | 504 | 2646 (80%) | 4548 (69%) | 5016 (77%) |
| 🟢 GeForce RTX 4080M | 33.85 | 12 | 432 | 2577 (91%) | 5086 (91%) | 5114 (91%) |
| 🟢 RTX 4000 Ada | 26.73 | 20 | 360 | 2130 (91%) | 3964 (85%) | 4221 (90%) |
| 🟢 L4 | 30.29 | 24 | 300 | 1490 (76%) | 2854 (73%) | 2857 (73%) |
| 🟢 GeForce RTX 4060 | 15.11 | 8 | 272 | 1614 (91%) | 3052 (86%) | 3124 (88%) |
| 🟢 GeForce RTX 4070M | 18.25 | 8 | 256 | 1553 (93%) | 2945 (89%) | 3092 (93%) |
| 🟢 RTX 2000 Ada | 12.00 | 16 | 224 | 1351 (92%) | 2452 (84%) | 2526 (87%) |
| 🟢 GeForce RTX 3090 Ti | 40.00 | 24 | 1008 | 5717 (87%) | 10956 (84%) | 10400 (79%) |
| 🟢 GeForce RTX 3090 | 39.05 | 24 | 936 | 5418 (89%) | 10732 (88%) | 10215 (84%) |
| 🟢 GeForce RTX 3080 Ti | 37.17 | 12 | 912 | 5202 (87%) | 9832 (87%) | 9347 (79%) |
| 🟢 GeForce RTX 3080 12GB | 32.26 | 12 | 912 | 5071 (85%) | 9657 (81%) | 8615 (73%) |
| 🟢 RTX A6000 | 40.00 | 48 | 768 | 4421 (88%) | 8814 (88%) | 8533 (86%) |
| 🟢 RTX A5000 | 27.77 | 24 | 768 | 4414 (88%) | 8617 (86%) | 6971 (70%) |
| 🟢 GeForce RTX 3080 10GB | 29.77 | 10 | 760 | 4230 (85%) | 8118 (82%) | 7714 (78%) |
| 🟢 A40 | 37.42 | 48 | 696 | 3378 (74%) | 6605 (73%) | 6622 (73%) |
| 🟢 GeForce RTX 3070 Ti | 21.75 | 8 | 608 | 3490 (88%) | 6807 (86%) | 5926 (75%) |
| 🟢 A10 | 31.24 | 24 | 600 | 2931 (75%) | 5741 (74%) | 5597 (72%) |
| 🟢 GeForce RTX 3080M Ti | 23.61 | 16 | 512 | 2985 (89%) | 5908 (89%) | 5780 (87%) |
| 🟢 GeForce RTX 3070 | 20.31 | 8 | 448 | 2578 (88%) | 5096 (88%) | 5060 (87%) |
| 🟢 GeForce RTX 3060 Ti | 16.49 | 8 | 448 | 2644 (90%) | 5129 (88%) | 4718 (81%) |
| 🟢 RTX A4000 | 19.17 | 16 | 448 | 2500 (85%) | 4945 (85%) | 4664 (80%) |
| 🟢 RTX A5000M | 16.59 | 16 | 448 | 2228 (76%) | 4461 (77%) | 3662 (63%) |
| 🟢 GeForce RTX 3060 | 13.17 | 12 | 360 | 2108 (90%) | 4070 (87%) | 3566 (76%) |
| 🟢 GeForce RTX 3060M | 10.94 | 6 | 336 | 2019 (92%) | 4012 (92%) | 3572 (82%) |
| 🟢 A16 (1 GPU) | 4.49 | 16 | 200 | 1031 (79%) | 2031 (78%) | 1133 (44%) |
| 🟢 A2 | 4.53 | 15 | 200 | 1031 (79%) | 2051 (79%) | 1199 (46%) |
| 🟢 GeForce RTX 3050M Ti | 7.60 | 4 | 192 | 1181 (94%) | 2341 (94%) | 2253 (90%) |
| 🟢 GeForce RTX 3050M | 7.13 | 4 | 192 | 1180 (94%) | 2339 (94%) | 2016 (81%) |
| 🟢 GeForce RTX 3050 6GB | 6.77 | 6 | 168 | 993 (90%) | 1898 (87%) | 1879 (86%) |
| 🟢 Titan RTX | 16.31 | 24 | 672 | 3471 (79%) | 7456 (85%) | 7554 (87%) |
| 🟢 Quadro RTX 6000 | 16.31 | 24 | 672 | 3307 (75%) | 6836 (78%) | 6879 (79%) |
| 🟢 Quadro RTX 8000 Passive | 14.93 | 48 | 624 | 2591 (64%) | 5408 (67%) | 5607 (69%) |
| 🟢 GeForce RTX 2080 Ti | 13.45 | 11 | 616 | 3194 (79%) | 6700 (84%) | 6853 (86%) |
| 🟢 GeForce RTX 2080 Super | 11.34 | 8 | 496 | 2434 (75%) | 5284 (82%) | 5087 (79%) |
| 🟢 Quadro RTX 5000 | 11.15 | 16 | 448 | 2341 (80%) | 4766 (82%) | 4773 (82%) |
| 🟢 GeForce RTX 2080 | 10.07 | 8 | 448 | 2318 (79%) | 4977 (86%) | 4963 (85%) |
| 🟢 GeForce RTX 2070 Super | 9.22 | 8 | 448 | 2255 (77%) | 4866 (84%) | 4893 (84%) |
| 🟢 GeForce RTX 2070 | 7.47 | 8 | 448 | 2444 (83%) | 4387 (75%) | 5017 (86%) |
| 🟢 GeForce RTX 2060 Super | 7.18 | 8 | 448 | 2503 (85%) | 5035 (87%) | 4463 (77%) |
| 🟢 Quadro RTX 4000 | 7.12 | 8 | 416 | 2284 (84%) | 4584 (85%) | 4062 (75%) |
| 🟢 GeForce RTX 2060 KO | 6.74 | 6 | 336 | 1643 (75%) | 3376 (77%) | 3266 (75%) |
| 🟢 GeForce RTX 2060 | 6.74 | 6 | 336 | 1681 (77%) | 3604 (83%) | 3571 (82%) |
| 🟢 GeForce GTX 1660 Super | 5.03 | 6 | 336 | 1696 (77%) | 3551 (81%) | 3040 (70%) |
| 🟢 Tesla T4 | 8.14 | 15 | 300 | 1356 (69%) | 2869 (74%) | 2887 (74%) |
| 🟢 GeForce GTX 1660 Ti | 5.48 | 6 | 288 | 1467 (78%) | 3041 (81%) | 3019 (81%) |
| 🟢 GeForce GTX 1660 | 5.07 | 6 | 192 | 1016 (81%) | 1924 (77%) | 1992 (80%) |
| 🟢 GeForce GTX 1650M 896C | 2.72 | 4 | 192 | 963 (77%) | 1836 (74%) | 1858 (75%) |
| 🟢 GeForce GTX 1650M 1024C | 3.20 | 4 | 128 | 706 (84%) | 1214 (73%) | 1400 (84%) |
| 🟢 T500 | 3.04 | 4 | 80 | 339 (65%) | 578 (56%) | 665 (64%) |
| 🟢 Titan Xp | 12.15 | 12 | 548 | 2919 (82%) | 5495 (77%) | 5375 (76%) |
| 🟢 GeForce GTX 1080 Ti | 12.06 | 11 | 484 | 2631 (83%) | 4837 (77%) | 4877 (78%) |
| 🟢 GeForce GTX 1080 | 9.78 | 8 | 320 | 1623 (78%) | 3100 (75%) | 3182 (77%) |
| 🟢 GeForce GTX 1060 6GB | 4.57 | 6 | 192 | 997 (79%) | 1925 (77%) | 1785 (72%) |
| 🟢 GeForce GTX 1060M | 4.44 | 6 | 192 | 983 (78%) | 1882 (75%) | 1803 (72%) |
| 🟢 GeForce GTX 1050M Ti | 2.49 | 4 | 112 | 631 (86%) | 1224 (84%) | 1115 (77%) |
| 🟢 Quadro P1000 | 1.89 | 4 | 82 | 426 (79%) | 839 (79%) | 778 (73%) |
| 🟢 GeForce GTX TITAN X | 6.61 | 12 | 336 | 1506 (69%) | 2665 (61%) | 2518 (58%) |
| 🟢 GeForce GTX 980 Ti | 6.05 | 6 | 336 | 1509 (69%) | 2703 (62%) | 2381 (55%) |
| 🟢 GeForce GTX 980 | 4.98 | 4 | 224 | 1018 (70%) | 1965 (68%) | 1872 (64%) |
| 🟢 GeForce GTX 970 | 4.17 | 4 | 224 | 980 (67%) | 1721 (59%) | 1623 (56%) |
| 🟢 Quadro M4000 | 2.57 | 8 | 192 | 899 (72%) | 1519 (61%) | 1050 (42%) |
| 🟢 Tesla M60 (1 GPU) | 4.82 | 8 | 160 | 853 (82%) | 1571 (76%) | 1557 (75%) |
| 🟢 GeForce GTX 960M | 1.51 | 4 | 80 | 442 (84%) | 872 (84%) | 627 (60%) |
| 🟢 GeForce GTX 780 Ti | 5.35 | 3 | 336 | 1710 (78%) | 2776 (64%) | 1302 (30%) |
| 🟢 GeForce GTX 770 | 3.33 | 2 | 224 | 800 (55%) | 1215 (42%) | 876 (30%) |
| 🟢 GeForce GTX 680 4GB | 3.33 | 4 | 192 | 783 (62%) | 1274 (51%) | 814 (33%) |
| 🟢 GeForce GTX 670 | 2.81 | 2 | 192 | 747 (59%) | 1220 (49%) | 731 (29%) |
| 🟢 GeForce GTX 660 | 1.98 | 2 | 144 | 505 (54%) | 1146 (61%) | 489 (26%) |
| 🟢 GeForce GTX 660 OEM | 2.05 | 1.5 | 134 | 556 (63%) | 990 (57%) | 530 (30%) |
| 🟢 Quadro K2000 | 0.73 | 2 | 64 | 312 (75%) | 444 (53%) | 171 (21%) |
| 🟢 GeForce GT 630 (OEM) | 0.46 | 2 | 29 | 151 (81%) | 185 (50%) | 78 (21%) |
| 🟢 GeForce GTX 580 | 1.58 | 1.5 | 192 | 1028 (82%) | 1481 (59%) | 599 (24%) |
| 🟢 GeForce GTX 560 Ti | 1.26 | 1 | 128 | 661 (79%) | 895 (54%) | 415 (25%) |
| 🟢 GeForce GTX 480 | 1.35 | 1.5 | 177 | 908 (78%) | 1264 (55%) | 509 (22%) |
| 🟢 GeForce GTX 280 | 0.62 | 1 | 142 | 409 (44%) | 462 (25%) | 209 (11%) |
| 🟢 GeForce GTX 260 | 0.48 | 0.9 | 111 | 345 (47%) | 421 (29%) | 160 (11%) |
| 🟢 Quadro FX 5800 | 0.62 | 4 | 102 | 296 (44%) | 268 (20%) | 211 (16%) |
| 🟢 Quadro NVS 290 | 0.03 | 0.25 | 6 | 9 (22%) | 4 ( 5%) | 4 ( 5%) |
| 🟤 Arise 1020 | 1.50 | 2 | 19 | 6 ( 5%) | 6 ( 2%) | 6 ( 2%) |
| | | | | | | |
| ⚪ M2 Ultra GPU 76CU 192GB | 19.46 | 147 | 800 | 4629 (89%) | 8769 (84%) | 7972 (77%) |
| ⚪ M2 Max GPU 38CU 32GB | 9.73 | 22 | 400 | 2405 (92%) | 4641 (89%) | 2444 (47%) |
| ⚪ M2 Pro GPU 19CU 16GB | 4.86 | 11 | 200 | 1195 (91%) | 2374 (91%) | 2173 (84%) |
| ⚪ M1 Ultra GPU 64CU 128GB | 16.38 | 98 | 800 | 4519 (86%) | 8418 (81%) | 6915 (67%) |
| ⚪ M1 Max GPU 24CU 32GB | 6.14 | 22 | 400 | 2369 (91%) | 4496 (87%) | 2777 (53%) |
| ⚪ M1 Pro GPU 16CU 16GB | 4.10 | 11 | 200 | 1204 (92%) | 2329 (90%) | 1855 (71%) |
| ⚪ M1 GPU 8CU 16GB | 2.05 | 11 | 68 | 384 (86%) | 758 (85%) | 759 (86%) |
| 🔴 Radeon 8060S (Max+ 395) | 29.70 | 15 | 256 | 1231 (74%) | 2541 (76%) | 2563 (77%) |
| 🔴 Radeon 780M (Z1 Extreme) | 8.29 | 8 | 102 | 443 (66%) | 860 (65%) | 820 (62%) |
| 🔴 Radeon Graphics (7800X3D) | 0.56 | 12 | 102 | 338 (51%) | 498 (37%) | 283 (21%) |
| 🔴 Radeon Vega 8 (4750G) | 2.15 | 27 | 57 | 263 (71%) | 511 (70%) | 501 (68%) |
| 🔴 Radeon Vega 8 (3500U) | 1.23 | 7 | 38 | 157 (63%) | 282 (57%) | 288 (58%) |
| 🔵 Arc 140V GPU (16GB) | 3.99 | 16 | 137 | 636 (71%) | 1282 (72%) | 773 (44%) |
| 🔵 Arc Graphics (Ultra 9 185H) | 4.81 | 14 | 90 | 271 (46%) | 710 (61%) | 724 (62%) |
| 🔵 Iris Xe Graphics (i7-1265U) | 1.92 | 13 | 77 | 342 (68%) | 621 (62%) | 574 (58%) |
| 🔵 UHD Graphics Xe 32EUs | 0.74 | 25 | 51 | 128 (38%) | 245 (37%) | 216 (32%) |
| 🔵 UHD Graphics 770 | 0.82 | 30 | 90 | 342 (58%) | 475 (41%) | 278 (24%) |
| 🔵 UHD Graphics 630 | 0.46 | 7 | 51 | 151 (45%) | 301 (45%) | 187 (28%) |
| 🔵 UHD Graphics P630 | 0.46 | 51 | 42 | 177 (65%) | 288 (53%) | 137 (25%) |
| 🔵 HD Graphics 5500 | 0.35 | 3 | 26 | 75 (45%) | 192 (58%) | 108 (32%) |
| 🔵 HD Graphics 4600 | 0.38 | 2 | 26 | 105 (63%) | 115 (35%) | 34 (10%) |
| 🟡 Mali-G610 MP4 (Orange Pi 5) | 0.06 | 16 | 34 | 130 (58%) | 232 (52%) | 93 (21%) |
| 🟡 Mali-G72 MP18 (Samsung S9+) | 0.24 | 4 | 29 | 110 (59%) | 230 (62%) | 21 ( 6%) |
| | | | | | | |
| 🔴 2x EPYC 9754 | 50.79 | 3072 | 922 | 3276 (54%) | 5077 (42%) | 5179 (43%) |
| 🔴 2x EPYC 9654 | 43.62 | 1536 | 922 | 3087 (51%) | 3675 (31%) | 4092 (34%) |
| 🔴 2x EPYC 9554 | 30.72 | 384 | 922 | 2552 (42%) | 2127 (18%) | 2144 (18%) |
| 🔴 1x EPYC 9124 | 3.69 | 128 | 307 | 772 (38%) | 579 (15%) | 586 (15%) |
| 🔴 2x EPYC 7713 | 8.19 | 512 | 410 | 1298 (48%) | 492 ( 9%) | 1418 (27%) |
| 🔴 2x EPYC 7352 | 3.53 | 512 | 410 | 739 (28%) | 106 ( 2%) | 412 ( 8%) |
| 🔴 2x EPYC 7313 | 3.07 | 128 | 410 | 498 (19%) | 367 ( 7%) | 418 ( 8%) |
| 🔴 2x EPYC 7302 | 3.07 | 128 | 410 | 784 (29%) | 336 ( 6%) | 411 ( 8%) |
| 🔵 2x Xeon 6980P | 98.30 | 6144 | 1690 | 7875 (71%) | 5112 (23%) | 5610 (26%) |
| 🔵 2x Xeon 6979P | 92.16 | 3072 | 1690 | 8135 (74%) | 4175 (19%) | 4622 (21%) |
| 🔵 2x Xeon 6960P | 70.04 | 2304 | 1229 | 5477 (68%) | 3387 (21%) | 3986 (25%) |
| 🔵 2x Xeon Platinum 8592+ | 31.13 | 1024 | 717 | 3135 (67%) | 2359 (25%) | 2466 (26%) |
| 🔵 2x Xeon Gold 6548N | 22.94 | 2048 | 666 | 1811 (42%) | 1388 (16%) | 1425 (16%) |
| 🔵 2x Xeon CPU Max 9480 | 27.24 | 256 | 614 | 2037 (51%) | 1520 (19%) | 1464 (18%) |
| 🔵 2x Xeon Platinum 8480+ | 28.67 | 512 | 614 | 2162 (54%) | 1845 (23%) | 1884 (24%) |
| 🔵 2x Xeon Platinum 8470 | 25.29 | 2048 | 614 | 1865 (46%) | 1909 (24%) | 2068 (26%) |
| 🔵 2x Xeon Gold 6438Y+ | 16.38 | 1024 | 614 | 1945 (48%) | 1219 (15%) | 1257 (16%) |
| 🔵 2x Xeon Platinum 8380 | 23.55 | 2048 | 410 | 1410 (53%) | 1159 (22%) | 1298 (24%) |
| 🔵 2x Xeon Platinum 8358 | 21.30 | 256 | 410 | 1285 (48%) | 1007 (19%) | 1120 (21%) |
| 🔵 2x Xeon Platinum 8256 | 3.89 | 1536 | 282 | 396 (22%) | 158 ( 4%) | 175 ( 5%) |
| 🔵 2x Xeon Platinum 8153 | 8.19 | 384 | 256 | 691 (41%) | 290 ( 9%) | 328 (10%) |
| 🔵 2x Xeon Gold 6248R | 18.43 | 384 | 282 | 755 (41%) | 566 (15%) | 694 (19%) |
| 🔵 2x Xeon Gold 6128 | 5.22 | 192 | 256 | 254 (15%) | 185 ( 6%) | 193 ( 6%) |
| 🔵 Xeon Phi 7210 | 5.32 | 192 | 102 | 415 (62%) | 193 (15%) | 223 (17%) |
| 🔵 4x Xeon E5-4620 v4 | 2.69 | 512 | 273 | 460 (26%) | 275 ( 8%) | 239 ( 7%) |
| 🔵 2x Xeon E5-2630 v4 | 1.41 | 64 | 137 | 264 (30%) | 146 ( 8%) | 129 ( 7%) |
| 🔵 2x Xeon E5-2623 v4 | 0.67 | 64 | 137 | 125 (14%) | 66 ( 4%) | 59 ( 3%) |
| 🔵 2x Xeon E5-2680 v3 | 1.92 | 128 | 137 | 304 (34%) | 234 (13%) | 291 (16%) |
| 🟢 GH200 Neoverse-V2 CPU | 7.88 | 480 | 384 | 1323 (53%) | 853 (17%) | 683 (14%) |
| 🔴 Threadripper PRO 7995WX | 15.36 | 256 | 333 | 1134 (52%) | 1697 (39%) | 1715 (40%) |
| 🔴 Threadripper 3970X | 3.79 | 128 | 102 | 376 (56%) | 103 ( 8%) | 463 (35%) |
| 🔴 Threadripper 1950X | 0.87 | 128 | 85 | 273 (49%) | 43 ( 4%) | 151 (14%) |
| 🔴 Ryzen 9 7900X3D | 1.69 | 128 | 83 | 278 (51%) | 521 (48%) | 462 (43%) |
| 🔴 Ryzen 7 7800X3D | 1.08 | 32 | 102 | 296 (44%) | 361 (27%) | 363 (27%) |
| 🔴 Ryzen 7 5700X3D | 0.87 | 32 | 51 | 229 (68%) | 135 (20%) | 173 (26%) |
| 🔴 FX-6100 | 0.16 | 16 | 26 | 11 ( 7%) | 11 ( 3%) | 22 ( 7%) |
| 🔴 Athlon X2 QL-65 | 0.03 | 4 | 11 | 3 ( 4%) | 2 ( 2%) | 3 ( 2%) |
| 🔵 Core Ultra 7 258V | 0.56 | 32 | 137 | 287 (32%) | 123 ( 7%) | 167 ( 9%) |
| 🔵 Core Ultra 9 185H | 1.79 | 16 | 90 | 317 (54%) | 267 (23%) | 288 (25%) |
| 🔵 Core i9-14900K | 3.74 | 32 | 96 | 443 (71%) | 453 (36%) | 490 (39%) |
| 🔵 Core i7-13700K | 2.51 | 64 | 90 | 504 (86%) | 398 (34%) | 424 (36%) |
| 🔵 Core i7-1265U | 1.23 | 32 | 77 | 128 (26%) | 62 ( 6%) | 58 ( 6%) |
| 🔵 Core i9-11900KB | 0.84 | 32 | 51 | 109 (33%) | 195 (29%) | 208 (31%) |
| 🔵 Core i9-10980XE | 3.23 | 128 | 94 | 286 (47%) | 251 (21%) | 223 (18%) |
| 🔵 Xeon E-2288G | 0.95 | 32 | 43 | 196 (70%) | 182 (33%) | 198 (36%) |
| 🔵 Core i7-9700 | 0.77 | 64 | 43 | 103 (37%) | 62 (11%) | 95 (17%) |
| 🔵 Core i5-9600 | 0.60 | 16 | 43 | 146 (52%) | 127 (23%) | 147 (27%) |
| 🔵 Core i7-8700K | 0.71 | 16 | 51 | 152 (45%) | 134 (20%) | 116 (17%) |
| 🔵 Xeon E-2176G | 0.71 | 64 | 42 | 201 (74%) | 136 (25%) | 148 (27%) |
| 🔵 Core i7-7700HQ | 0.36 | 12 | 38 | 81 (32%) | 82 (16%) | 108 (22%) |
| 🔵 Xeon E3-1240 v5 | 0.50 | 32 | 34 | 141 (63%) | 75 (17%) | 88 (20%) |
| 🔵 Core i7-4770 | 0.44 | 16 | 26 | 104 (62%) | 69 (21%) | 59 (18%) |
| 🔵 Core i7-4720HQ | 0.33 | 16 | 26 | 80 (48%) | 23 ( 7%) | 60 (18%) |
| 🔵 Celeron N2807 | 0.01 | 4 | 11 | 7 (10%) | 3 ( 2%) | 3 ( 2%) |
Multi-GPU Benchmarks
Multi-GPU benchmarks are done at the largest possible grid resolution with cubic domains, combining either 2x1x1, 2x2x1 or 2x2x2 of these domains. The (percentages in round brackets) are single-GPU roofline model efficiency, and the (multipliers in round brackets) are scaling factors relative to benchmarked single-GPU performance.
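The quoted roofline efficiency percentages can be reproduced from the table data: peak LBM throughput is memory bandwidth divided by the bytes moved per cell update (153 Bytes/cell per time step for D3Q19 FP32/FP32 with Esoteric-Pull, 77 for FP32/FP16, as stated in the FAQs below). A minimal sketch of that arithmetic, using the 1x MI300X row as an example:

```python
# Roofline efficiency estimate for bandwidth-bound LBM.
# Bytes/cell per time step for D3Q19 with Esoteric-Pull: 153 (FP32/FP32), 77 (FP32/FP16).

def roofline_mlups(bandwidth_gb_s: float, bytes_per_cell: float) -> float:
    """Theoretical peak throughput in MLUPs/s for a given memory bandwidth."""
    return bandwidth_gb_s * 1e9 / bytes_per_cell / 1e6

def efficiency(measured_mlups: float, bandwidth_gb_s: float, bytes_per_cell: float) -> float:
    """Measured throughput as a fraction of the bandwidth-bound roofline."""
    return measured_mlups / roofline_mlups(bandwidth_gb_s, bytes_per_cell)

# Example: 1x Instinct MI300X with 5300 GB/s bandwidth, 22867 MLUPs/s in FP32/FP32 mode
print(round(100.0 * efficiency(22867.0, 5300.0, 153.0)))  # → 66, matching "(66%)" in the table
```

The same formula with 77 Bytes/cell reproduces the FP32/FP16S percentages.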
```mermaid
gantt
title FluidX3D Performance [MLUPs/s] - FP32 arithmetic, (fastest of FP32/FP16S/FP16C) memory storage
dateFormat X
axisFormat %s
%%{ init: { "gantt": { 'titleTopMargin': 42, 'topPadding': 70, 'leftPadding': 260, 'rightPadding': 5, 'sectionFontSize': 20, 'fontSize': 20, 'barHeight': 20, 'barGap': 3, 'numberSectionStyles': 2 }, 'theme': 'forest', 'themeVariables': { 'sectionBkgColor': '#99999999', 'altSectionBkgColor': '#00000000', 'titleColor': '#AFAFAF', 'textColor': '#AFAFAF', 'taskTextColor': 'black', 'taskBorderColor': '#487E3A' } } }%%
section 8x MI300X
204924 :crit, 0, 204924
section 4x MI300X
109546 :crit, 0, 109546
section 2x MI300X
61053 :crit, 0, 61053
section 1x MI300X
41327 :crit, 0, 41327
section 4x MI250 (8 GCD)
53521 :crit, 0, 53521
section 2x MI250 (4 GCD)
29627 :crit, 0, 29627
section 1x MI250 (2 GCD)
17338 :crit, 0, 17338
section 1x MI250 (1 GCD)
9030 :crit, 0, 9030
section 32x MI210 GigaIO
50952 :crit, 0, 50952
section 24x MI210 GigaIO
45033 :crit, 0, 45033
section 16x MI210 GigaIO
37922 :crit, 0, 37922
section 8x MI210 GigaIO
27996 :crit, 0, 27996
section 4x MI210 GigaIO
17232 :crit, 0, 17232
section 2x MI210 GigaIO
13539 :crit, 0, 13539
section 1x MI210 GigaIO
9105 :crit, 0, 9105
section 4x MI210
31408 :crit, 0, 31408
section 2x MI210
16156 :crit, 0, 16156
section 1x MI210
8757 :crit, 0, 8757
section 3x MI50 + 1x A100 40GB
22759 :active,crit, 0, 22759
section 3x MI50 32GB
21693 :crit, 0, 21693
section 2x MI50 32GB
14484 :crit, 0, 14484
section 1x MI50 32GB
8477 :crit, 0, 8477
section 8x Radeon VII
30826 :crit, 0, 30826
section 4x Radeon VII
24273 :crit, 0, 24273
section 2x Radeon VII
15591 :crit, 0, 15591
section 1x Radeon VII
7778 :crit, 0, 7778
section 1x Radeon Pro Duo (2 GPUs)
3310 :crit, 0, 3310
section 1x R9 295X2 (2 GPUs)
2428 :crit, 0, 2428
section 1x HD 7990 (2 GPUs)
2314 :crit, 0, 2314
section 1x HD 6990 (2 GPUs)
344 :crit, 0, 344
section 1x HD 5970 (2 GPUs)
360 :crit, 0, 360
section 4x GPU Max 1100
22777 :done, 0, 22777
section 2x GPU Max 1100
11815 :done, 0, 11815
section 1x GPU Max 1100
6209 :done, 0, 6209
section 8x B200 SXM6 180GB
219300 : 0, 219300
section 4x B200 SXM6 180GB
147446 : 0, 147446
section 2x B200 SXM6 180GB
85077 : 0, 85077
section 1x B200 SXM6 180GB
55609 : 0, 55609
section 8x H200 SXM5 141GB
157743 : 0, 157743
section 4x H200 SXM5 141GB
96056 : 0, 96056
section 2x H200 SXM5 141GB
57070 : 0, 57070
section 1x H200 SXM5 141GB
36610 : 0, 36610
section 4x H100 NVL
82122 : 0, 82122
section 2x H100 NVL
49958 : 0, 49958
section 1x H100 NVL
32922 : 0, 32922
section 4x H100 SXM5 80GB HBM3
78462 : 0, 78462
section 2x H100 SXM5 80GB HBM3
46189 : 0, 46189
section 1x H100 SXM5 80GB HBM3
28522 : 0, 28522
section 4x A100 PCIe 80GB
52056 : 0, 52056
section 2x A100 PCIe 80GB
27165 : 0, 27165
section 1x A100 PCIe 80GB
17896 : 0, 17896
section 4x PG506-243/242
41088 : 0, 41088
section 2x PG506-243/242
24168 : 0, 24168
section 1x PG506-243/242
15654 : 0, 15654
section 8x A100 SXM4 40GB
72965 : 0, 72965
section 4x A100 SXM4 40GB
42400 : 0, 42400
section 2x A100 SXM4 40GB
23707 : 0, 23707
section 1x A100 SXM4 40GB
15917 : 0, 15917
section 4x V100 SXM2 32GB
26527 : 0, 26527
section 2x V100 SXM2 32GB
15469 : 0, 15469
section 1x V100 SXM2 32GB
8947 : 0, 8947
section 3x K40m + 1x Titan Xp
5174 : 0, 5174
section 2x Tesla K40m
3300 : 0, 3300
section 1x Tesla K40m
1868 : 0, 1868
section 1x Tesla K80 (2 GPUs)
3448 : 0, 3448
section 1x Tesla K80 (1 GPU)
1642 : 0, 1642
section 2x L40S
13640 : 0, 13640
section 1x L40S
7669 : 0, 7669
section 2x L40
14164 : 0, 14164
section 1x L40
7945 : 0, 7945
section 8x RTX A6000
40063 : 0, 40063
section 4x RTX A6000
27915 : 0, 27915
section 2x RTX A6000
15026 : 0, 15026
section 1x RTX A6000
8814 : 0, 8814
section 4x A16 (16 GPUs)
22451 : 0, 22451
section 2x A16 (8 GPUs)
11777 : 0, 11777
section 1x A16 (4 GPUs)
6348 : 0, 6348
section 1x A16 (2 GPUs)
3475 : 0, 3475
section 1x A16 (1 GPU)
2031 : 0, 2031
section 2x A2
3539 : 0, 3539
section 1x A2
2051 : 0, 2051
section 2x Quadro RTX 8000 Pa.
10214 : 0, 10214
section 1x Quadro RTX 8000 Pa.
5607 : 0, 5607
section 7x 2080 Ti + 1x A100 40GB
33857 : 0, 33857
section 4x GeForce RTX 2080 Ti
18598 : 0, 18598
section 2x GeForce RTX 2080 Ti
10922 : 0, 10922
section 1x GeForce RTX 2080 Ti
6853 : 0, 6853
section 1x GTX 690 (2 GPUs)
920 : 0, 920
section 2x Arc A770
8745 :done, 0, 8745
section 1x Arc A770
4568 :done, 0, 4568
section 1x A100 + 1x P100 + 2x A2 + 3x MI50 + 1x A770
17296 :active,crit, 0, 17296
section 1x 7700 XT + 1x B580 + 1x Titan Xp
8358 :active,crit, 0, 8358
section 1x A770 + 1x Titan Xp
8380 :active,done, 0, 8380
```
Multi-GPU Benchmark Table
Colors: 🔴 AMD, 🔵 Intel, 🟢 Nvidia, ⚪ Apple, 🟡 ARM, 🟤 Glenfly

| Device | FP32 [TFlops/s] | Mem [GB] | BW [GB/s] | FP32/FP32 [MLUPs/s] | FP32/FP16S [MLUPs/s] | FP32/FP16C [MLUPs/s] |
| :--- | ---: | ---: | ---: | ---: | ---: | ---: |
| 🔴 8x Instinct MI300X | 1307.20 | 1536 | 42400 | 152835 (6.7x) | 192297 (4.7x) | 204924 (6.5x) |
| 🔴 4x Instinct MI300X | 653.60 | 768 | 21200 | 83678 (3.7x) | 103200 (2.5x) | 109546 (3.5x) |
| 🔴 2x Instinct MI300X | 326.80 | 384 | 10600 | 46673 (2.0x) | 61053 (1.5x) | 57391 (1.8x) |
| 🔴 1x Instinct MI300X | 163.40 | 192 | 5300 | 22867 (66%) | 41327 (60%) | 31670 (46%) |
| | | | | | | |
| 🔴 4x Instinct MI250 (8 GCD) | 362.08 | 512 | 13107 | 27350 (4.9x) | 52258 (5.8x) | 53521 (6.3x) |
| 🔴 2x Instinct MI250 (4 GCD) | 181.04 | 256 | 6554 | 16925 (3.0x) | 29163 (3.2x) | 29627 (3.5x) |
| 🔴 1x Instinct MI250 (2 GCD) | 90.52 | 128 | 3277 | 9460 (1.7x) | 14313 (1.6x) | 17338 (2.0x) |
| 🔴 1x Instinct MI250 (1 GCD) | 45.26 | 64 | 1638 | 5638 (53%) | 9030 (42%) | 8506 (40%) |
| | | | | | | |
| 🔴 32x Instinct MI210 GigaIO | 1448.32 | 2048 | 52429 | 23881 (3.8x) | 50952 (6.0x) | 48848 (5.4x) |
| 🔴 24x Instinct MI210 GigaIO | 1086.24 | 1536 | 39322 | 22056 (3.5x) | 45033 (5.3x) | 44631 (4.9x) |
| 🔴 16x Instinct MI210 GigaIO | 724.16 | 1024 | 26214 | 18094 (2.9x) | 37360 (4.4x) | 37922 (4.2x) |
| 🔴 8x Instinct MI210 GigaIO | 362.08 | 512 | 13107 | 13546 (2.1x) | 27996 (3.3x) | 27820 (3.1x) |
| 🔴 4x Instinct MI210 GigaIO | 181.04 | 256 | 6554 | 8816 (1.4x) | 17232 (2.0x) | 16892 (1.9x) |
| 🔴 2x Instinct MI210 GigaIO | 90.52 | 128 | 3277 | 7245 (1.1x) | 12050 (1.4x) | 13539 (1.5x) |
| 🔴 1x Instinct MI210 GigaIO | 45.26 | 64 | 1638 | 6347 (59%) | 8486 (40%) | 9105 (43%) |
| | | | | | | |
| 🔴 4x Instinct MI210 | 181.04 | 256 | 6554 | 17075 (2.6x) | 31408 (3.6x) | 30643 (3.5x) |
| 🔴 2x Instinct MI210 | 90.52 | 128 | 3277 | 9624 (1.5x) | 15909 (1.8x) | 16156 (1.8x) |
| 🔴 1x Instinct MI210 | 45.26 | 64 | 1638 | 6454 (60%) | 8757 (41%) | 8751 (41%) |
| | | | | | | |
| 🔴 3x MI50 32GB + 🟢 1x A100 40GB | 52.99 | 128 | 4096 | 13159 (3.0x) | 22759 (2.7x) | 11953 (2.7x) |
| 🔴 3x Instinct MI50 32GB | 39.74 | 96 | 3072 | 11709 (2.6x) | 21693 (2.6x) | 9969 (2.3x) |
| 🔴 2x Instinct MI50 32GB | 26.50 | 64 | 2048 | 7803 (1.8x) | 14484 (1.7x) | 6647 (1.5x) |
| 🔴 1x Instinct MI50 32GB | 13.25 | 32 | 1024 | 4446 (66%) | 8477 (64%) | 4406 (33%) |
| | | | | | | |
| 🔴 8x Radeon VII | 110.64 | 128 | 8192 | 21946 (4.5x) | 30826 (4.0x) | 24572 (4.7x) |
| 🔴 4x Radeon VII | 55.32 | 64 | 4096 | 12911 (2.6x) | 24273 (3.1x) | 17080 (3.2x) |
| 🔴 2x Radeon VII | 27.66 | 32 | 2048 | 8113 (1.7x) | 15591 (2.0x) | 10352 (2.0x) |
| 🔴 1x Radeon VII | 13.83 | 16 | 1024 | 4898 (73%) | 7778 (58%) | 5256 (40%) |
| | | | | | | |
| 🔴 1x Radeon Pro Duo (2 GPUs) | 16.38 | 8 | 1024 | 1945 (29%) | 3310 (25%) | 2879 (22%) |
| 🔴 1x R9 295X2 (2 GPUs) | 11.47 | 8 | 640 | 1727 (41%) | 2428 (29%) | 1989 (24%) |
| 🔴 1x HD 7990 (2 GPUs) | 8.19 | 6 | 576 | 1552 (41%) | 1996 (27%) | 2314 (31%) |
| 🔴 1x HD 6990 (2 GPUs) | 5.41 | 4 | 320 | 344 (16%) | 214 ( 5%) | 255 ( 6%) |
| 🔴 1x HD 5970 (2 GPUs) | 3.76 | 2 | 256 | 360 (22%) | 181 ( 5%) | 224 ( 7%) |
| | | | | | | |
| 🔵 4x DC GPU Max 1100 | 88.88 | 192 | 4915 | 12162 (3.5x) | 22777 (3.7x) | 11759 (3.6x) |
| 🔵 2x DC GPU Max 1100 | 44.44 | 96 | 2458 | 6301 (1.8x) | 11815 (1.9x) | 5970 (1.8x) |
| 🔵 1x DC GPU Max 1100 | 22.22 | 48 | 1229 | 3487 (43%) | 6209 (39%) | 3252 (20%) |
| | | | | | | |
| 🟢 8x B200 SXM6 180GB | 595.60 | 1440 | 64000 | 132105 (3.1x) | 219300 (3.9x) | 128668 (5.7x) |
| 🟢 4x B200 SXM6 180GB | 297.80 | 720 | 32000 | 94073 (2.2x) | 147446 (2.7x) | 72670 (3.2x) |
| 🟢 2x B200 SXM6 180GB | 148.90 | 360 | 16000 | 62046 (1.5x) | 85077 (1.5x) | 39275 (1.7x) |
| 🟢 1x B200 SXM6 180GB | 74.45 | 180 | 8000 | 42152 (81%) | 55609 (54%) | 22695 (22%) |
| | | | | | | |
| 🟢 8x H200 SXM5 141GB | 535.28 | 1128 | 38400 | 92008 (4.0x) | 157743 (4.3x) | 113490 (5.6x) |
| 🟢 4x H200 SXM5 141GB | 267.64 | 564 | 19200 | 57310 (2.5x) | 96056 (2.6x) | 63117 (3.1x) |
| 🟢 2x H200 SXM5 141GB | 133.82 | 282 | 9600 | 36711 (1.6x) | 57070 (1.6x) | 34871 (1.7x) |
| 🟢 1x H200 SXM5 141GB | 66.91 | 141 | 4800 | 23056 (73%) | 36610 (59%) | 20291 (33%) |
| | | | | | | |
| 🟢 4x H100 NVL | 241.28 | 376 | 15752 | 44284 (2.2x) | 82122 (2.5x) | 53855 (2.9x) |
| 🟢 2x H100 NVL | 120.64 | 188 | 7876 | 29050 (1.4x) | 49958 (1.5x) | 30586 (1.7x) |
| 🟢 1x H100 NVL | 60.32 | 94 | 3938 | 20303 (79%) | 32922 (64%) | 18424 (36%) |
| | | | | | | |
| 🟢 4x H100 SXM5 80GB HBM3 | 267.63 | 320 | 13400 | 46442 (2.7x) | 78462 (2.8x) | 60490 (3.0x) |
| 🟢 2x H100 SXM5 80GB HBM3 | 133.82 | 160 | 6700 | 26838 (1.6x) | 46189 (1.6x) | 34147 (1.7x) |
| 🟢 1x H100 SXM5 80GB HBM3 | 66.91 | 80 | 3350 | 17262 (79%) | 28522 (66%) | 20065 (46%) |
| | | | | | | |
| 🟢 4x A100 PCIe 80GB | 77.96 | 320 | 7740 | 25957 (2.7x) | 52056 (2.9x) | 33283 (3.1x) |
| 🟢 2x A100 PCIe 80GB | 38.98 | 160 | 3870 | 15742 (1.6x) | 27165 (1.5x) | 17510 (1.6x) |
| 🟢 1x A100 PCIe 80GB | 19.49 | 80 | 1935 | 9657 (76%) | 17896 (71%) | 10817 (43%) |
| | | | | | | |
| 🟢 4x PG506-243 / PG506-242 | 88.57 | 256 | 6554 | 23097 (2.8x) | 41088 (2.6x) | 36130 (2.9x) |
| 🟢 2x PG506-243 / PG506-242 | 44.28 | 128 | 3277 | 13885 (1.7x) | 24168 (1.5x) | 20906 (1.7x) |
| 🟢 1x PG506-243 / PG506-242 | 22.14 | 64 | 1638 | 8195 (77%) | 15654 (74%) | 12271 (58%) |
| | | | | | | |
| 🟢 8x A100 SXM4 40GB | 155.92 | 320 | 12440 | 37619 (4.4x) | 72965 (4.6x) | 63009 (7.2x) |
| 🟢 4x A100 SXM4 40GB | 77.96 | 160 | 6220 | 23411 (2.7x) | 42400 (2.7x) | 29017 (3.3x) |
| 🟢 2x A100 SXM4 40GB | 38.98 | 80 | 3110 | 14311 (1.7x) | 23707 (1.5x) | 15512 (1.8x) |
| 🟢 1x A100 SXM4 40GB | 19.49 | 40 | 1555 | 8543 (84%) | 15917 (79%) | 8748 (43%) |
| | | | | | | |
| 🟢 4x Tesla V100 SXM2 32GB | 62.68 | 128 | 3600 | 13135 (2.9x) | 26527 (3.0x) | 22686 (3.1x) |
| 🟢 2x Tesla V100 SXM2 32GB | 31.34 | 64 | 1800 | 7953 (1.8x) | 15469 (1.7x) | 12932 (1.8x) |
| 🟢 1x Tesla V100 SXM2 32GB | 15.67 | 32 | 900 | 4471 (76%) | 8947 (77%) | 7217 (62%) |
| | | | | | | |
| 🟢 3x K40m + 1x Titan Xp | 17.16 | 48 | 1154 | 3117 (2.8x) | 5174 (2.8x) | 3127 (3.4x) |
| 🟢 2x Tesla K40m | 8.58 | 24 | 577 | 1971 (1.7x) | 3300 (1.8x) | 1801 (2.0x) |
| 🟢 1x Tesla K40m | 4.29 | 12 | 288 | 1131 (60%) | 1868 (50%) | 912 (24%) |
| | | | | | | |
| 🟢 1x Tesla K80 (2 GPUs) | 8.22 | 24 | 480 | 2086 (2.3x) | 3448 (2.1x) | 2174 (2.3x) |
| 🟢 1x Tesla K80 (1 GPU) | 4.11 | 12 | 240 | 916 (58%) | 1642 (53%) | 943 (30%) |
| | | | | | | |
| 🟢 2x L40S | 183.22 | 96 | 1728 | 6888 (1.8x) | 13099 (1.8x) | 13640 (1.8x) |
| 🟢 1x L40S | 91.61 | 48 | 864 | 3824 (68%) | 7463 (67%) | 7669 (68%) |
| | | | | | | |
| 🟢 2x L40 | 181.04 | 96 | 1728 | 7137 (1.8x) | 13547 (1.7x) | 14164 (1.8x) |
| 🟢 1x L40 | 90.52 | 48 | 864 | 3870 (69%) | 7778 (69%) | 7945 (71%) |
| | | | | | | |
| 🟢 8x RTX A6000 | 320.00 | 384 | 6144 | 19311 (4.4x) | 40063 (4.5x) | 39004 (4.6x) |
| 🟢 4x RTX A6000 | 160.00 | 192 | 3072 | 14314 (3.2x) | 27915 (3.2x) | 27227 (3.2x) |
| 🟢 2x RTX A6000 | 80.00 | 96 | 1536 | 8041 (1.8x) | 15026 (1.7x) | 14795 (1.7x) |
| 🟢 1x RTX A6000 | 40.00 | 48 | 768 | 4421 (88%) | 8814 (88%) | 8533 (86%) |
| | | | | | | |
| 🟢 4x A16 (16 GPUs) | 71.84 | 256 | 3200 | 12078 (11.7x) | 22451 (11.1x) | 14489 (12.8x) |
| 🟢 2x A16 (8 GPUs) | 35.92 | 128 | 1600 | 6389 (6.2x) | 11777 (5.8x) | 7500 (6.6x) |
| 🟢 1x A16 (4 GPUs) | 17.96 | 64 | 800 | 3497 (3.4x) | 6348 (3.1x) | 3919 (3.5x) |
| 🟢 1x A16 (2 GPUs) | 8.98 | 32 | 400 | 1912 (1.9x) | 3475 (1.7x) | 2064 (1.8x) |
| 🟢 1x A16 (1 GPU) | 4.49 | 16 | 200 | 1031 (79%) | 2031 (78%) | 1133 (44%) |
| | | | | | | |
| 🟢 2x A2 | 9.06 | 30 | 400 | 1927 (1.9x) | 3539 (1.7x) | 2232 (1.9x) |
| 🟢 1x A2 | 4.53 | 15 | 200 | 1031 (79%) | 2051 (79%) | 1199 (46%) |
| | | | | | | |
| 🟢 2x Quadro RTX 8000 Pa. | 29.86 | 96 | 1248 | 4767 (1.8x) | 9607 (1.8x) | 10214 (1.8x) |
| 🟢 1x Quadro RTX 8000 Pa. | 14.93 | 48 | 624 | 2591 (64%) | 5408 (67%) | 5607 (69%) |
| | | | | | | |
| 🟢 7x 2080 Ti + 1x A100 40GB | 107.60 | 88 | 4928 | 16146 (5.1x) | 33732 (5.0x) | 33857 (4.9x) |
| 🟢 4x GeForce RTX 2080 Ti | 53.80 | 44 | 2464 | 9117 (2.9x) | 18415 (2.7x) | 18598 (2.7x) |
| 🟢 2x GeForce RTX 2080 Ti | 26.90 | 22 | 1232 | 5085 (1.6x) | 10770 (1.6x) | 10922 (1.6x) |
| 🟢 1x GeForce RTX 2080 Ti | 13.45 | 11 | 616 | 3194 (79%) | 6700 (84%) | 6853 (86%) |
| | | | | | | |
| 🟢 1x GTX 690 (2 GPUs) | 6.26 | 4 | 385 | 587 (23%) | 920 (18%) | 535 (11%) |
| | | | | | | |
| 🔵 2x Arc A770 | 39.32 | 32 | 1120 | 4954 (1.9x) | 8745 (1.9x) | 8329 (1.8x) |
| 🔵 1x Arc A770 | 19.66 | 16 | 560 | 2663 (73%) | 4568 (63%) | 4519 (62%) |
| | | | | | | |
| 🟢 1x A100 40GB + 🟢 1x P100 16GB + 🟢 2x A2 + 🔴 3x MI50 32GB + 🔵 1x A770 16GB | 54.36 | 180 | 2400 | 9903 (63%) | 17296 (55%) | 12041 (39%) |
| 🔴 1x 7700 XT + 🔵 1x B580 + 🟢 1x Titan Xp | 36.45 | 36 | 1296 | 3979 (47%) | 8003 (48%) | 8358 (39%) |
| 🔵 1x A770 + 🟢 1x Titan Xp | 24.30 | 24 | 1095 | 4717 (66%) | 8380 (59%) | 8026 (56%) |
FAQs
General
What physical model does FluidX3D use?
FluidX3D implements the lattice Boltzmann method (LBM), a type of direct numerical simulation (DNS), the most accurate but also the most computationally demanding class of fluid simulation. Optional extension models include volume forces (Guo forcing), free surfaces (volume-of-fluid with PLIC), a temperature model, and the Smagorinsky-Lilly subgrid turbulence model.
FluidX3D only uses FP32 or even FP32/FP16, in contrast to FP64. Are simulation results physically accurate?
Yes, in all but extreme edge cases. The code has been specially optimized to minimize arithmetic round-off errors and make the most of lower precision. With these optimizations, accuracy in most cases is indistinguishable from FP64 double-precision, even with FP32/FP16 mixed precision. Details can be found in this paper.
Compared to the benchmark numbers stated here, efficiency seems much lower, but performance is slightly better for most devices. How can this be?
In that paper, the One-Step-Pull swap algorithm is implemented, using only misaligned reads and coalesced writes. On almost all GPUs, the performance penalty for misaligned writes is much larger than for misaligned reads, and sometimes there is almost no penalty for misaligned reads at all. Because of this, One-Step-Pull runs at peak bandwidth and thus peak efficiency.
Here, a different swap algorithm termed Esoteric-Pull is used, a type of in-place streaming. This makes the LBM require much less memory (93 vs. 169 (FP32/FP32) or 55 vs. 93 (FP32/FP16) Bytes/cell for D3Q19), and also less memory bandwidth (153 vs. 171 (FP32/FP32) or 77 vs. 95 (FP32/FP16) Bytes/cell per time step for D3Q19) due to so-called implicit bounce-back boundaries. However, memory access is now half coalesced and half misaligned for both reads and writes, so memory access efficiency is lower. For overall performance, these two effects approximately cancel out. The benefit of Esoteric-Pull - being able to simulate domains twice as large with the same amount of memory - clearly outweighs the cost of slightly lower memory access efficiency, especially since performance is not reduced overall.
Why don't you use CUDA? Wouldn't that be more efficient?
No, that is a myth. OpenCL is exactly as efficient as CUDA on Nvidia GPUs if optimized properly. Here I did a roofline model analysis of OpenCL performance on various hardware. OpenCL efficiency on modern Nvidia GPUs can reach 100% with the right memory access pattern, so CUDA can't possibly be any more efficient. Without any performance advantage, there is no reason to use proprietary CUDA over OpenCL, since OpenCL is compatible with a lot more hardware.
Why no multi-relaxation-time (MRT) collision operator?
The idea of MRT is to linearly transform the DDFs into "moment space" by matrix multiplication and relax these moments individually, promising better stability and accuracy. In practice, in the vast majority of cases, it has zero or even negative effects on stability and accuracy, and simple SRT is much superior. Apart from the kinematic shear viscosity and conserved terms, the remaining moments are non-physical quantities and their tuning is a black box. Although MRT can be implemented efficiently with only a single matrix-vector multiplication in registers, giving identical performance to SRT by remaining bandwidth-bound, storing the matrices vastly elongates and over-complicates the code for no real benefit.
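The D3Q19 memory figures quoted in the Esoteric-Pull answer above (93 vs. 169 Bytes/cell for FP32/FP32, 55 vs. 93 for FP32/FP16) can be re-derived from the data layout. A sketch of that arithmetic, assuming the remaining 17 Bytes per cell are density (FP32), velocity (3x FP32) and a flag byte (an assumption consistent with the quoted totals, not a statement of FluidX3D's exact internal layout):

```python
# Bytes per cell for D3Q19 LBM: DDF storage + per-cell fields.
# Classic two-copy streaming stores the 19 DDFs twice; in-place streaming
# (Esoteric-Pull) stores them only once.
def bytes_per_cell(ddf_bytes: int, ddf_copies: int, q: int = 19) -> int:
    fields = 4 + 3 * 4 + 1  # density (FP32) + velocity (3x FP32) + flags (1 Byte)
    return ddf_copies * q * ddf_bytes + fields

print(bytes_per_cell(4, 2))  # two-copy FP32   → 169
print(bytes_per_cell(4, 1))  # in-place FP32   →  93
print(bytes_per_cell(2, 2))  # two-copy FP16   →  93
print(bytes_per_cell(2, 1))  # in-place FP16   →  55
```

The factor-of-two saving on DDF storage is exactly why Esoteric-Pull fits domains twice as large into the same memory.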
Hardware
Can FluidX3D run on multiple GPUs at the same time?
Yes. The simulation grid is then split into domains, one for each GPU (domain decomposition method). The GPUs essentially pool their memory, enabling much larger grid resolution and higher performance. Rendering is parallelized across multiple GPUs as well; each GPU renders its own domain with a 3D offset, then the rendered frames from all GPUs are overlayed using their z-buffers. Communication between domains is done over PCIe, so no SLI/Crossfire/NVLink/InfinityFabric is required. All GPUs must however be installed in the same node (PC/laptop/server). Even unholy combinations of AMD+Intel+Nvidia GPUs will work, although it is recommended to only use GPUs with similar memory capacity and bandwidth together. Using a fast gaming GPU and a slow integrated GPU together would only decrease performance due to communication overhead.
I'm on a budget and have only a cheap computer. Can I run FluidX3D on my toaster PC/laptop?
Absolutely. Today even the most inexpensive hardware, like integrated GPUs or entry-level gaming GPUs, supports OpenCL. You might be a bit more limited on memory capacity and grid resolution, but you should be good to go. I've tested FluidX3D on very old and inexpensive hardware and even on my Samsung S9+ smartphone, and it runs just fine, although admittedly a bit slower.
I don't have an expensive workstation GPU, but only a gaming GPU. Will performance suffer?
No. Efficiency on gaming GPUs is exactly as good as on their "professional"/workstation counterparts. Performance is often even better, as gaming GPUs have higher boost clocks.
Do I need a GPU with ECC memory?
No. Gaming GPUs work just fine. Some Nvidia GPUs automatically reduce memory clocks for compute applications to almost entirely eliminate memory errors.
My GPU does not support CUDA. Can I still use FluidX3D?
Yes. FluidX3D uses OpenCL 1.2 and not CUDA, so it runs on any GPU from any vendor since around 2012.
I don't have a dedicated graphics card at all. Can I still run FluidX3D on my PC/laptop?
Yes. FluidX3D also runs on all integrated GPUs since around 2012, and also on CPUs.
I need more memory than my GPU can offer. Can I run FluidX3D on my CPU as well?
Yes. You only need to install the Intel OpenCL CPU Runtime.
In the benchmarks you list some very expensive hardware. How do you get access to that?
As a PhD candidate in computational physics, I used FluidX3D for my research, so I had access to BZHPC, SuperMUC-NG and JSC JURECA-DC supercomputers.
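The memory pooling across GPUs described in this section can be quantified: with the 55 Bytes/cell figure for the FP32/FP16 layout quoted in the FAQs above, the largest cubic grid that fits is roughly the cube root of the pooled cell count. A hypothetical back-of-the-envelope sketch (it deliberately ignores halo cells and other per-domain overhead, so real resolutions will be somewhat lower):

```python
# Largest cubic LBM grid edge length that fits in pooled GPU memory,
# ignoring halo/communication overhead and domain-divisibility constraints.
def max_cubic_resolution(num_gpus: int, vram_gb: float, bytes_per_cell: int = 55) -> int:
    total_cells = num_gpus * vram_gb * 1e9 / bytes_per_cell
    return int(total_cells ** (1.0 / 3.0))  # floor of the cube-root edge length

print(max_cubic_resolution(1, 8))    # single 8 GB GPU, FP32/FP16 layout
print(max_cubic_resolution(8, 40))   # e.g. eight pooled 40 GB GPUs
```

Doubling the GPU count raises the edge length by a factor of about 2^(1/3) ≈ 1.26, which is why the multi-GPU benchmarks above use 2x1x1, 2x2x1 and 2x2x2 domain layouts.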
Graphics
I don't have an RTX/DXR GPU that supports raytracing. Can I still use raytracing graphics in FluidX3D?
Yes, and at full performance. FluidX3D does not use a bounding volume hierarchy (BVH) to accelerate raytracing, but fast ray-grid traversal instead, implemented directly in OpenCL C. This is much faster than BVH for moving isosurfaces in the LBM grid (~N vs. ~N²+log(N) runtime; LBM itself is ~N³), and it does not require any dedicated raytracing hardware. Raytracing in FluidX3D runs on any GPU that supports OpenCL 1.2.
I have a datacenter/mining GPU without any video output or graphics hardware. Can FluidX3D still render simulation results?
Yes. FluidX3D does all rendering (rasterization and raytracing) in OpenCL C, so no display output and no graphics features like OpenGL/Vulkan/DirectX are required. Rendering is just another form of compute, after all. Rendered frames are passed to the CPU over PCIe, and the CPU can then either draw them on screen through dedicated/integrated graphics or write them to the hard drive.
I'm running FluidX3D on a remote (super-)computer and only have an SSH terminal. Can I still use graphics somehow?
Yes, either directly as interactive ASCII graphics in the terminal or by storing rendered frames on the hard drive and then copying them over via `scp -r user@server.url:"~/path/to/images/folder" .`.
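The fast ray-grid traversal mentioned above is, in spirit, the classic Amanatides-Woo DDA: the ray steps from grid cell to grid cell until it hits the isosurface, instead of walking a BVH. A simplified illustrative sketch in Python (FluidX3D's actual OpenCL C implementation differs in details):

```python
import math

def traverse_grid(origin, direction, grid_size, max_steps=256):
    """Visit grid cells along a ray (Amanatides-Woo style 3D DDA)."""
    cell = [int(math.floor(o)) for o in origin]
    step, t_max, t_delta = [0] * 3, [math.inf] * 3, [math.inf] * 3
    for i in range(3):
        if direction[i] > 0.0:
            step[i] = 1
            t_max[i] = (cell[i] + 1 - origin[i]) / direction[i]  # distance to next +face
            t_delta[i] = 1.0 / direction[i]
        elif direction[i] < 0.0:
            step[i] = -1
            t_max[i] = (cell[i] - origin[i]) / direction[i]  # distance to next -face
            t_delta[i] = -1.0 / direction[i]
    visited = []
    while all(0 <= cell[i] < grid_size[i] for i in range(3)) and len(visited) < max_steps:
        visited.append(tuple(cell))
        axis = min(range(3), key=lambda i: t_max[i])  # advance across the closest cell boundary
        cell[axis] += step[axis]
        t_max[axis] += t_delta[axis]
    return visited

# A ray along +x through a 4^3 grid visits one row of cells:
print(traverse_grid((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), (4, 4, 4)))
# → [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
```

Because each step is O(1) and the walk visits only the cells the ray actually crosses, no acceleration structure has to be rebuilt when the isosurface moves every time step.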
Licensing
I want to learn about programming/software/physics/engineering. Can I use FluidX3D for free?
Yes. Anyone can use FluidX3D for free for public research, education or personal use. Use by scientists, students and hobbyists is free of charge and well encouraged.
I am a scientist/teacher with a paid position at a public institution. Can I use FluidX3D for my research/teaching?
Yes, you can use FluidX3D free of charge. This is considered research/education, not commercial use. To give credit, the references listed below should be cited. If you publish data/results generated by altered source versions, the altered source code must be published as well.
I work at a company in CFD/consulting/R&D or related fields. Can I use FluidX3D commercially?
No. Commercial use is not allowed with the current license.
Is FluidX3D open-source?
No. "Open-source" as a technical term is defined as freely available without any restriction on use, but I am not comfortable with that. I have written FluidX3D in my spare time and no one should milk it for profits while I remain uncompensated, especially considering what other CFD software sells for. The technical term for the type of license I chose is "source-available no-cost non-commercial". The source code is freely available, and you are free to use, alter and redistribute it, as long as you do not sell it or make a profit from derived products/services, and as long as you do not use it for any military purposes (see the license for details).
Will FluidX3D at some point be available with a commercial license?
Maybe I will add the option for a second, commercial license later on. If you are interested in commercial use, let me know. For non-commercial use in science and education, FluidX3D is and will always be free.
External Code/Libraries/Images used in FluidX3D
- OpenCL-Headers and C++ Wrapper for GPU parallelization (Khronos Group)
- Win32 API for interactive graphics in Windows (Microsoft)
- X11/Xlib for interactive graphics in Linux (The Open Group)
- marching-cubes tables for isosurface generation on GPU (Paul Bourke)
- `src/lodepng.cpp` and `src/lodepng.hpp` for `.png` encoding and decoding (Lode Vandevenne)
- SimplexNoise class in `src/utilities.hpp` for generating continuous noise in 2D/3D/4D space (Stefan Gustavson)
- `skybox/skybox8k.png` for free surface raytracing (HDRI Hub)
References
- Lehmann, M.: Computational study of microplastic transport at the water-air interface with a memory-optimized lattice Boltzmann method. PhD thesis, (2023)
- Lehmann, M.: Esoteric Pull and Esoteric Push: Two Simple In-Place Streaming Schemes for the Lattice Boltzmann Method on GPUs. Computation, 10, 92, (2022)
- Lehmann, M., Krause, M., Amati, G., Sega, M., Harting, J. and Gekle, S.: Accuracy and performance of the lattice Boltzmann method with 64-bit, 32-bit, and customized 16-bit number formats. Phys. Rev. E 106, 015308, (2022)
- Lehmann, M.: Combined scientific CFD simulation and interactive raytracing with OpenCL. IWOCL'22: International Workshop on OpenCL, 3, 1-2, (2022)
- Lehmann, M., Oehlschlägel, L.M., Häusl, F., Held, A. and Gekle, S.: Ejection of marine microplastics by raindrops: a computational and experimental study. Micropl.&Nanopl. 1, 18, (2021)
- Lehmann, M.: High Performance Free Surface LBM on GPUs. Master's thesis, (2019)
- Lehmann, M. and Gekle, S.: Analytic Solution to the Piecewise Linear Interface Construction Problem and Its Application in Curvature Calculation for Volume-of-Fluid Simulation Codes. Computation, 10, 21, (2022)
Contact
- FluidX3D is solo-developed and maintained by Dr. Moritz Lehmann.
- For any questions, feedback or other inquiries, contact me at dr.moritz.lehmann@gmail.com.
- Updates are posted on Mastodon via @ProjectPhysX/#FluidX3D and on YouTube.
Support
I'm developing FluidX3D in my spare time, to make computational fluid dynamics lightning fast, accessible on all hardware, and free for everyone.
- You can support FluidX3D by reporting any bugs or things that don't work in the issues. I'm welcoming feedback!
- If you like FluidX3D, share it with friends and colleagues. Spread the word that CFD is now lightning fast, accessible and free.
- If you want to support FluidX3D financially, you can sponsor me on GitHub or buy me a coffee. Thank you!
Owner
- Name: Dr. Moritz Lehmann
- Login: ProjectPhysX
- Kind: user
- Location: Bayreuth, Germany
- Company: University of Bayreuth
- Twitter: ProjectPhysX
- Repositories: 3
- Profile: https://github.com/ProjectPhysX
Summa cum laude Physics PhD at age 25 | Graduate @ Elite Net Bavaria | Khronos OpenCL Advisor | FluidX3D GPU developer | DLR_Graduate_Program
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Lehmann"
    given-names: "Moritz"
    orcid: "https://orcid.org/0000-0002-4652-8383"
title: "FluidX3D"
date-released: 2022-08-04
url: "https://github.com/ProjectPhysX/FluidX3D"
```
GitHub Events
Total
- Create event: 4
- Release event: 4
- Issues event: 78
- Watch event: 704
- Issue comment event: 169
- Push event: 55
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 7
- Fork event: 94
Last Year
- Create event: 4
- Release event: 4
- Issues event: 78
- Watch event: 704
- Issue comment event: 169
- Push event: 55
- Pull request review event: 1
- Pull request review comment event: 1
- Pull request event: 7
- Fork event: 94
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Dr. Moritz Lehmann | d****n@g****m | 314 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 219
- Total pull requests: 16
- Average time to close issues: 20 days
- Average time to close pull requests: 5 days
- Total issue authors: 126
- Total pull request authors: 10
- Average comments per issue: 3.47
- Average comments per pull request: 0.81
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 60
- Pull requests: 8
- Average time to close issues: 11 days
- Average time to close pull requests: about 2 hours
- Issue authors: 36
- Pull request authors: 4
- Average comments per issue: 1.12
- Average comments per pull request: 0.63
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- Meerkov (14)
- wjsjtu123 (13)
- rodionstepanov (12)
- Afei2021 (6)
- trparry (6)
- PavelBlend (5)
- JunchenLyu (4)
- lk251 (4)
- SirWixy (4)
- stefano2734 (3)
- randomwangran (3)
- ProjectPhysX (3)
- IsTrivial (3)
- Yujif1Aero (3)
- jasonxauat (3)
Pull Request Authors
- HarrierPigeon (4)
- morkev (3)
- phansel (2)
- lxianguang (2)
- gurkanctn (2)
- bcardarella (2)
- thenightterorx (2)
- saitejamalyala (1)
- FCLC (1)
- Yaraslaut (1)