fluidx3d
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
scimlbook
Parallel Computing and Scientific Machine Learning (SciML): Methods and Applications (MIT 18.337J/6.338J)
montecarlomeasurements.jl
Propagation of distributions by Monte-Carlo sampling: Real number types with uncertainty represented by samples.
opencl-benchmark
A small OpenCL benchmark program to measure peak GPU/CPU performance.
pararealgpu.jl
A distributed and GPU-based implementation of the Parareal algorithm for parallel-in-time integration of equations of motion.
reproducible-research-with-gpu-jupyter
This repository demonstrates how to use GPU-Jupyter for reproducible deep learning research with minimal setup effort..
predictive-analytics
A self-driven exploratory report on predictive data analytics employing a range of machine learning techniques
parsec
PaRSEC is a generic framework for architecture aware scheduling and management of micro-tasks on distributed, GPU accelerated, many-core heterogeneous architectures. PaRSEC assigns computation threads to the cores, GPU accelerators, overlaps communications and computations and uses a dynamic, fully-distributed scheduler based on architectural features such as NUMA nodes and algorithmic features such as data reuse.
dplasma
DPLASMA is a highly optimized, accelerator-aware, implementation of a dense linear algebra package for distributed heterogeneous systems. It is designed to deliver sustained performance for distributed systems where each node featuring multiple sockets of multicore processors, and if available, accelerators, using the PaRSEC runtime as a backend.
cudawrappers
C++ wrapper for the Nvidia/HIP C libraries (e.g. CUDA driver, nvrtc, hiprtc, cuFFT, hipFFT, etc.)
Ginkgo
Ginkgo: A high performance numerical linear algebra library - Published in JOSS (2020)