libCEED
libCEED: Fast algebra for high-order element-based discretizations - Published in JOSS (2021)
mpi4jax
mpi4jax: Zero-copy MPI communication of JAX arrays - Published in JOSS (2021)
torchquad
torchquad: Numerical Integration in Arbitrary Dimensions with PyTorch - Published in JOSS (2021)
GeophysicalFlows.jl
GeophysicalFlows.jl: Solvers for geophysical fluid dynamics problems in periodic domains on CPUs & GPUs - Published in JOSS (2021)
Oceananigans.jl
Oceananigans.jl: Fast and friendly geophysical fluid dynamics on GPUs - Published in JOSS (2020)
Makie.jl
Makie.jl: Flexible high-performance data visualization for Julia - Published in JOSS (2021)
umpire
An application-focused API for memory management on NUMA & GPU architectures
heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
gpullama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
t-elf
Tensor Extraction of Latent Features (T-ELF). Within T-ELF's arsenal are non-negative matrix and tensor factorization solutions, equipped with automatic model determination (also known as the estimation of latent factors - rank) for accurate data modeling. Our software suite encompasses cutting-edge data pre-processing and post-processing modules.
scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
tensorcircuit-ng
The next-gen tensor network based quantum software framework: superseding the original TensorCircuit
pennylane-lightning
The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.
flamegpu2
FLAME GPU 2 is a GPU accelerated agent based modelling framework for CUDA C++ and Python
habitat
🔮 Execution time predictions for deep neural network training iterations across different GPUs.
fluidx3d
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine
devito
DSL and compiler framework for automated finite-differences and stencil computation
RadonKA
A simple yet sufficiently fast (attenuated) Radon transform and backprojection implementation using KernelAbstractions.jl. Runs on CPU, CUDA, ...
BoundaryValueDiffEq
Boundary value problem (BVP) solvers for scientific machine learning (SciML)
pyhpc-benchmarks
A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python 🚀
pytorch-benchmark
Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory and energy consumption
lc0
Open source neural network chess engine with GPU acceleration and broad hardware support.
DeconvOptim
A multi-dimensional, high performance deconvolution framework written in Julia Lang for CPUs and GPUs.
pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
exponentialutilities.jl
Fast and differentiable implementations of matrix exponentials, Krylov exponential matrix-vector multiplications ("expmv"), KIOPS, ExpoKit functions, and more. All your exponential needs in SciML form.
triton-model-navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
librapid
A highly optimised C++ library for mathematical applications and neural networks.
opencl-benchmark
A small OpenCL benchmark program to measure peak GPU/CPU performance.
text2vec-service
Service for converting text to vectors with a BERT model. An efficient text-to-vector service that supports multiple GPUs, multiple workers, and multiple clients, and works out of the box.
wakis
3D electromagnetic time-domain solver, specialized in wake potential and beam-coupling impedance computation for particle accelerators
phoebe
A high-performance framework for solving phonon and electron Boltzmann equations
powerfit-em
Rigid body fitting of atomic structures in cryo-electron microscopy density maps
qrack
Comprehensive, GPU accelerated framework for developing universal virtual quantum processors
hybridbackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
cudawrappers
C++ wrapper for the Nvidia/HIP C libraries (e.g. CUDA driver, nvrtc, hiprtc, cuFFT, hipFFT, etc.)
MHDFlows
Three-dimensional magnetohydrodynamic (MHD) pseudospectral solvers written in Julia with FourierFlows.jl
bundoora
Customized development container environment for consistent and efficient execution of machine learning projects.
datoviz
⚡ Datoviz: high-performance GPU rendering for scientific data visualization
mcmlgpu
This repository contains the base code for GPU-based Monte Carlo simulations of light transport in turbid media.
vector-sum-cuda
Comparing performance of sequential vs CUDA-based vector element sum.
superterrainplus
SuperTerrain+: A real-time procedural 3D infinite terrain engine with geographical features and photorealistic rendering.
transformers-bart-pretrain
Script to pre-train Hugging Face Transformers BART with TensorFlow 2
kernel_launcher
Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner
ml_with_aws_sagemaker
Learn how to scale up ML/AI pipelines using AWS SageMaker (GPUs, Cloud computing)
vkcompviz
Vulkan image and data processing framework capable of running a cascade of compute shaders and displaying or storing the result.
server
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
astro-accelerate
AstroAccelerate is a many-core accelerated software package for processing time-domain radio-astronomy data.