libCEED
libCEED: Fast algebra for high-order element-based discretizations - Published in JOSS (2021)
mpi4jax
mpi4jax: Zero-copy MPI communication of JAX arrays - Published in JOSS (2021)
torchquad
torchquad: Numerical Integration in Arbitrary Dimensions with PyTorch - Published in JOSS (2021)
New developments in PySDM and PySDM-examples v2
New developments in PySDM and PySDM-examples v2: collisional breakup, immersion freezing, dry aerosol initialization, and adaptive time-stepping - Published in JOSS (2023)
GeophysicalFlows.jl
GeophysicalFlows.jl: Solvers for geophysical fluid dynamics problems in periodic domains on CPUs & GPUs - Published in JOSS (2021)
x11docker
x11docker: Run GUI applications in Docker containers - Published in JOSS (2019)
FastGeodis
FastGeodis: Fast Generalised Geodesic Distance Transform - Published in JOSS (2022)
Pyrgg
Pyrgg: Python Random Graph Generator - Published in JOSS (2017)
TensorFlow.jl
TensorFlow.jl: An Idiomatic Julia Front End for TensorFlow - Published in JOSS (2018)
GraphNeT
GraphNeT: Graph neural networks for neutrino telescope event reconstruction - Published in JOSS (2023)
Oceananigans.jl
Oceananigans.jl: Fast and friendly geophysical fluid dynamics on GPUs - Published in JOSS (2020)
SiSyPHE
SiSyPHE: A Python package for the Simulation of Systems of interacting mean-field Particles with High Efficiency - Published in JOSS (2021)
GPUE
GPUE: Graphics Processing Unit Gross–Pitaevskii Equation solver - Published in JOSS (2018)
Makie.jl
Makie.jl: Flexible high-performance data visualization for Julia - Published in JOSS (2021)
umpire
An application-focused API for memory management on NUMA & GPU architectures
heat
A distributed tensor and machine learning framework for Python with GPU and MPI acceleration
gpullama3.java
GPU-accelerated Llama3.java inference in pure Java using TornadoVM.
t-elf
Tensor Extraction of Latent Features (T-ELF). T-ELF provides non-negative matrix and tensor factorization solutions with automatic model determination (also known as estimating the number of latent factors, i.e., the rank) for accurate data modeling. The suite also includes data pre-processing and post-processing modules.
scalene
Scalene: a high-performance, high-precision CPU, GPU, and memory profiler for Python with AI-powered optimization proposals
tensorcircuit-ng
The next-gen tensor network based quantum software framework: superseding the original TensorCircuit
pennylane-lightning
The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.
flamegpu2
FLAME GPU 2 is a GPU accelerated agent based modelling framework for CUDA C++ and Python
habitat
🔮 Execution time predictions for deep neural network training iterations across different GPUs.
fluidx3d
The fastest and most memory efficient lattice Boltzmann CFD software, running on all GPUs and CPUs via OpenCL. Free for non-commercial use.
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
mosec
A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to make full use of your compute resources
devito
DSL and compiler framework for automated finite-difference and stencil computation
RadonKA
A simple yet sufficiently fast (attenuated) Radon transform and backprojection implementation using KernelAbstractions.jl. Runs on CPU, CUDA, ...
BoundaryValueDiffEq
Boundary value problem (BVP) solvers for scientific machine learning (SciML)
https://github.com/sktime/pytorch-forecasting
Time series forecasting with PyTorch
pyhpc-benchmarks
A suite of benchmarks for CPU and GPU performance of the most popular high-performance libraries for Python
pytorch-benchmark
Easily benchmark PyTorch model FLOPs, latency, throughput, allocated GPU memory, and energy consumption
celeritas
Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
https://github.com/cans-world/cans
A code for fast, massively-parallel direct numerical simulations (DNS) of canonical flows
lc0
Open source neural network chess engine with GPU acceleration and broad hardware support.
DeconvOptim
A multi-dimensional, high-performance deconvolution framework written in Julia for CPUs and GPUs.
impactx
High-performance modeling of beam dynamics in particle accelerators with collective effects
https://github.com/uxlfoundation/scikit-learn-intelex
Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
h2o
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
exponentialutilities.jl
Fast and differentiable implementations of matrix exponentials, Krylov exponential matrix-vector multiplications ("expmv"), KIOPS, ExpoKit functions, and more. All your exponential needs in SciML form.
tmu
Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-vs-one multi-class classifier. TMU is written in Python with wrappers for C and CUDA-based clause evaluation and updating.
triton-model-navigator
Triton Model Navigator is an inference toolkit designed for optimizing and deploying Deep Learning models with a focus on NVIDIA GPUs.
librapid
A highly optimised C++ library for mathematical applications and neural networks.
scimlbenchmarksoutput
SciML-Bench Benchmarks for Scientific Machine Learning (SciML), Physics-Informed Machine Learning (PIML), and Scientific AI Performance
quokka
Two-moment AMR radiation hydrodynamics (with self-gravity, particles, and chemistry) on CPUs/GPUs for astrophysics
opencl-benchmark
A small OpenCL benchmark program to measure peak GPU/CPU performance.
jetson-stats
📊 Simple package for monitoring and controlling your NVIDIA Jetson [Orin, Xavier, Nano, TX] series
text2vec-service
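A service that converts text to vectors using a BERT model. An efficient text-to-vector service that supports multiple GPUs, multiple workers, and multiple concurrent clients, and works out of the box.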
https://github.com/bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
https://github.com/openmm/nnpops
High-performance operations for neural network potentials
BG_Flood
Numerical model for simulating shallow water hydrodynamics on the GPU using an Adaptive Mesh Refinement type grid. The model was designed with the goal of simulating inundation (river, storm surge, or tsunami). The model uses a Block Uniform Quadtree approach that runs on the GPU, but the adaptive/multi-resolution (AMR) capability is still being implemented and is not yet operational. The core shallow water equation (SWE) engine and adaptivity were inspired by and taken from the St Venant solver in Basilisk, and the CUDA GPU memory model was inspired by the work of Vacondio _et al._ (2017).
HIBAG
R package – HLA Genotype Imputation with Attribute Bagging (development version only)