libCEED
libCEED: Fast algebra for high-order element-based discretizations - Published in JOSS (2021)
New developments in PySDM and PySDM-examples v2
New developments in PySDM and PySDM-examples v2: collisional breakup, immersion freezing, dry aerosol initialization, and adaptive time-stepping - Published in JOSS (2023)
Triumvirate
Triumvirate: A Python/C++ package for three-point clustering measurements - Published in JOSS (2023)
GPUE
GPUE: Graphics Processing Unit Gross-Pitaevskii Equation solver - Published in JOSS (2018)
Disimpy
Disimpy: A massively parallel Monte Carlo simulator for generating diffusion-weighted MRI data in Python - Published in JOSS (2020)
Open Source Optical Coherence Tomography Software
Open Source Optical Coherence Tomography Software - Published in JOSS (2020)
deepmd-kit
A deep learning package for many-body potential energy representation and molecular dynamics
pykeen
🤖 A Python library for learning and evaluating knowledge graph embeddings
sboxgates
sboxgates: A program for finding low gate count implementations of S-boxes - Published in JOSS (2021)
ACHR.cu
ACHR.cu: GPU-accelerated sampling of metabolic networks - Published in JOSS (2019)
pennylane-lightning
The Lightning plugin ecosystem provides fast quantum state-vector and tensor network simulators written in C++ for use with PennyLane.
flamegpu2
FLAME GPU 2 is a GPU-accelerated agent-based modelling framework for CUDA C++ and Python
octotiger
Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees
iree-base-compiler
A retargetable MLIR-based machine learning compiler and runtime toolkit.
cresset
Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.
RadonKA
A simple yet sufficiently fast (attenuated) Radon and backprojection implementation using KernelAbstractions.jl. Runs on CPU, CUDA, ...
abmgpu
Agent Based Model on GPU using CUDA 12.2.1 and OpenGL 4.5 (CUDA OpenGL interop) on Windows/Linux
burn
Burn is a next generation Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.
celeritas
Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
lc0
Open source neural network chess engine with GPU acceleration and broad hardware support.
necsim-rust
Spatially explicit biodiversity simulations using a parallel library written in Rust
tmu
Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-vs-one multi-class classifier. TMU is written in Python with wrappers for C and CUDA-based clause evaluation and updating.
librapid
A highly optimised C++ library for mathematical applications and neural networks.
torchpq
Approximate nearest neighbor search with product quantization on GPU in PyTorch and CUDA
icicle-core
A hardware acceleration library for compute intensive cryptography 🧊
quokka
Two-moment AMR radiation hydrodynamics (with self-gravity, particles, and chemistry) on CPUs/GPUs for astrophysics
pararealgpu.jl
A distributed and GPU-based implementation of the Parareal algorithm for parallel-in-time integration of equations of motion.
https://github.com/bytedance/flux
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
https://github.com/openmm/nnpops
High-performance operations for neural network potentials
https://github.com/cair/pytsetlinmachinecuda
Massively Parallel and Asynchronous Architecture for Logic-based AI
SPbLA
SPbLA: The Library of GPGPU-powered Sparse Boolean Linear Algebra Operations - Published in JOSS (2022)
https://github.com/bytedance/lightseq
LightSeq: A High Performance Library for Sequence Processing and Generation
pytorch-cuda-2.7.1
Clone of PyTorch: Tensors and Dynamic neural networks in Python and C++ with strong GPU acceleration.
https://github.com/SepKfr/Coarse-and-Fine-Grained-Forecasting-Via-GP-Blurring-Effect
Forecast-blur-denoise forecasting model with PyTorch
https://github.com/bytedance/abq-llm
An acceleration library that supports arbitrary bit-width combinatorial quantization operations
https://github.com/dbraun/pytorchtop
GPU PyTorch TOP in TouchDesigner with CUDA-enabled OpenCV
https://github.com/conradsnicta/bandicoot-code
Bandicoot: C++ library for GPU linear algebra & scientific computing - https://coot.sourceforge.io
https://github.com/bencardoen/singularity_slurm_cuda
Example of how to get started with Singularity and CUDA on a SLURM cluster
kernel_launcher
Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner
cuda-accelerated-visual-inertial-odometry-fusion
Harness the power of GPU acceleration for fusing visual odometry and IMU data with an advanced Unscented Kalman Filter (UKF) implementation. Developed in C++ and utilizing CUDA, cuBLAS, and cuSOLVER, this system offers unparalleled real-time performance in state and covariance estimation for robotics and autonomous system applications.
kmm
KMM: parallel dataflow scheduler and efficient memory management for multi-GPU platforms
https://github.com/beehive-lab/tornadovm
TornadoVM: A practical and efficient heterogeneous programming framework for managed languages
qc-cugbasis
High performance CUDA/Python library for computing quantum chemistry density-based descriptors for larger systems using GPUs.
kernel_float
CUDA/HIP header-only library for low-precision (16 bit, 8 bit) and vectorized GPU kernel development
simulateqcd
SIMULATeQCD is a multi-GPU Lattice QCD framework that makes it easy for physicists to implement lattice QCD formulas while still providing competitive performance.
https://github.com/cair/fast-tsetlin-machine-in-cuda-with-imdb-demo
A CUDA implementation of the Tsetlin Machine based on bitwise operators
thereminq-classiq
ThereminQ CLassiQ - QuantOPS: Orchestrate Qrack, Bonsai, Qimcifa and Tipsy in OpenCL, VCL and CUDA with an X WebUI
https://github.com/dansarie/socracked
Performs key-recovery attacks on the SoDark family of algorithms.
jaxngp
JAX implementation of instant-ngp (NeRF part)