BoboCEP
BoboCEP: a Fault-Tolerant Complex Event Processing Engine for Edge Computing in Internet of Things - Published in JOSS (2023)
hivemind
Decentralized deep learning in PyTorch. Built to train models on thousands of volunteers across the world.
xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
airflow-provider-vineyard
vineyard (v6d): an in-memory immutable data manager. (Project under CNCF, TAG-Storage)
faststream
FastStream is a powerful and easy-to-use Python framework for building asynchronous services interacting with event streams such as Apache Kafka, RabbitMQ, NATS and Redis.
nerlnet
Nerlnet is a framework for research and development of distributed machine learning models on IoT
hyx
🧘‍♀️ Lightweight fault-tolerant primitives for your modern asyncio Python microservices
https://github.com/pegasus-isi/pegasus
Pegasus Workflow Management System - Automate, recover, and debug scientific computations.
https://github.com/pachyderm/pachyderm
Data-Centric Pipelines and Data Versioning
fugue
A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://github.com/synnaxlabs/synnax
The data and operations foundation for hardware.
https://github.com/bigscience-workshop/petals
🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
https://github.com/larribas/dagger
Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).
https://github.com/copyleftdev/kukai
KūKai is a modular, high-performance load-testing framework for TCP-based protocols. Inspired by the Hawaiian god Kūkailimoku (often called Kū), associated with warfare and strategic battles, KūKai aims to help you “wage war” on servers to test their capacity and resilience.
delicoco-ieee-transactions
In compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by reducing the precision of the compressed information.
https://github.com/aveek-saha/two-phase-commit
A consistent distributed KV store that implements the two-phase commit protocol, written in Java, using gRPC
dsrt-2024-distributed-monitoring
Experiments for "An Architecture and Prototype for Monitoring Distributed Simulations of Distributed Systems"
https://github.com/dineshpinto/synchronous-gossip-protocol
Python implementation of a synchronous gossip protocol with Byzantine nodes
https://github.com/cedrickchee/testing-distributed-systems
Curated list of resources on testing distributed systems
https://github.com/amilworks/amilworks.github.io
Personal website that showcases my publications and the software I develop.
https://github.com/adamouization/relaxation-technique-parallel-computing
🔁 Relaxation technique using POSIX threads (shared memory configuration) and MPI (distributed memory configuration).
https://github.com/adalkiran/distributed-inference
A project demonstrating an approach to designing a cross-language, distributed pipeline in the deep learning/machine learning domain, using WebRTC and Redis Streams.
xinda
Automated Testing and Adaptive Detection of Slow Faults in Distributed Systems
distributed-systems-energy-efficiency
Distributed social media backend showcasing energy efficiency techniques