mlflow
The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
https://github.com/cbg-ethz/pybda
A command-line tool for the analysis of big biological data sets on distributed HPC clusters.
https://github.com/cumbof/chopin2
Domain-Agnostic Supervised Learning with Hyperdimensional Computing
spruce
Enrichment pipeline for CUR reports that adds energy and carbon data, allowing you to report and reduce the impact of your cloud usage.
https://github.com/aryashah2k/apache-pyspark-primer
Repository Documenting All The Learning For Apache PySpark. Accompanying Code Files and Interactive Theory With Lots Of Examples
https://github.com/awslabs/amazon-emr-vscode-toolkit
A VS Code Extension to make it easier to manage and develop Spark jobs on EMR
https://github.com/dadananjesha/redshift-etl-project
The project covers the complete data pipeline—from importing data from an RDS source to HDFS using Sqoop, processing data with Spark, to executing analytical queries on an AWS Redshift cluster.
spark-dynamic-executor-time-prediction
Neural Network Models for Predicting Execution Time with Dynamic Executor Allocation in Apache Spark.
dsgrid
Python package for working with demand-side grid projects, datasets and queries
https://github.com/lamastex/scadamale
Scalable Data Science and Distributed Machine Learning Course Book written by Raazesh Sainudiin and his WASP AI-Track PhD Students