https://github.com/agnostiqhq/covalent
Pythonic tool for orchestrating machine-learning, high-performance, and quantum-computing workflows in heterogeneous compute environments.
statxplore
A data harvester for the Department for Work and Pensions' Stat-Xplore service.
https://github.com/modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
https://github.com/elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as a self-hosted or cloud service with premium features.
https://github.com/agnostiqhq/covalent-ssh-plugin
Executor plugin interfacing Covalent with remote backends using SSH
https://github.com/agnostiqhq/covalent-slurm-plugin
Executor plugin interfacing Covalent with Slurm
https://github.com/buchananja/dpyp
A convenience tool for small-scale data pipelines in Python
https://github.com/whylabs/whylogs
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
https://github.com/dadananjesha/redshift-etl-project
The project covers the complete data pipeline: importing data from an RDS source to HDFS using Sqoop, processing it with Spark, and executing analytical queries on an AWS Redshift cluster.