Updated 9 months ago
https://github.com/apache/hamilton
Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Updated 9 months ago
globalbioticinteractions
Global Biotic Interactions provides access to existing species interaction datasets
Scientific Software
Updated 9 months ago
hotsub
hotsub: A batch job engine for cloud services with ETL framework - Published in JOSS (2018)
Scientific Software · Peer-reviewed
Updated 9 months ago
https://github.com/dadananjesha/azuredataengine
AzureDataEngine is a robust, scalable batch processing data architecture built on the Azure platform. It efficiently extracts, transforms, and loads massive datasets for machine learning applications, leveraging Azure Blob Storage, PostgreSQL, Databricks, and Key Vault to ensure reliability and maintainability.