Updated 9 months ago

https://github.com/apache/hamilton • Rank 12.1 • Science 36%

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. It runs and scales everywhere Python does.

Updated 9 months ago

globalbioticinteractions • Science 59%

Global Biotic Interactions provides access to existing species interaction datasets.

Updated 9 months ago

com.cefriel • Science 57%

Composable Semantic Transformation Pipelines

Scientific Software
Updated 9 months ago

hotsub — Peer-reviewed • Science 93%

hotsub: A batch job engine for cloud services with an ETL framework. Published in JOSS (2018).

Updated 9 months ago

https://github.com/dadananjesha/azuredataengine • Science 13%

AzureDataEngine is a scalable batch-processing data architecture built on the Azure platform. It extracts, transforms, and loads large datasets for machine learning applications, using Azure Blob Storage, PostgreSQL, Databricks, and Key Vault for reliability and maintainability.