Updated 29 days ago

Cost-Effective Big Data Orchestration Using Dagster: A Multi-Platform Approach • Rank 5.6 • Science 92%

Cost-Effective Big Data Orchestration Using Dagster: A Multi-Platform Approach - Published in JOSS (2026)

Updated 7 months ago

https://github.com/dadananjesha/azuredataengine • Science 13%

AzureDataEngine is a robust, scalable batch-processing data architecture built on the Azure platform. It extracts, transforms, and loads massive datasets for machine learning applications, using Azure Blob Storage, PostgreSQL, Databricks, and Key Vault to ensure reliability and maintainability.

Updated 7 months ago

https://github.com/data-miner00/spark • Science 26%

A laboratory to carry out experiments with PySpark

Updated 7 months ago

pysparklyr • Science 26%

Extension to {sparklyr} that allows you to interact with Spark & Databricks Connect