Updated 7 months ago

mlflow • Rank 35.0 • Science 36%

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

Updated 7 months ago

nora • Rank 0.7 • Science 57%

Spark-based OWL reasoner

Updated 7 months ago

flintrock • Rank 16.0 • Science 33%

A command-line tool for launching Apache Spark clusters.

Updated 7 months ago

https://github.com/cbg-ethz/pybda • Rank 7.5 • Science 36%

A command-line tool for analyzing big biological data sets on distributed HPC clusters.

Updated 7 months ago

spruce • Rank 2.5 • Science 26%

Enrichment pipeline for CUR reports that adds energy and carbon data, allowing you to report on and reduce the impact of your cloud usage.

Updated 7 months ago

https://github.com/aryashah2k/apache-pyspark-primer • Rank 2.2 • Science 10%

Repository documenting everything learned about Apache PySpark, with accompanying code files and interactive theory illustrated by many examples.

Updated 7 months ago

dsgrid • Science 26%

Python package for working with demand-side grid projects, datasets and queries

Updated 6 months ago

https://github.com/lamastex/scadamale • Science 26%

Scalable Data Science and Distributed Machine Learning course book, written by Raazesh Sainudiin and his WASP AI-Track PhD students.

Updated 7 months ago

https://github.com/awslabs/amazon-emr-vscode-toolkit • Science 13%

A VS Code extension that makes it easier to manage and develop Spark jobs on EMR.

Updated 7 months ago

https://github.com/dadananjesha/redshift-etl-project • Science 13%

The project covers the complete data pipeline: importing data from an RDS source into HDFS with Sqoop, processing it with Spark, and running analytical queries on an AWS Redshift cluster.