Updated 6 months ago

mlflow • Rank 35.0 • Science 36%

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

Updated 6 months ago

nora • Rank 0.7 • Science 57%

Spark-based OWL reasoner

Updated 6 months ago

flintrock • Rank 16.0 • Science 33%

A command-line tool for launching Apache Spark clusters.

Updated 5 months ago

https://github.com/cbg-ethz/pybda • Rank 7.5 • Science 36%

A command-line tool for analysis of big biological data sets on distributed HPC clusters.

Updated 5 months ago

spruce • Rank 2.5 • Science 26%

Enrichment pipeline for CUR reports that adds energy and carbon data, making it possible to report on and reduce the impact of your cloud usage.

Updated 5 months ago

https://github.com/aryashah2k/apache-pyspark-primer • Rank 2.2 • Science 10%

Repository documenting all the learning for Apache PySpark, with accompanying code files and interactive theory with lots of examples.

Updated 5 months ago

https://github.com/awslabs/amazon-emr-vscode-toolkit • Science 13%

A VS Code Extension to make it easier to manage and develop Spark jobs on EMR

Updated 5 months ago

https://github.com/dadananjesha/redshift-etl-project • Science 13%

The project covers the complete data pipeline: importing data from an RDS source to HDFS using Sqoop, processing the data with Spark, and executing analytical queries on an AWS Redshift cluster.

Updated 5 months ago

dsgrid • Science 26%

Python package for working with demand-side grid projects, datasets and queries

Updated 4 months ago

https://github.com/lamastex/scadamale • Science 26%

Scalable Data Science and Distributed Machine Learning Course Book written by Raazesh Sainudiin and his WASP AI-Track PhD Students