Projects

Updated 10 months ago

mlflow • Rank 35.0 • Science 36%

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

agentops agents ai ai-governance apache-spark evaluation langchain llm-evaluation llmops machine-learning ml mlflow mlops model-management observability open-source openai prompt-engineering

Engineering Earth and Environmental Sciences (40%)

Updated 10 months ago

nora • Rank 0.7 • Science 57%

Spark-based OWL reasoner

apache-spark nosql owl-reasoner owlapi

Updated 10 months ago

flintrock • Rank 16.0 • Science 33%

A command-line tool for launching Apache Spark clusters.

apache-spark apache-spark-cluster ec2 orchestration spark-ec2

Updated 10 months ago

https://github.com/cbg-ethz/pybda • Rank 7.5 • Science 36%

A command-line tool for analysis of big biological data sets on distributed HPC clusters.

apache-spark big-data machine-learning python snakemake

Updated 10 months ago

https://github.com/cumbof/chopin2 • Rank 6.1 • Science 36%

Domain-Agnostic Supervised Learning with Hyperdimensional Computing

apache-spark backward-elimination feature-selection gpgpu hd-computing machine-learning supervised-learning vsa

Updated 10 months ago

spruce • Rank 2.5 • Science 26%

Enrichment pipeline for CUR reports that adds energy and carbon data, allowing you to report and reduce the impact of your cloud usage.

apache-spark aws carbon-emissions climate cloud greenops greensoftware sustainability

Updated 10 months ago

https://github.com/aryashah2k/apache-pyspark-primer • Rank 2.2 • Science 10%

Repository documenting everything learned about Apache PySpark, with accompanying code files and interactive theory with lots of examples.

apache-spark pyspark python

Updated 10 months ago

https://github.com/awslabs/amazon-emr-vscode-toolkit • Science 13%

A VS Code extension that makes it easier to manage and develop Spark jobs on EMR.

amazon-emr apache-spark pyspark python

Updated 10 months ago

dsgrid • Science 26%

Python package for working with demand-side grid projects, datasets, and queries.

apache-spark electricity-load energy-data energy-demand energy-demand-forecasting python

Updated 8 months ago

https://github.com/lamastex/scadamale • Science 26%

Scalable Data Science and Distributed Machine Learning course book written by Raazesh Sainudiin and his WASP AI-Track PhD students.

apache-spark data-science deep-learning distributed machine-learning python scala scalable

Updated 10 months ago

spark-dynamic-executor-time-prediction • Science 57%

Neural Network Models for Predicting Execution Time with Dynamic Executor Allocation in Apache Spark.

apache-spark big-data-analytics deep-learning distributed-computing dynamic-allocation execution-time-prediction machine-learning neural-networks performance-modeling spark

Updated 10 months ago

https://github.com/dadananjesha/redshift-etl-project • Science 13%

The project covers the complete data pipeline: importing data from an RDS source into HDFS using Sqoop, processing the data with Spark, and executing analytical queries on an AWS Redshift cluster.

apache-spark aws data-engineering-etl-assignment data-ingestion data-pipeline etl-processes hdfs rds redshift spark sqoop
