Projects | Open Source Science

Updated 6 months ago

great_expectations • Rank 32.6 • Science 77%

Always know what to expect from your data.

cleandata data-engineering data-profilers data-profiling data-quality data-science data-unit-tests datacleaner datacleaning dataquality dataunittest eda exploratory-analysis exploratory-data-analysis exploratorydataanalysis mlops pipeline pipeline-debt pipeline-testing pipeline-tests

Updated 6 months ago

aim • Rank 30.2 • Science 67%

Aim 💫 — An easy-to-use & supercharged open-source experiment tracker.

ai data-science data-visualization experiment-tracking machine-learning metadata metadata-tracking ml mlflow mlops prompt-engineering python pytorch tensorboard tensorflow visualization

Updated 6 months ago

ck • Rank 19.4 • Science 77%

Collective Knowledge (CK), Collective Mind (CM/CMX) and MLPerf automations: community-driven projects to facilitate collaborative and reproducible research and to learn how to run AI, ML, and other emerging workloads more efficiently and cost-effectively across diverse models, datasets, software, and hardware using MLPerf methodology and benchmarks

automation best-practices ck cknowledge cm cmind cmx collaboration ctuning education metadata mlops mlperf mlperf-automations mlperf-inference modularity optimization portability reusability workflows

Updated 6 months ago

kubeflow-katib • Rank 20.7 • Science 64%

Automated Machine Learning on Kubernetes

ai automl huggingface hyperparameter-tuning jax kubeflow kubernetes llm machine-learning mlops neural-architecture-search pytorch scikit-learn tensorflow

Updated 6 months ago

active-learning-as-a-service • Rank 6.8 • Science 77%

A scalable & efficient active learning/data selection system for everyone.

active-learning automl deep-learning machine-learning mlops mlsys pytorch

Updated 6 months ago

frouros • Rank 15.5 • Science 67%

Frouros: an open-source Python library for drift detection in machine learning systems.

change-detection concept-drift covariate-shift data-drift dataset-drift dataset-shift distribution-shift drift-detection machine-learning machine-learning-engineering machine-learning-operations mle mlops python statistics

Updated 6 months ago

bentoml • Rank 26.1 • Science 54%

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

ai-inference deep-learning generative-ai inference-platform llm llm-inference llm-serving llmops machine-learning ml-engineering mlops model-inference-service model-serving multimodal python

Updated 6 months ago

tensorzero • Rank 23.4 • Science 54%

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

ai ai-engineering anthropic artificial-intelligence deep-learning genai generative-ai gpt large-language-models llama llm llmops llms machine-learning ml ml-engineering mlops openai python rust

Mathematics (40%)

Updated 6 months ago

deepchecks • Rank 22.8 • Science 54%

Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.

data-drift data-science data-validation deep-learning html-report jupyter-notebook machine-learning ml mlops model-monitoring model-validation pandas-dataframe python pytorch

Updated 6 months ago

cresset • Rank 8.0 • Science 67%

Template repository to build PyTorch projects from source on any version of PyTorch/CUDA/cuDNN.

build cuda deep-learning deep-learning-tutorial docker docker-compose machine-learning makefile mlops mlops-template python pytorch source source-python template template-repository wheel

Updated 6 months ago

mosec • Rank 20.4 • Science 54%

A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

cv deep-learning gpu hacktoberfest jax llm llm-serving machine-learning machine-learning-platform mlops model-serving mxnet nerual-network python pytorch rust tensorflow tts

Updated 6 months ago

mlflow • Rank 35.0 • Science 36%

The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

agentops agents ai ai-governance apache-spark evaluation langchain llm-evaluation llmops machine-learning ml mlflow mlops model-management observability open-source openai prompt-engineering

Engineering Earth and Environmental Sciences (40%)

Updated 6 months ago

kedro • Rank 14.8 • Science 54%

Kedro is a toolbox for production-ready data science. It uses software engineering best practices to help you create data engineering and data science pipelines that are reproducible, maintainable, and modular.

experiment-tracking hacktoberfest kedro machine-learning machine-learning-engineering mlops pipeline python

Updated 6 months ago

weaviate • Rank 24.2 • Science 44%

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

approximate-nearest-neighbor-search generative-search grpc hnsw hybrid-search image-search information-retrieval mlops nearest-neighbor-search neural-search recommender-system search-engine semantic-search semantic-search-engine similarity-search vector-database vector-search vector-search-engine vectors weaviate

Materials Science (40%)

Updated 6 months ago

neptune-client • Rank 24.0 • Science 44%

📘 The experiment tracker for foundation model training

comparison dl foundation keras learning lightgbm llm logger logging machine ml mlops monitoring optuna pytorch rl tensorflow versioning visualization xgboost

Updated 6 months ago

https://github.com/kubeflow/pipelines • Rank 31.5 • Science 36%

Machine Learning Pipelines for Kubeflow

data-science kubeflow kubeflow-pipelines kubernetes machine-learning mlops pipeline

Updated 6 months ago

https://github.com/netflix/metaflow • Rank 30.5 • Science 36%

Build, Manage and Deploy AI/ML Systems

agents ai aws azure data-science datascience gcp generative-ai high-performance-computing kubernetes llm llmops machine-learning ml ml-infrastructure ml-platform mlops model-management python

Updated 6 months ago

polyaxon • Rank 22.2 • Science 44%

MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle

artificial-intelligence caffe data-science deep-learning hyperparameter-optimization jupyter jupyterlab k8s keras kubernetes machine-learning ml mlops mxnet notebook pipelines pytorch reinforcement-learning tensorflow workflow

Updated 6 months ago

openllm • Rank 22.1 • Science 44%

Run any open-source LLM, such as DeepSeek or Llama, as an OpenAI-compatible API endpoint in the cloud.

bentoml fine-tuning llama llama2 llama3-1 llama3-2 llama3-2-vision llm llm-inference llm-ops llm-serving llmops mistral mlops model-inference open-source-llm openllm vicuna

Updated 6 months ago

https://github.com/lancedb/lance • Rank 28.1 • Science 36%

Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust

Updated 6 months ago

https://github.com/flyteorg/flyte • Rank 27.3 • Science 36%

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

data data-analysis data-science dataops declarative fine-tuning flyte golang grpc hacktoberfest kubernetes kubernetes-operator llm machine-learning mlops orchestration-engine production python scale workflow

Updated 6 months ago

popmon • Rank 17.7 • Science 44%

Monitor the stability of a Pandas or Spark dataframe ⚙︎

covariate-shift data-analysis data-distributions data-profiling data-science dataset-shifts drift-detection hacktoberfest ing-bank ipython jupyter mlops monitoring pandas population-monitoring python spark statistical-process-control statistical-tests statistics

Updated 6 months ago

deeplake • Rank 25.6 • Science 36%

Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai

ai computer-vision cv data-science datalake datasets deep-learning image-processing langchain large-language-models llm machine-learning ml mlops multi-modal python pytorch tensorflow vector-database vector-search

Updated 6 months ago

https://github.com/zenml-io/zenml • Rank 23.9 • Science 36%

ZenML 🙏: MLOps for Reliable AI: from Classical AI to Agents. https://zenml.io.

ai automl data-science deep-learning devops-tools hacktoberfest llm llmops machine-learning metadata-tracking ml mlops pipelines production-ready pytorch tensorflow workflow zenml

Updated 6 months ago

bpmn.ai-patterns • Rank 5.4 • Science 54%

Integration patterns - Using AI in business processes

artificial-intelligence best-practice bpmn data-science governance ki machine-learning mlops orchestration patterns

Updated 4 months ago

https://github.com/deepset-ai/haystack-core-integrations • Rank 22.9 • Science 36%

Additional packages (components, document stores and the likes) to extend the capabilities of Haystack

ai haystack llm mlops nlp

Updated 6 months ago

https://github.com/apache/airflow • Rank 18.8 • Science 36%

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

airflow apache apache-airflow automation dag data-engineering data-integration data-orchestrator data-pipelines data-science elt etl machine-learning mlops orchestration python scheduler workflow workflow-engine workflow-orchestration

Updated 6 months ago

state-of-open-source-ai • Rank 10.3 • Science 44%

📕 Clarity in the current fast-paced mess of Open Source innovation

ai book hacktoberfest jupyter-book ml mlops open-source

Updated 6 months ago

https://github.com/featureform/featureform • Rank 16.8 • Science 36%

The Virtual Feature Store. Turn your existing data infrastructure into a feature store.

data-quality data-science embeddings embeddings-similarity feature-engineering feature-store hacktoberfest machine-learning ml mlops python vector-database

Updated 6 months ago

argilla • Rank 13.1 • Science 36%

Argilla is a collaboration tool for AI engineers and domain experts to build high-quality datasets

active-learning ai annotation-tool developer-tools gpt-4 human-in-the-loop langchain llm machine-learning mlops natural-language-processing nlp rlhf text-annotation text-labeling weak-supervision weakly-supervised-learning

Updated 6 months ago

https://github.com/apache/hamilton • Rank 12.1 • Science 36%

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

dag data-analysis data-engineering data-science dataframe etl etl-framework etl-pipeline feature-engineering hacktoberfest lineage llmops machine-learning mlops orchestration pandas python rag software-engineering

Updated 6 months ago

https://github.com/superduper-io/superduper • Rank 20.6 • Science 26%

Superduper: End-to-end framework for building custom AI applications and agents.

ai chatbot data database distributed-ml inference llm-inference llm-serving llmops ml mlops mongodb pretrained-models python pytorch rag semantic-search torch transformers vector-search

Updated 6 months ago

https://github.com/sematic-ai/sematic • Rank 19.2 • Science 26%

An open-source ML pipeline development platform

ai data-science machine-learning ml ml-ops ml-pipeline ml-pipelines mlops pipeline python python3

Updated 6 months ago

https://github.com/polyaxon/hypertune • Rank 18.3 • Science 26%

A library for performing hyperparameter optimization

data-science deep-learning hyperparameter-optimization hyperparameter-tuning machine-learning mlops numpy scikit-learn workflow

Updated 6 months ago

monai-deploy • Rank 8.0 • Science 36%

MONAI Deploy aims to become the de-facto standard for developing, packaging, testing, deploying and running medical AI applications in clinical production.

ai ai-application-deployment ai-application-development deep-learning dicom fhir guidelines healthcare inference machine-learning medical-imaging ml-platform mlops monai open-standard pathology python pytorch radiology

Updated 6 months ago

https://github.com/hongbo-miao/hongbomiao.com • Rank 17.3 • Science 26%

A personal research and development (R&D) lab that facilitates the sharing of knowledge.

aerospace cloud-native computational-fluid-dynamics computer-vision continuous-machine-learning distributed-tracing embedded graphql high-performance-computing infrastructure-as-code kubernetes llm matlab mlops neural-network robot-operating-system rust service-mesh transformer veristand

Updated 6 months ago

https://github.com/aiplanethub/genai-stack • Rank 11.1 • Science 31%

An End to End GenAI Framework

ai chatgpt data-engineering datascientist genai hacktoberfest hacktoberfest-accepted hacktoberfest2023 langchain llama llama-index llm llmops mlops

Updated 6 months ago

vetiver • Rank 15.1 • Science 26%

Version, share, deploy, and monitor models.

mlops model-deploy model-monitoring model-versioning python

Updated 6 months ago

https://github.com/thebabylonai/babylog • Rank 9.1 • Science 23%

A lightweight logger for machine learning teams to log images and predictions in production.

computer-vision cvops data-science logger logging-library machine-learning ml mlops python python3

Updated 6 months ago

https://github.com/thenewflesh/hidebound • Rank 5.7 • Science 26%

Hidebound is a massive, distributed digital asset management system for ML pipelines on Kubernetes

asset-management computer-vision data-science machine-learning mlops mlops-workflow vfx vfx-pipeline

Updated 6 months ago

https://github.com/neptune-ai/neptune-notebooks • Rank 14.9 • Science 13%

📚 Jupyter Notebooks extension for versioning, managing and sharing notebook checkpoints in your machine learning and data science projects.

cli collaboration experiment jupyterlab ml mlops production registry research snapshot snapshotting team tracking version

Updated 6 months ago

https://github.com/ploomber/soorgeon • Rank 14.4 • Science 13%

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

data-engineering data-science jupyter jupyter-notebooks machine-learning mlops workflow

Updated 6 months ago

https://github.com/whylabs/whylogs • Rank 13.4 • Science 13%

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

ai-pipelines analytics approximate-statistics calculate-statistics constraints data-constraints data-pipeline data-quality data-science dataops dataset logging machine-learning ml-pipelines mlops model-performance python statistical-properties

Updated 5 months ago

https://github.com/bentoml/bentoctl • Rank 11.9 • Science 13%

Fast model deployment on any cloud 🚀

aws aws-deployment aws-lambda azure azure-deployment gcp gcp-deployment heroku-deployment mlops mlops-workflow model-deployment serverless

Updated 6 months ago

https://github.com/raptor-ml/raptor • Rank 10.7 • Science 13%

Transform your Pythonic research into an artifact that engineers can deploy easily.

ai-infra data-engineering data-science dataops feature-engineering feature-extraction feature-platform featurestore kubeflow kubernetes machine-learning ml mlops model-deployment production raptor raptor-ml reactive-ml

Updated 5 months ago

https://github.com/bentoml/clip-api-service • Rank 9.2 • Science 13%

CLIP as a service - Embed images and sentences, object recognition, visual reasoning, image classification and reverse image search

ai-applications clip cloud-native mlops model-inference model-inference-service model-serving openai-clip

Updated 6 months ago

https://github.com/agnostiqhq/covalent-cloud-github-workflow • Rank 3.2 • Science 13%

Template for integrating Covalent Cloud's high-performance computing capabilities into GitHub Workflows

automation ci-cd cicd covalent github-actions github-workflow gpu high-performance-computing hpc mlops python serverless

Updated 6 months ago

https://github.com/agnostiqhq/tutorials_covalent_mlops_2022 • Rank 0.7 • Science 13%

Covalent tutorial for MLOps 2022

covalent covalent-tutorial distributed-computing hpc ml ml-tutorial mlops pytorch

Updated 6 months ago

trustpy-tools • Science 44%

TrustPy is a production-ready Python package purpose-built for MLOps pipelines—enabling automated, interpretable analysis of model trustworthiness and predictive reliability before deployment. Available via Conda-Forge and PyPI, with full CI/CD integration and seamless compatibility across modern ML stacks.

ai machine-learning-algorithms mlops python

Updated 5 months ago

https://github.com/awslabs/aiops-modules • Science 26%

AIOps modules is a collection of reusable Infrastructure as Code (IaC) modules for Machine Learning (ML), Foundation Models (FM), Large Language Models (LLM) and GenAI development and operations on AWS

aiops aws bedrock fmops genai llmops mlflow mlops sagemaker

Updated 5 months ago

https://github.com/bentoml/plugins • Science 13%

the swish knife to all things bentoml.

bazel bentoml mlops nix

Updated 5 months ago

https://github.com/awslabs/kubeflow-manifests • Science 13%

Kubeflow on AWS

aws data-science eks kubeflow kubernetes mlops

Updated 5 months ago

https://github.com/amr-yasser226/customer-churn-prediction • Science 26%

End-to-end customer churn prediction project: dataset preparation, experiments with scikit-learn, model tracking with MLflow, data versioning (DVC), CI/CD, and deployment examples.

churn-prediction classification data-versioning docker jupyter-notebook machine-learning mlflow mlops pytest python scikit-learn

Updated 6 months ago

https://github.com/adalkiran/distributed-inference • Science 13%

A project demonstrating an approach to designing a cross-language, distributed pipeline in the deep learning/machine learning domain, using WebRTC and Redis Streams.

cross-language deep-learning distributed distributed-systems go golang machine-learning ml mlops onnx onnxruntime pion python redis redis-streams video-processing video-processing-pipeline webrtc yolo yolox

Updated 6 months ago

deeptsf • Science 57%

The DeepTSF time series forecasting repository developed by EPU NTUA within the DeployAI project

darts deep-learning docker machine-learning mlflow mlops time-series-forecasting

Updated 6 months ago

naomi • Science 67%

NAOMI: Network AI Workflow Democratization

kubernetes machine-learning ml-automation mlops orchestration workflow workflow-automation

Updated 6 months ago

awesome-production-machine-learning • Science 62%

A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning

awesome awesome-list data-mining deep-learning explainability interpretability large-scale-machine-learning large-scale-ml machine-learning machine-learning-operations ml-operations ml-ops mlops privacy-preserving privacy-preserving-machine-learning privacy-preserving-ml production-machine-learning production-ml responsible-ai

Updated 6 months ago

csc-mlops • Science 44%

Framework for building ML apps

artificial-intelligence data-science machine-learning mlops

Updated 5 months ago

https://github.com/amilworks/argocd-demo • Science 13%

Guide to Getting Started with ArgoCD

argocd kubernetes mlops

Updated 5 months ago

https://github.com/bentoml/transformers-nlp-service • Science 13%

Online Inference API for NLP Transformer models - summarization, text classification, sentiment analysis and more

llm llmops mlops model-deployment model-inference-service model-serving nlp nlp-machine-learning online-inference transformer

Updated 6 months ago

https://github.com/kruskal-labs/toolfront • Science 26%

Data retrieval for AI agents

agent analytics artificial-intelligence bigquery data-analysis data-engineering data-science database databricks dataops information-extraction information-retrieval machine-learning mcp mlops mysql python snowflake sql sqlite

Updated 6 months ago

langtest • Science 54%

Deliver safe & effective language models

ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai

Updated 6 months ago

glide • Science 44%

🐦 An open, blazing-fast, simple model gateway for rapid development of production GenAI apps

ai gateway gateway-api genai generative-ai glide go llm llmops ml mlops router

Updated 6 months ago

agilerl • Science 54%

Streamlining reinforcement learning with RLOps. State-of-the-art RL algorithms and tools, with 10x faster training through evolutionary hyperparameter optimization.

agilerl automl deep-learning deep-reinforcement-learning distributed evolutionary-algorithms gym hpo hyperparameter-optimization hyperparameter-tuning machine-learning mlops multi-agent multi-agent-reinforcement-learning pettingzoo python pytorch reinforcement-learning rlops training