imodels
imodels: a python package for fitting interpretable models - Published in JOSS (2021)
PyCM
PyCM: Multiclass confusion matrix library in Python - Published in JOSS (2018)
Machine Learning Validation via Rational Dataset Sampling with astartes
Machine Learning Validation via Rational Dataset Sampling with astartes - Published in JOSS (2023)
VeridicalFlow
VeridicalFlow: a Python package for building trustworthy data science pipelines with PCS - Published in JOSS (2022)
Choice-Learn
Choice-Learn: Large-scale choice modeling for operational contexts through the lens of machine learning - Published in JOSS (2024)
BetaML
BetaML: The Beta Machine Learning Toolkit, a self-contained repository of Machine Learning algorithms in Julia - Published in JOSS (2021)
irl-imitation
Implementation of Inverse Reinforcement Learning (IRL) algorithms in Python/TensorFlow. Deep MaxEnt, MaxEnt, LPIRL
tensorzero
TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.
deepchecks
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.
mlflow
The open source developer platform to build AI/LLM applications and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.
yggdrasil-decision-forests
A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
polyaxon
MLOps Tools For Managing & Orchestrating The Machine Learning LifeCycle
@stdlib/ml-incr-binary-classification
Incrementally perform binary classification using stochastic gradient descent (SGD).
https://github.com/biomedsciai/causallib
A Python package for modular causal inference analysis and model evaluations
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
k-means-constrained
K-Means clustering - constrained with minimum and maximum cluster size. Documentation: https://joshlk.github.io/k-means-constrained
nerlnet
Nerlnet is a framework for research and development of distributed machine learning models on IoT
https://github.com/zenml-io/zenml
ZenML 🙏: MLOps for Reliable AI: from Classical AI to Agents. https://zenml.io.
structure-seer
The implementation, training and evaluation of a Structure Seer machine learning model designed for reconstruction of adjacency of a molecular graph from the labelling of its nodes.
iamai
A rule-driven comprehensive AI toolkit emphasizing simultaneous support for multimodal machine learning and the ability to construct cross-platform robots using logic.
state-of-open-source-ai
📕 Clarity in the current fast-paced mess of Open Source innovation
https://github.com/csinva/csinva.github.io
Slides, paper notes, class notes, blog posts, and research on ML 📉, statistics 📊, and AI 🤖.
https://github.com/featureform/featureform
The Virtual Feature Store. Turn your existing data infrastructure into a feature store.
https://github.com/google/dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
@stdlib/ml-incr-sgd-regression
Online regression via stochastic gradient descent (SGD).
@stdlib/datasets-suthaharan-single-hop-sensor-network
Labeled wireless sensor network data set collected from a simple single-hop wireless sensor network deployment using TelosB motes.
@stdlib/datasets-suthaharan-multi-hop-sensor-network
Labeled wireless sensor network data set collected from a multi-hop wireless sensor network deployment using TelosB motes.
hierarchical-dnn-interpretations
Using / reproducing ACD from the paper "Hierarchical interpretations for neural network predictions" 🧠 (ICLR 2019)
netron
Visualizer for neural network, deep learning and machine learning models
https://github.com/superduper-io/superduper
Superduper: End-to-end framework for building custom AI applications and agents.
https://github.com/sematic-ai/sematic
An open-source ML pipeline development platform
rgf
Home repository for the Regularized Greedy Forest (RGF) library. It includes the original implementation from the paper and a multithreaded one written in C++, along with various language-specific wrappers.
https://github.com/csinva/gan-vae-pretrained-pytorch
Pretrained GANs + VAEs + classifiers for MNIST/CIFAR in pytorch.
https://github.com/csinva/matching-with-gans
Matching in GAN latent space for better bias benchmarking and semantic image editing. 👶🏻🧒🏾👩🏼🦰👱🏽♂️👴🏾
https://github.com/csinva/disentangled-attribution-curves
Using / reproducing DAC from the paper "Disentangled Attribution Curves for Interpreting Random Forests and Boosted Trees"
https://github.com/csinva/analyzing-patient-perspectives
Analyzing interview data from the PediDOSE EFIC interviews using LLMs.
quickai
QuickAI is a Python library that makes it extremely easy to experiment with state-of-the-art Machine Learning models.
https://github.com/kyegomez/gemini
The open source implementation of Gemini, the model that will "eclipse ChatGPT" by Google
https://github.com/thebabylonai/babylog
A lightweight logger for machine learning teams to log images and predictions in production.
NYISOToolkit
Access data, statistics, and visualizations for New York's electricity grid.
https://github.com/drsoliddevil/mlr-gd
Multiple linear regression by gradient descent.
https://github.com/neptune-ai/neptune-notebooks
📚 Jupyter Notebooks extension for versioning, managing and sharing notebook checkpoints in your machine learning and data science projects.
https://github.com/rindow/rindow-neuralnetworks
Neural networks library for machine learning on PHP
https://github.com/csinva/gpt-paper-title-generator
Generating paper titles (and more!) with GPT trained on data scraped from arXiv.
https://github.com/csinva/dnn-ensemble
Testing the properties of ensembled neural networks.
https://github.com/raptor-ml/raptor
Transform your pythonic research to an artifact that engineers can deploy easily.
https://github.com/dair-ai/ml-nlp-paper-discussions
📄 A repo containing notes and discussions for our weekly NLP/ML paper discussions.
osdg-tool
OSDG is an open-source tool that maps and connects activities to the UN Sustainable Development Goals (SDGs) by identifying SDG-relevant content in any text. The tool is available online at www.osdg.ai. API access available for research purposes.
https://github.com/agnostiqhq/tutorials_covalent_mlops_2022
Covalent tutorial for MLOps 2022
https://github.com/csbiology/chlamyatlas
Chlamy Atlas is an AI-powered web application that predicts the localization of proteins from the green alga Chlamydomonas reinhardtii.
insect-detect
Detection models and Python scripts for automated insect monitoring with the Insect Detect DIY camera trap.
machine-learning-responsible-python
Introduction to responsible machine learning with Python
https://github.com/materialsvirtuallab/matpes
A foundational potential energy dataset for materials
https://github.com/scimorph/secureml
Easy-to-use utilities to build privacy-preserving AI.
https://github.com/csinva/iprompt
Finding semantically meaningful and accurate prompts.
talktomodel
TalkToModel gives anyone the powers of XAI through natural language conversations 💬!
https://github.com/ccomkhj/lightening_classifier
PyTorch Lightning wrapper to make training classifiers easier.
openspeaks-before-ai
A set of frameworks for creating the AI/ML building blocks for low-resource languages.
lecture-ai-basics
Course content for the elective Artificial Intelligence I, covering foundational AI concepts and applied exercises.
https://github.com/csinva/transformation-importance
Using / reproducing TRIM from the paper "Transformation Importance with Applications to Cosmology" 🌌 (ICLR Workshop 2020)
https://github.com/cool-japan/scirs
SciRS2 - Scientific Computing and AI in Rust
speech_data_ghana_ug
The dataset comprises a 5000-hour speech corpus in Akan, Ewe, Dagbani, Daagare, and Ikposo. Each language includes 1000 hours of audio from indigenous speakers of the language, of which 100 hours are transcribed.
briscolabot
Reinforcement Learning agent that plays Briscola, a famous Italian card game
gnn_tracking
Reconstruct billions of particle trajectories with graph neural networks
https://github.com/agnostiqhq/tutorials_covalent_pydata_2023
Covalent tutorial notebooks and slides for PyData 2023, NYC