Updated 5 months ago

https://github.com/lancedb/lance • Rank 28.1 • Science 36%

Modern columnar data format for ML and LLMs, implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, PyArrow, and PyTorch, with more integrations coming.

Updated 5 months ago

https://github.com/elementary-data/elementary • Rank 26.2 • Science 36%

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available self-hosted or as a cloud service with premium features.

Updated 5 months ago

https://github.com/awslabs/aws-ddk • Rank 15.5 • Science 26%

An open-source development framework to help you build data workflows and modern data architecture on AWS.

Updated 5 months ago

https://github.com/whylabs/whylogs • Rank 13.4 • Science 13%

An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈

Updated 5 months ago

ixplorer • Rank 12.2 • Science 13%

Friendly DataOps with RStudio