PyCM
PyCM: Multiclass confusion matrix library in Python - Published in JOSS (2018)
Missingno
Missingno: a missing data visualization suite - Published in JOSS (2018)
corner.py
corner.py: Scatterplot matrices in Python - Published in JOSS (2016)
Visions
Visions: An Open-Source Library for Semantic Data - Published in JOSS (2020)
Stingray 2
Stingray 2: A fast and modern Python library for spectral timing - Published in JOSS (2024)
FSharpGephiStreamer
FSharpGephiStreamer: An idiomatic bridge between F# and network visualization - Published in JOSS (2019)
piecewise-regression (aka segmented regression) in Python
piecewise-regression (aka segmented regression) in Python - Published in JOSS (2021)
uravu
uravu: Making Bayesian modelling easy(er) - Published in JOSS (2020)
Astronomical échelle spectroscopy data analysis with `muler`
Astronomical échelle spectroscopy data analysis with `muler` - Published in JOSS (2022)
THzTools
THzTools: data analysis software for terahertz time-domain spectroscopy - Published in JOSS (2024)
Cellpy – an open-source library for processing and analysis of battery testing data
Cellpy – an open-source library for processing and analysis of battery testing data - Published in JOSS (2024)
PyVBMC
PyVBMC: Efficient Bayesian inference in Python - Published in JOSS (2023)
TransBigData
TransBigData: A Python package for transportation spatio-temporal big data processing, analysis and visualization - Published in JOSS (2022)
ChainoPy
ChainoPy: A Python Library for Discrete Time Markov Chain Based Stochastic Analysis - Published in JOSS (2024)
PII-Codex
PII-Codex: a Python library for PII detection, categorization, and severity assessment - Published in JOSS (2023)
PyUnfold
PyUnfold: A Python package for iterative unfolding - Published in JOSS (2018)
spiketools
spiketools: a Python package for analyzing single-unit neural activity - Published in JOSS (2023)
PiSCAT
PiSCAT: A Python Package for Interferometric Scattering Microscopy - Published in JOSS (2022)
PIVA: Photoemission Interface for Visualization and Analysis
PIVA: Photoemission Interface for Visualization and Analysis - Published in JOSS (2025)
PatchView
PatchView: A Python Package for Patch-clamp Data Analysis and Visualization - Published in JOSS (2022)
Hypothesize
Hypothesize: Robust Statistics for Python - Published in JOSS (2020)
HiPart
HiPart: Hierarchical Divisive Clustering Toolbox - Published in JOSS (2023)
nimCSO
nimCSO: A Nim package for Compositional Space Optimization - Published in JOSS (2024)
ronswanson
ronswanson: Building Table Models for 3ML - Published in JOSS (2023)
visxhclust
visxhclust: An R Shiny package for visual exploration of hierarchical clustering - Published in JOSS (2022)
pypillometry
pypillometry: A Python package for pupillometric analyses - Published in JOSS (2020)
h3ppy
h3ppy: An open-source Python package for modelling and fitting H$_3^+$ spectra - Published in JOSS (2025)
ctbench - compile-time benchmarking and analysis
ctbench - compile-time benchmarking and analysis - Published in JOSS (2023)
pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
latentcor
latentcor: An R Package for estimating latent correlations from mixed data types - Published in JOSS (2021)
ConTEXT Explorer
ConTEXT Explorer: a web-based text analysis tool for exploring and visualizing concepts across time - Published in JOSS (2021)
tbeptools
tbeptools: An R package for synthesizing estuarine data for environmental research - Published in JOSS (2021)
pypots
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values
plotastic
plotastic: Bridging Plotting and Statistics in Python - Published in JOSS (2024)
StatAid
StatAid: An R package with a graphical user interface for data analysis - Published in JOSS (2020)
A MATLAB toolbox to detect and analyze marine heatwaves
A MATLAB toolbox to detect and analyze marine heatwaves - Published in JOSS (2019)
statsmodels
Statsmodels: statistical modeling and econometrics in Python
pygwalker
PyGWalker: Turn your dataframe into an interactive UI for visual analysis
iris
A powerful, format-agnostic, and community-driven Python package for analysing and visualising Earth science data
pyerrors
Error propagation and statistical analysis for Monte Carlo simulations in lattice QCD and statistical mechanics using autograd.
pydata-wrangler
Wrangle messy numerical, image, and text data into consistent well-organized formats
collapse
Advanced and Fast Data Transformation in R
chip-atlas
ChIP-Atlas: Browse and analyze all public ChIP/DNase-seq data on your browser
spectrochempy
SpectroChemPy is a framework for processing, analyzing and modeling spectroscopic data for chemistry with Python
gradio
Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!
dimensio
Multivariate Data Analysis - ❗ This is a read-only mirror from https://codeberg.org/tesselle/dimensio
https://github.com/alan-turing-institute/clevercsv
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
DIVAnd
DIVAnd performs an n-dimensional variational analysis of arbitrarily located observations
pygmmis
Gaussian mixture model for incomplete (missing or truncated) and noisy data
aspecd
Python framework for handling spectroscopic data focussing on reproducibility
https://github.com/zblz/naima
Derivation of non-thermal particle distributions through MCMC spectral fitting
cloupy
CLOUPY IS NO LONGER SUPPORTED. PLEASE SEE THE README. cloupy is a Python library for downloading, processing, and visualizing climatological data. The main goal of the library is to help its author in writing a BA thesis. The library is well adapted to academic work: the data sources used are reliable and the graphs are easy to modify.
pyglotaran
A Python library for Global and Target Analysis of time-resolved spectroscopy data
pyprobables
Probabilistic data structures in Python: http://pyprobables.readthedocs.io/en/latest/index.html
interactive_data_editor
Software for interactively editing data through a graphical interface
public_open_source_data_science
A repository of open source data science projects for social good
imbalanced-learn
A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning
https://github.com/modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
ydata-profiling
1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
https://github.com/uxlfoundation/scikit-learn-intelex
Extension for Scikit-learn is a seamless way to speed up your Scikit-learn application
https://github.com/lancedb/lance
Modern columnar data format for ML and LLMs implemented in Rust. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch with more integrations coming.
https://github.com/growthbook/growthbook
Open Source Feature Flagging and A/B Testing Platform
https://github.com/flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
qpcr
A Python package to analyse qPCR data for single-use or high-throughput applications
https://github.com/elementary-data/elementary
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
mastercurves
Python package for automatically superimposing data sets to create a master curve, using Gaussian process regression and maximum a posteriori estimation.
https://github.com/alleninstitute/openscope_databook
OpenScope databook: a collaborative, versioned, data-centric collection of foundational analyses for reproducible systems neuroscience 🐁🧠🔬🖥️📈
epiphyte
Python toolkit for working with high-dimensional neural data recorded during naturalistic, continuous stimuli @a-darcher @rachrapp
https://github.com/root-project/root
The official repository for ROOT: analyzing, storing and visualizing big data, scientifically