seaborn
seaborn: statistical data visualization - Published in JOSS (2021)
lifelines
lifelines: survival analysis in Python - Published in JOSS (2019)
MLxtend
MLxtend: Providing machine learning and data science utilities and extensions to Python's scientific computing stack - Published in JOSS (2018)
Flux
Flux: Elegant machine learning with Julia - Published in JOSS (2018)
imodels
imodels: a python package for fitting interpretable models - Published in JOSS (2021)
PyCM
PyCM: Multiclass confusion matrix library in Python - Published in JOSS (2018)
anndata
anndata: Access and store annotated data matrices - Published in JOSS (2024)
Retriever
Retriever: Data Retrieval Tool - Published in JOSS (2017)
The drake R package
The drake R package: a pipeline toolkit for reproducibility and high-performance computing - Published in JOSS (2018)
The targets R package
The targets R package: a dynamic Make-like function-oriented pipeline toolkit for reproducibility and high-performance computing - Published in JOSS (2021)
Learning from Crowds with Crowd-Kit
Learning from Crowds with Crowd-Kit - Published in JOSS (2024)
Feature-engine
Feature-engine: A Python package for feature engineering for machine learning - Published in JOSS (2021)
mlr3
mlr3: A modern object-oriented machine learning framework in R - Published in JOSS (2019)
Machine Learning Validation via Rational Dataset Sampling with astartes
Machine Learning Validation via Rational Dataset Sampling with astartes - Published in JOSS (2023)
Visions
Visions: An Open-Source Library for Semantic Data - Published in JOSS (2020)
Ripser.py
Ripser.py: A Lean Persistent Homology Library for Python - Published in JOSS (2018)
PyClustering
PyClustering: Data Mining Library - Published in JOSS (2019)
DeepRiver
DeepRiver: A Deep Learning Library for Data Streams - Published in JOSS (2025)
traffic, a toolbox for processing and analysing air traffic data
traffic, a toolbox for processing and analysing air traffic data - Published in JOSS (2019)
Pyglmnet
Pyglmnet: Python implementation of elastic-net regularized generalized linear models - Published in JOSS (2020)
Sciris
Sciris: Simplifying scientific software in Python - Published in JOSS (2023)
Rdataretriever
Rdataretriever: R Interface to the Data Retriever - Published in JOSS (2021)
tpcp
tpcp: Tiny Pipelines for Complex Problems - A set of framework independent helpers for algorithms development and evaluation - Published in JOSS (2023)
VeridicalFlow
VeridicalFlow: a Python package for building trustworthy data science pipelines with PCS - Published in JOSS (2022)
Astronomical échelle spectroscopy data analysis with `muler`
Astronomical échelle spectroscopy data analysis with `muler` - Published in JOSS (2022)
PyXAB - A Python Library for $\mathcal{X}$-Armed Bandit and Online Blackbox Optimization Algorithms
PyXAB - A Python Library for $\mathcal{X}$-Armed Bandit and Online Blackbox Optimization Algorithms - Published in JOSS (2024)
NiaARM
NiaARM: A minimalistic framework for Numerical Association Rule Mining - Published in JOSS (2022)
Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science
Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science - Published in JOSS (2024)
SysIdentPy
SysIdentPy: A Python package for System Identification using NARMAX models - Published in JOSS (2020)
Multiblock PLS
Multiblock PLS: Block dependent prediction modeling for Python - Published in JOSS (2019)
yadg
yadg: yet another datagram - Published in JOSS (2022)
pyrolite
pyrolite: Python for geochemistry - Published in JOSS (2020)
TorchMetrics - Measuring Reproducibility in PyTorch
TorchMetrics - Measuring Reproducibility in PyTorch - Published in JOSS (2022)
ChainoPy
ChainoPy: A Python Library for Discrete Time Markov Chain Based Stochastic Analysis - Published in JOSS (2024)
Nominally
Nominally: A Name Parser for Record Linkage - Published in JOSS (2021)
BetaML
BetaML: The Beta Machine Learning Toolkit, a self-contained repository of Machine Learning algorithms in Julia - Published in JOSS (2021)
HRV
HRV: a Pythonic package for Heart Rate Variability Analysis - Published in JOSS (2020)
UKCensusAPI
UKCensusAPI: python and R interfaces to the nomisweb UK census data API - Published in JOSS (2017)
TDAstats
TDAstats: R pipeline for computing persistent homology in topological data analysis - Published in JOSS (2018)
IKPLS
IKPLS: Improved Kernel Partial Least Squares and Fast Cross-Validation Algorithms for Python with CPU and GPU Implementations Using NumPy and JAX - Published in JOSS (2024)
HiPart
HiPart: Hierarchical Divisive Clustering Toolbox - Published in JOSS (2023)
nimCSO
nimCSO: A Nim package for Compositional Space Optimization - Published in JOSS (2024)
Synthia
Synthia: multidimensional synthetic data generation in Python - Published in JOSS (2021)
MarSwitching.jl
MarSwitching.jl: A Julia package for Markov switching dynamic models - Published in JOSS (2024)
scikit-hubness
scikit-hubness: Hubness Reduction and Approximate Neighbor Search - Published in JOSS (2020)
visxhclust
visxhclust: An R Shiny package for visual exploration of hierarchical clustering - Published in JOSS (2022)
Syd
Syd: A package for making interactive data visualizations in python - Published in JOSS (2025)
Distributions
A Julia package for probability distributions and associated functions.
Leafmap
Leafmap: A Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment - Published in JOSS (2021)
MLMOD
MLMOD: Machine Learning Methods for Data-Driven Modeling in LAMMPS - Published in JOSS (2023)
pandas
Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
latentcor
latentcor: An R Package for estimating latent correlations from mixed data types - Published in JOSS (2021)
ConTEXT Explorer
ConTEXT Explorer: a web-based text analysis tool for exploring and visualizing concepts across time - Published in JOSS (2021)
pypots
A Python toolkit for reality-centric machine learning, deep learning, and data mining on partially observed time series. It includes state-of-the-art neural network models for scientific analysis tasks (imputation, classification, clustering, forecasting, anomaly detection, and cleaning) on incomplete, irregularly sampled industrial multivariate time series with NaN missing values.
plotastic
plotastic: Bridging Plotting and Statistics in Python - Published in JOSS (2024)
skpro
A unified framework for tabular probabilistic regression, time-to-event prediction, and probability distributions in Python
Multivariate Covariance Generalized Linear Models in Python
Multivariate Covariance Generalized Linear Models in Python: The mcglm library - Published in JOSS (2024)
flaml
A fast library for AutoML and tuning.
heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
srai
Spatial Representations for Artificial Intelligence - a Python library toolkit for geospatial machine learning focused on creating embeddings for downstream tasks
DEEPaaS API
DEEPaaS API: a REST API for Machine Learning and Deep Learning models - Published in JOSS (2019)
statsmodels
Statsmodels: statistical modeling and econometrics in Python
STUMPY
STUMPY: A Powerful and Scalable Python Library for Time Series Data Mining - Published in JOSS (2019)
pytorch-lightning
Pretrain and finetune any AI model of any size on multiple GPUs or TPUs with zero code changes.
skforecast
Time series forecasting with machine learning models
pydata-wrangler
Wrangle messy numerical, image, and text data into consistent well-organized formats
collapse
Advanced and Fast Data Transformation in R
https://github.com/theislab/cellrank
CellRank: dynamics from multi-view single-cell data
https://github.com/recommenders-team/recommenders
Best Practices on Recommendation Systems
susi
SuSi: Python package for unsupervised, supervised and semi-supervised self-organizing maps (SOM)
lexicalrichness
A module to compute textual lexical richness (aka lexical diversity).
geemap
geemap: A Python package for interactive mapping with Google Earth Engine - Published in JOSS (2020)
tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
ray
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
forestplot
A Python package to make publication-ready but customizable coefficient plots.
gradio
Build and share delightful machine learning apps, all in Python.
glycowork
Package for processing and analyzing glycans and their role in biology.
https://github.com/alan-turing-institute/clevercsv
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
spectrafit
📊📈🔬 SpectraFit is a command-line and Jupyter-notebook tool for quick data-fitting based on the regular expression of distribution functions.