Pooch
Pooch: A friend to fetch your data files - Published in JOSS (2020)
PyCM
PyCM: Multiclass confusion matrix library in Python - Published in JOSS (2018)
Retriever
Retriever: Data Retrieval Tool - Published in JOSS (2017)
Gibbs Sea Water Oceanographic Toolbox of TEOS-10 implemented in Rust
Gibbs Sea Water Oceanographic Toolbox of TEOS-10 implemented in Rust - Published in JOSS (2024)
datawizard
datawizard: An R Package for Easy Data Preparation and Statistical Transformations - Published in JOSS (2022)
Rdataretriever
Rdataretriever: R Interface to the Data Retriever - Published in JOSS (2021)
dython: A Set of Analysis and Visualization Tools for Data and Variables in Python
dython: A Set of Analysis and Visualization Tools for Data and Variables in Python - Published in JOSS (2025)
OpenTripPlanner for R
OpenTripPlanner for R - Published in JOSS (2019)
ytree
ytree: A Python package for analyzing merger trees - Published in JOSS (2019)
gridwxcomp
gridwxcomp: A Python package to evaluate and interpolate biases between station and gridded weather data - Published in JOSS (2025)
wdpar
wdpar: Interface to the World Database on Protected Areas - Published in JOSS (2022)
weatherOz
weatherOz: An API Client for Australian Weather and Climate Data Resources in R - Published in JOSS (2024)
pydynpd
pydynpd: A Python package for dynamic panel model - Published in JOSS (2023)
NAIF PDS4 Bundler
NAIF PDS4 Bundler: A Python package to generate SPICE PDS4 archives - Published in JOSS (2022)
prisonbrief
prisonbrief: An R package that returns tidy data from the World Prison Brief website - Published in JOSS (2018)
covidregionaldata
covidregionaldata: Subnational data for COVID-19 epidemiology - Published in JOSS (2021)
pypillometry
pypillometry: A Python package for pupillometric analyses - Published in JOSS (2020)
Git-RDM
Git-RDM: A research data management plugin for the Git version control system - Published in JOSS (2016)
getTBinR
getTBinR: an R package for accessing and summarising the World Health Organisation Tuberculosis data - Published in JOSS (2019)
maidr-legacy
[DEPRECATED prototype] Multimodal Access and Interactive Data Representation
@uwdata/mosaic-core
An extensible framework for linking databases and interactive views.
pydata-wrangler
Wrangle messy numerical, image, and text data into consistent well-organized formats
pyprep
PyPREP: A Python implementation of the Preprocessing Pipeline (PREP) for EEG data
llama_index
LlamaIndex is the leading framework for building LLM-powered agents over your data.
spanishoddata
Access national high-quality and open-access datasets on movement patterns derived from mobile telephone datasets.
datahugger
One downloader for many scientific data and code repositories! DOI 👐 Data
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
asreview-makita
Workflow generator for simulation studies using the command line interface of ASReview LAB
academic-observatory-workflows
Telescopes, Workflows and Data Services for the Academic Observatory
interactive_data_editor
Software for interactively editing data in a graphical manner
https://github.com/catalyst-cooperative/ferc-xbrl-extractor
A tool for converting FERC filings published in XBRL into SQLite databases
oaebu-workflows
Telescopes, Workflows and Data Services for the 'Book Analytics Dashboard Project (2022-2025)', building upon the project 'Developing a Pilot Data Trust for Open Access eBook Usage (2020-2022)'
enhancing_reaxff
Jupyter notebooks used for retraining the ReaxFF force field for the inorganic compound LiF.
cmstatr
cmstatr: An R Package for Statistical Analysis of Composite Material Data - Published in JOSS (2020)
cbsodata
Unofficial Statistics Netherlands (CBS) open data API client for Python
constrain
Control Strainer (ConStrain) is a data-driven knowledge-integrated framework that automatically verifies that building system controls function as intended.
compas
Main library of the COMPAS framework and CAD integrations for Rhino/GH and Blender.
urban-and-regional-planning-resources
Community list of data & technology resources concerning the built environment and communities. 🏙️🌳🚌🚦🗺️
odis-arch
Development of the Ocean Data and Information System (ODIS) architecture
https://github.com/modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
https://github.com/airbytehq/airbyte
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Spine-Toolbox
Spine Toolbox is an open source Python package to manage data, scenarios and workflows for modelling and simulation. You can have your local workflow, but work as a team through version control and SQL databases.
https://github.com/flyteorg/flyte
Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
https://github.com/ai-readi/fairhub-app
Web platform for easily managing, curating, and sharing FAIR and AI-ready clinical and biomedical research data
https://github.com/cidgoh/dataharmonizer
A standardized browser-based spreadsheet editor and validator that can be run offline and locally, and which includes templates for SARS-CoV-2 and Monkeypox sampling data. This project, created by the Centre for Infectious Disease Genomics and One Health (CIDGOH), at Simon Fraser University, is now an open-source collaboration with contributions from the National Microbiome Data Collaborative (NMDC), the LinkML development team, and others.
bio_data_guide
Standardizing Marine Biological Data Working Group - An open community to facilitate the mobilization of biological data to OBIS.
@stdlib/datasets-spache-revised
A list of simple American-English words (revised Spache).
pommesdata
A full-featured transparent data preparation routine from raw data to POMMES model inputs
cudem
CUDEM contains scripts, programs, and an API for generating and processing Digital Elevation Models