Retriever
Retriever: Data Retrieval Tool - Published in JOSS (2017)
open-mastr
open-mastr: A Python Package to Download and Process the German Energy Registry Marktstammdatenregister - Published in JOSS (2024)
Rdataretriever
Rdataretriever: R Interface to the Data Retriever - Published in JOSS (2021)
terrainr
terrainr: An R package for creating immersive virtual environments - Published in JOSS (2022)
Foundry-ML
Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science - Published in JOSS (2024)
Jury
Jury: A Comprehensive Evaluation Toolkit - Published in JOSS (2024)
Git-RDM
Git-RDM: A research data management plugin for the Git version control system - Published in JOSS (2016)
datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
spectrochempy
SpectroChemPy is a framework for processing, analyzing and modeling spectroscopic data for chemistry with Python
folio
Datasets for Teaching Archaeology and Palaeontology - ❗ This is a read-only mirror of https://codeberg.org/tesselle/folio
dataset-phenotypes
Preparatory scripts for BIDS tabular phenotypic data in large neuroimaging datasets.
loghub
A large collection of system log datasets for AI-driven log analytics [ISSRE'23]
minari
A standard format for offline reinforcement learning datasets, with popular reference datasets and related utilities
https://github.com/bioinfomachinelearning/dips-plus
The Enhanced Database of Interacting Protein Structures for Interface Prediction
awesome-transit
Community list of transit APIs, apps, datasets, research, and software 🚌🌟🚋🌟🚂
awesome-forests
🌳 A curated list of ground-truth forest datasets for the machine learning and forestry community.
@stdlib/datasets-spache-revised
A list of simple American-English words (revised Spache).
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
@stdlib/datasets-harrison-boston-house-prices
A dataset derived from information collected by the US Census Service concerning housing in Boston, Massachusetts (1978).
datasets-herndon-venus-semidiameters
Fifteen observations of the vertical semidiameter of Venus, made by Lieutenant Herndon, with the meridian circle at Washington, in the year 1846.
@stdlib/datasets-harrison-boston-house-prices-corrected
A (corrected) dataset derived from information collected by the US Census Service concerning housing in Boston, Massachusetts (1978).
@stdlib/datasets-pace-boston-house-prices
A (corrected) dataset derived from information collected by the US Census Service concerning housing in Boston, Massachusetts (1978).
datasets-minard-napoleons-march
Data for Charles Joseph Minard's cartographic depiction of Napoleon's Russian campaign of 1812.
cihai
Python library for CJK (Chinese, Japanese, and Korean) language dictionary
https://github.com/openneuroorg/openneuro
A free and open platform for analyzing and sharing neuroimaging data
obp
Open Bandit Pipeline: a Python library for bandit algorithms and off-policy evaluation
awesome-earth-artificial-intelligence
A curated list of Earth Science's Artificial Intelligence (AI) tutorials, notebooks, software, datasets, courses, books, video lectures and papers. Contributions most welcome.
@stdlib/datasets-us-states-abbr
A list of US state two-letter abbreviations in alphabetical order according to state name.
@stdlib/datasets-cdc-nchs-us-infant-mortality-bw-1915-2013
US infant mortality data, by race, from 1915 to 2013, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.
@stdlib/datasets-cdc-nchs-us-births-1994-2003
US birth data from 1994 to 2003, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.
@stdlib/datasets-cdc-nchs-us-births-1969-1988
US birth data from 1969 to 1988, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.
@stdlib/datasets-nightingales-rose
Dataset for Nightingale's famous polar area diagram.
@stdlib/datasets-suthaharan-single-hop-sensor-network
Labeled wireless sensor network data set collected from a simple single-hop wireless sensor network deployment using TelosB motes.
@stdlib/datasets-fivethirtyeight-ffq
FiveThirtyEight reader responses to a food frequency questionnaire (FFQ).
@stdlib/datasets-suthaharan-multi-hop-sensor-network
Labeled wireless sensor network data set collected from a multi-hop wireless sensor network deployment using TelosB motes.
@stdlib/datasets-ssa-us-births-2000-2014
US birth data from 2000 to 2014, as provided by the Social Security Administration.
databot
A platform for creating bots that answer questions about a specific data source. It can automatically generate a swarm of chat/voice bots to serve all the data sources in an Open Data Portal.
napolab
The Natural Portuguese Language Benchmark (Napolab). Stay up to date with the latest advancements in Portuguese language models and their performance across carefully curated Portuguese language tasks.
usfertilizer
An R package to retrieve county-level fertilizer estimate data for the USA from 1945 to 2012.
Callisto-Dataset-Collection
A list of datasets aiming to enable Artificial Intelligence applications that use Copernicus data.
wildlife-datasets
WildlifeDatasets: An open-source toolkit for animal re-identification
NYISOToolkit
Access data, statistics, and visualizations for New York's electricity grid.