Scientific Software
Updated 10 months ago

DataLad — Peer-reviewed • Rank 21.4 • Science 100%

DataLad: distributed system for joint management of code, data, and their relationship - Published in JOSS (2021)

Scientific Software · Peer-reviewed
Scientific Software
Updated 10 months ago

Retriever — Peer-reviewed • Rank 18.2 • Science 100%

Retriever: Data Retrieval Tool - Published in JOSS (2017)

Scientific Software · Peer-reviewed
Scientific Software
Updated 10 months ago

Soundata — Peer-reviewed • Rank 17.0 • Science 100%

Soundata: Reproducible use of audio datasets - Published in JOSS (2024)

Scientific Software · Peer-reviewed
Scientific Software
Updated 10 months ago

open-mastr — Peer-reviewed • Rank 16.6 • Science 100%

open-mastr: A Python Package to Download and Process the German Energy Registry Marktstammdatenregister - Published in JOSS (2024)

Political Science
Scientific Software · Peer-reviewed
Scientific Software
Updated 10 months ago

Crowsetta — Peer-reviewed • Rank 13.3 • Science 93%

Crowsetta: A Python tool to work with any format for annotating animal vocalizations and bioacoustics data - Published in JOSS (2023)

Scientific Software
Updated 10 months ago

WGS2NCBI - Toolkit for preparing genomes for submission to NCBI — Peer-reviewed • Rank 5.0 • Science 93%

WGS2NCBI - Toolkit for preparing genomes for submission to NCBI - Published in JOSS (2019)

Scientific Software · Peer-reviewed
Scientific Software
Updated 10 months ago

tsp ("Teaspoon") — Peer-reviewed • Rank 8.8 • Science 87%

tsp ("Teaspoon"): A library for ground temperature data - Published in JOSS (2022)

Engineering (40%)
Scientific Software · Peer-reviewed
Updated 10 months ago

faker • Rank 34.4 • Science 54%

Faker is a Python package that generates fake data for you.

Updated 10 months ago

ekpmeasure • Rank 7.4 • Science 77%

Repository of analysis and computer control code for various experiments. The analysis module is designed to help the researcher wrangle large amounts of metadata.

Updated 10 months ago

tlidb • Rank 6.3 • Science 77%

Transfer Learning in Dialogue Benchmarking Toolkit

Updated 10 months ago

mtg-jamendo-dataset • Rank 8.1 • Science 72%

Metadata, scripts and baselines for the MTG-Jamendo dataset

Updated 10 months ago

tiny_qa_benchmark_pp • Rank 2.1 • Science 77%

Tiny QA Benchmark++: a micro-benchmark suite (52-item gold + on-demand multilingual synthetic packs), generator CLI, and CI-ready eval harness for ultra-fast LLM smoke-testing and regression-catching.

Updated 10 months ago

proteinworkshop • Rank 11.6 • Science 67%

Benchmarking framework for protein representation learning. Includes a large number of pre-training and downstream task datasets, models and training/task utilities. (ICLR 2024)

Updated 10 months ago

dataset-phenotypes • Rank 4.9 • Science 72%

Preparatory scripts for BIDS tabular phenotypic data in large neuroimaging datasets.

Updated 10 months ago

py-torchtext • Rank 30.7 • Science 46%

Models, data loaders and abstractions for language processing, powered by PyTorch

Updated 10 months ago

sdnist • Rank 11.5 • Science 65%

SDNist: Benchmark data and evaluation tools for data synthesizers.

Updated 10 months ago

auk • Rank 18.5 • Science 57%

Working with eBird data in R

Updated 10 months ago

vulntrain • Rank 7.6 • Science 67%

A tool to generate datasets and models based on vulnerability descriptions from @Vulnerability-Lookup.

Updated 10 months ago

knowprompt • Rank 6.4 • Science 67%

[WWW 2022] KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction

Updated 10 months ago

globalmatch • Rank 5.4 • Science 64%

GlobalMatch: Registration of forest terrestrial point clouds by global matching of relative stem positions [ISPRS 2023]

Updated 10 months ago

awesome-remote-sensing-change-detection • Rank 9.6 • Science 59%

List of datasets, code, and contests related to remote sensing change detection

Updated 10 months ago

noisy-sentences-dataset • Rank 0.7 • Science 67%

550K sentences in 5 European languages augmented with noise for training and evaluating spell correction tools or machine learning models.

Updated 10 months ago

yegor256/cam • Rank 12.6 • Science 54%

Classes and Metrics (CaM): a dataset of Java classes from public open-source GitHub repositories

Updated 10 months ago

eurocrops • Rank 7.6 • Science 59%

The official repository for the EuroCrops dataset.

Updated 10 months ago

rdhs • Rank 6.5 • Science 59%

API Client and Data Munging for the Demographic and Health Survey Data

Updated 10 months ago

transformer-srl • Rank 10.4 • Science 54%

Reimplementation of a BERT-based model (Shi et al., 2019), currently the state of the art for English SRL. This model also implements predicate disambiguation.

Updated 10 months ago

phishing-dataset • Rank 5.3 • Science 57%

Phishing dataset with more than 88,000 instances and 111 features. Web application available at https://gregavrbancic.github.io/Phishing-Dataset/

Updated 10 months ago

botbots • Rank 5.1 • Science 57%

A dataset featuring diverse dialogues between two ChatGPT (gpt-3.5-turbo) instances with system messages written by GPT-4. Covering various contexts and tasks (task-oriented dialogue systems, abstract reasoning, brainstorming).

Updated 10 months ago

monitors4codegen • Rank 7.0 • Science 54%

Code and data artifact for the NeurIPS 2023 paper "Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context". `multispy` is an LSP client library in Python intended to be used to build applications around language servers.

Updated 10 months ago

@stdlib/datasets-harrison-boston-house-prices • Rank 3.8 • Science 57%

A dataset derived from information collected by the US Census Service concerning housing in Boston, Massachusetts (1978).

Updated 10 months ago

datasets-herndon-venus-semidiameters • Rank 3.8 • Science 57%

Fifteen observations of the vertical semidiameter of Venus, made by Lieutenant Herndon, with the meridian circle at Washington, in the year 1846.

Updated 10 months ago

@stdlib/datasets-harrison-boston-house-prices-corrected • Rank 3.0 • Science 57%

A (corrected) dataset derived from information collected by the US Census Service concerning housing in Boston, Massachusetts (1978).

Updated 10 months ago

@stdlib/datasets-pace-boston-house-prices • Rank 2.7 • Science 57%

A (corrected) dataset derived from information collected by the US Census Service concerning housing in Boston, Massachusetts (1978).

Updated 10 months ago

datasets-minard-napoleons-march • Rank 1.8 • Science 57%

Data for Charles Joseph Minard's cartographic depiction of Napoleon's Russian campaign of 1812.

Updated 10 months ago

open-data-on-github • Rank 4.5 • Science 54%

Dataset files for the Open Data on GitHub paper

Updated 10 months ago

yeast-in-microstructures-dataset • Rank 3.7 • Science 54%

Official and maintained implementation of the dataset paper "An Instance Segmentation Dataset of Yeast Cells in Microstructures" [EMBC 2023].

Updated 10 months ago

aptv2 • Rank 3.3 • Science 54%

The official repo for the extension of [NeurIPS'22] "APT-36K: A Large-scale Benchmark for Animal Pose Estimation and Tracking": https://github.com/pandorgan/APT-36K

Updated 10 months ago

eurocropsml • Rank 8.1 • Science 49%

EuroCropsML is a ready-to-use benchmark dataset for few-shot crop type classification using Sentinel-2 imagery.

Updated 10 months ago

pyslice • Rank 2.5 • Science 54%

Data set templating library for model dataset creation and model running.

Updated 10 months ago

tyc-dataset • Rank 2.3 • Science 54%

Official and maintained implementation of the dataset paper "The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures" [ICCVW 2023].

Updated 10 months ago

fitz-collection-raw-data • Rank 2.3 • Science 54%

Raw data from the collections database in JSON and CSV format

Updated 10 months ago

lakes_temp • Rank 0.0 • Science 54%

Lakes temperature analysis based on satellite images

Updated 10 months ago

TACO • Rank 7.7 • Science 46%

🌮 Trash Annotations in Context Dataset Toolkit

Updated 10 months ago

FAIRshare • Rank 2.6 • Science 51%

Simplifying the curation and sharing of biomedical research data and software according to applicable FAIR guidelines

Updated 10 months ago

ClimateSERVpy • Rank 7.4 • Science 46%

This is a package to access the ClimateSERV API

Scientific Software
Updated 10 months ago

citesdb — Peer-reviewed • Rank 4.2 • Science 49%

citesdb: An R package to support analysis of CITES Trade Database shipment-level data - Published in JOSS (2019)

Scientific Software · Peer-reviewed
Updated 10 months ago

https://github.com/atomashevic/pymadoc • Rank 4.0 • Science 49%

Python package to download and combine parts of MADOC dataset

Updated 10 months ago

thetis • Rank 8.3 • Science 44%

Service to examine data processing pipelines (e.g., machine learning or deep learning pipelines) for uncertainty consistency (calibration), fairness, and other safety-relevant aspects.

Updated 10 months ago

torch-waymo • Rank 7.3 • Science 44%

PyTorch dataloader for Waymo Open Dataset

Updated 10 months ago

@stdlib/datasets-us-states-abbr • Rank 6.7 • Science 44%

A list of US state two-letter abbreviations in alphabetical order according to state name.

Updated 10 months ago

cppe5 • Rank 8.8 • Science 41%

Code for our paper CPPE-5 (Medical Personal Protective Equipment), a new challenging object detection dataset

Updated 10 months ago

psidr • Rank 16.7 • Science 33%

R package to easily build panel data sets from the PSID

Updated 10 months ago

netcdf-fortran • Rank 13.6 • Science 36%

Official GitHub repository for netCDF-Fortran libraries, which depend on the netCDF C library. Install the netCDF C library first.

Updated 10 months ago

li_etl • Rank 2.9 • Science 46%

Deduplicated and enriched merge of the EDH and EDCS datasets

Updated 10 months ago

@stdlib/datasets-cdc-nchs-us-infant-mortality-bw-1915-2013 • Rank 4.5 • Science 44%

US infant mortality data, by race, from 1915 to 2013, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.

Updated 10 months ago

quick-torch • Rank 4.4 • Science 44%

Library that provides a QuickDraw dataset using the PyTorch API.

Updated 10 months ago

@stdlib/datasets-cdc-nchs-us-births-1994-2003 • Rank 4.4 • Science 44%

US birth data from 1994 to 2003, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.

Updated 10 months ago

@stdlib/datasets-cdc-nchs-us-births-1969-1988 • Rank 4.3 • Science 44%

US birth data from 1969 to 1988, as provided by the Centers for Disease Control and Prevention's National Center for Health Statistics.