PyExperimenter
PyExperimenter: Easily distribute experiments and track results - Published in JOSS (2023)
Rdataretriever
Rdataretriever: R Interface to the Data Retriever - Published in JOSS (2021)
dwctaxon, an R package for editing and validating taxonomic data in Darwin Core format
dwctaxon, an R package for editing and validating taxonomic data in Darwin Core format - Published in JOSS (2024)
wdpar
wdpar: Interface to the World Database on Protected Areas - Published in JOSS (2022)
rcites
rcites: An R package to access the CITES Speciesplus database - Published in JOSS (2018)
Raphtory
Raphtory: The temporal graph engine for Rust and Python - Published in JOSS (2024)
usearch
Fast Open-Source Search & Clustering engine × for Vectors & Arbitrary Objects × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
bikedata
bikedata - Published in JOSS (2017)
SymbiotaR2
SymbiotaR2: An R Package for Accessing Symbiota2 Data - Published in JOSS (2020)
Caesar
Robust robotic localization and mapping, together with NavAbility(TM). Reach out to info@wherewhen.ai for help.
cazy-webscraper
Web scraper to retrieve protein data catalogued by the CAZy, UniProt, NCBI, GTDB and PDB websites/databases.
chip-atlas
ChIP-Atlas: Browse and analyze all public ChIP/DNase-seq data on your browser
protease_activity_analysis
Python toolkit and package for analyzing enzyme activity data
efp-seq_browser
An RNA-Seq data exploration tool that shows read map coverage of a gene of interest along with a coloured "electronic fluorescent pictographic" (eFP) based on its RPKM expression level.
pandas-ai
Chat with your database or your datalake (SQL, CSV, parquet). PandasAI makes data analysis conversational using LLMs and RAG.
https://github.com/catalyst-cooperative/ferc-xbrl-extractor
A tool for converting FERC filings published in XBRL into SQLite databases
enhancing_reaxff_dft_database
Database used for retraining the ReaxFF force field for the inorganic compound LiF.
ustore
Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️
com.arcadedb:arcadedb-console
ArcadeDB Multi-Model Database, one DBMS that supports SQL, Cypher, Gremlin, HTTP/JSON, MongoDB and Redis. ArcadeDB is a conceptual fork of OrientDB, the first Multi-Model DBMS. ArcadeDB supports Vector Embeddings.
loris
LORIS is a web-accessible database solution for longitudinal multi-site studies.
epiphyte
Python toolkit for working with high-dimensional neural data recorded during naturalistic, continuous stimuli @a-darcher @rachrapp
https://github.com/aiidateam/aiida-core
The official repository for the AiiDA code
bety
Web-interface to the Biofuel Ecophysiological Traits and Yields Database (used by PEcAn and TERRA REF)
scperturb
scPerturb: A resource and a Python/R tool for single-cell perturbation data
codechecker
CodeChecker is analyzer tooling, a defect database, and a viewer extension for static and dynamic analyzer tools.
psycopg
New generation PostgreSQL database adapter for the Python programming language
persian-news-crawler
Simple script to crawl data from Persian news agencies, including Fars and Mehr.
pylcaio
A Python class to hybridize lifecycle assessment (LCA) and environmentally extended input-output (EEIO) databases.
alchemy
Archived - High performance, realtime BaaS with RBAC, graphing, full text search, S3 / B2 Storage and GIS with a GraphQL, gRPC and REST API
gel
Gel supercharges Postgres with a modern data model, graph queries, Auth & AI solutions, and much more.
https://github.com/aiidateam/aiida-workgraph
Efficiently design and manage flexible workflows with AiiDA, featuring an interactive GUI, checkpoints, provenance tracking, and remote execution capabilities.
https://github.com/beuth-erdelt/benchmark-experiment-host-manager
This Python tool helps manage DBMS benchmarking experiments in a Kubernetes-based HPC cluster environment. It lets users configure hardware and software setups so that tests can be easily repeated across varying configurations.
cytominer-database
[DEPRECATED] A package for storing morphological profiling data.
https://github.com/introlab/openimu
Open Source Analytics & Visualisation Software for Inertial Measurement Units
sqlcmdcli
sqlcmdcli is written in Delphi RAD Studio and lets you connect to a SQL Server instance and run specific commands!
relational-databases
Teaching Materials for the "Relational Database Course" taught at the HFTM
https://github.com/superduper-io/superduper
Superduper: End-to-end framework for building custom AI applications and agents.
reptate
RepTate (Rheology of Entangled Polymers: Toolkit for Analysis of Theory & Experiment)
neotoma_lakes
A repository for managing the matching of lake data between national hydrographic databases and Neotoma records.
https://github.com/synnaxlabs/synnax
The data and operations foundation for hardware.
tubedb
TubeDB is an on-demand processing database system for climate station data
https://github.com/epsilla-cloud/vectordb
Epsilla is a high performance Vector Database Management System
covid19-italy-integrated-surveillance-data
COVID-19 integrated surveillance data provided by the Italian Institute of Health and processed via UnrollingAverages.jl to deconvolve the weekly moving averages.
https://github.com/bluebrain/bbp-data-push
CLIs that take Atlas Pipeline datasets as input and push them into Nexus along with a Resource properties payload
https://github.com/bluebrain/bbp-atlas-pipeline-validator
Validation of Atlas pipeline configuration
https://github.com/datafold/data-diff
Compare tables within or across databases
https://github.com/aspuru-guzik-group/molar
Molar is a database management tool that makes it easy to store experiments, whether computational or not
https://github.com/bluebrain/bluenaas-subcellular
A web environment for the simulation of brain molecular networks.
https://github.com/dyka3773/db-drift
A command-line tool to visualize the differences between two DB states.
https://github.com/catalyst-cooperative/pudl-catalog
An Intake catalog for distributing open energy system data liberated by Catalyst Cooperative.
https://github.com/datacleaner/datacleaner
The premier open source Data Quality solution