Scientific Software
Updated 6 months ago

funsies • Peer-reviewed • Rank 7.6 • Science 98%

funsies: A minimalist, distributed and dynamic workflow engine - Published in JOSS (2021)

Medicine (40%)
Scientific Software · Peer-reviewed
Updated 6 months ago

morph-kgc • Rank 8.0 • Science 67%

Powerful RDF Knowledge Graph Generation with RML Mappings

Updated 5 months ago

https://github.com/airbytehq/airbyte • Rank 28.6 • Science 36%

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Updated 5 months ago

uptasticsearch • Rank 18.3 • Science 36%

An Elasticsearch client tailored to data science workflows.

Updated 5 months ago

https://github.com/apache/hamilton • Rank 12.1 • Science 36%

Apache Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.

Updated 5 months ago

workbench • Rank 15.3 • Science 26%

Workbench: An easy-to-use Python API for creating and deploying AWS SageMaker Models

Updated 5 months ago

https://github.com/larribas/dagger • Rank 8.1 • Science 26%

Define sophisticated data pipelines with Python and run them on different distributed systems (such as Argo Workflows).

Updated 6 months ago

quati • Rank 7.9 • Science 26%

Dynamic data engineering functions to accelerate development and coding

Updated 5 months ago

https://github.com/ploomber/soorgeon • Rank 14.4 • Science 13%

Convert monolithic Jupyter notebooks 📙 into maintainable Ploomber pipelines. 📊

Scientific Software
Updated 6 months ago

Carnival • Peer-reviewed • Science 93%

Carnival: JVM Property graph data unification toolkit - Published in JOSS (2025)

Updated 5 months ago

https://github.com/data-prompt-query/dpq • Science 26%

dpq is an open-source Python library that makes prompt-based data transformations and feature engineering easy

Updated 5 months ago

https://github.com/alvarocavalcante/airflow-parse-bench • Science 13%

Stop creating bad DAGs! Use this tool to measure and compare the parse time of your DAGs, identify bottlenecks, and optimize your Airflow environment for better performance.

Updated 5 months ago

https://github.com/cured-plus/csvw-duckdb • Science 13%

Convert a CSVW document (CSV metadata) to a DuckDB query to load a CSV file into a database.

Updated 6 months ago

data2neo • Science 54%

Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.

Updated 5 months ago

https://github.com/ccao-data/data-architecture • Science 26%

Codebase for CCAO data infrastructure construction and management

Updated 6 months ago

pycottas • Science 49%

Python COTTAS library for compressing and querying RDF

Updated 6 months ago

adequat_project_ai-and-optimization • Science 67%

Evolutionary Cost-Tolerance Optimization for Complex Assembly Mechanisms Via Simulation and Surrogate Modeling Approaches: Application on Micro Gears (http://dx.doi.org/10.21203/rs.3.rs-2487746/v1)

Updated 5 months ago

https://github.com/ibridges-for-irods/ibridges • Science 49%

A wrapper around the python-irodsclient to allow for easy interaction with iRODS servers.

Updated 5 months ago

https://github.com/desbordante/desbordante-core • Science 49%

Desbordante is a high-performance data profiler capable of discovering many different patterns in data using various algorithms. It also lets you run data-cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.

Updated 6 months ago

ids-drr-assam-risk-model • Science 26%

Intelligent Data Solution - Disaster Risk Reduction is a system to assist flood management in the state of Assam through data-driven methods. The repository contains code to extract relevant datasets and the modelling approach used to calculate Risk Scores for each revenue circle in Assam.

Artificial Intelligence and Machine Learning (40%)
Updated 6 months ago

signalslite • Science 44%

A small library to efficiently store and process global equity data, especially for Numerai's Signals tournament (WIP)

Updated 5 months ago

https://github.com/simantalahkar/lammpskit • Science 39%

lammpskit is a Python toolkit for post-processing and analyzing molecular dynamics (MD) simulations with LAMMPS. Its modular data processing and analysis functions are broadly applicable to scientific computing, data engineering, and machine learning workflows involving time series or semi-structured data.

Updated 5 months ago

https://github.com/alvarocavalcante/airflow-custom-deferrable-dataflow-operator • Science 13%

Start executing your Dataflow jobs directly from the Triggerer without going to the Worker!

Updated 5 months ago

https://github.com/danielvartan/r-course • Science 13%

🚀 Introductory R Course Developed for the Sustentarea Research and Extension Center