Scientific Software
Updated 6 months ago

OnlineStats.jl — Peer-reviewed • Rank 16.8 • Science 95%

OnlineStats.jl: A Julia package for statistics on data streams - Published in JOSS (2020)

Scientific Software
Updated 6 months ago

Openseize — Peer-reviewed • Rank 7.5 • Science 98%

Openseize: A digital signal processing package for large EEG datasets in Python - Published in JOSS (2023)

Mathematics
Scientific Software · Peer-reviewed
Updated 5 months ago

ch.cern.spark:spark-avro_2.12 • Rank 40.2 • Science 36%

Apache Spark - A unified analytics engine for large-scale data processing

Updated 6 months ago

pretzel • Rank 8.3 • Science 67%

JavaScript full-stack framework for Big Data visualisation and analysis

Updated 5 months ago

cython • Rank 36.7 • Science 36%

The most widely used Python to C compiler

Updated 6 months ago

reductstore • Rank 17.6 • Science 52%

High Performance Storage and Streaming Solution for Data Acquisition Systems

Updated 6 months ago

ustore • Rank 15.3 • Science 54%

Multi-Modal Database replacing MongoDB, Neo4J, and Elastic with 1 faster ACID solution, with NetworkX and Pandas interfaces, and bindings for C 99, C++ 17, Python 3, Java, GoLang 🗄️

Updated 5 months ago

bigstatsr • Rank 17.8 • Science 49%

R package for statistical tools with big matrices stored on disk.

Updated 5 months ago

h2o • Rank 27.7 • Science 36%

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Updated 6 months ago

graphscope • Rank 21.3 • Science 36%

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba

Scientific Software
Updated 6 months ago

Critical care data processing tools — Peer-reviewed • Rank 6.3 • Science 46%

Critical care data processing tools - Published in JOSS (2017)

Updated 6 months ago

opentimes • Rank 7.6 • Science 44%

Free travel times between U.S. Census geographies

Updated 5 months ago

eland • Rank 20.0 • Science 26%

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Updated 5 months ago

https://github.com/cbg-ethz/pybda • Rank 7.5 • Science 36%

💻💻💻 A command-line tool for analyzing big biological data sets on distributed HPC clusters.

Updated 5 months ago

workbench • Rank 15.3 • Science 26%

Workbench: An easy to use Python API for creating and deploying AWS SageMaker Models

Updated 5 months ago

https://github.com/alleninstitute/vis • Rank 5.0 • Science 36%

TypeScript packages for building big-data visualization tools & components, with examples for a variety of common data types & formats

Updated 5 months ago

sgd • Rank 16.8 • Science 23%

An R package for large scale estimation with stochastic gradient descent

Updated 5 months ago

https://github.com/bsc-wdc/dislib • Rank 12.6 • Science 26%

The distributed computing library for Python, implemented using the PyCOMPSs programming model for HPC.

Updated 5 months ago

https://github.com/big-data-lab-umbc/cybertraining • Rank 6.0 • Science 10%

Multidisciplinary Research and Education on Big Data + High-Performance Computing + Atmospheric Sciences at UMBC

Updated 5 months ago

https://github.com/aveek-saha/cricket-score-predictor • Rank 1.4 • Science 13%

A big data application that predicts the outcome of a T20 cricket match.

Updated 6 months ago

ai-commercial-decisionmaking • Science 54%

AI-Driven Large Dataset Analysis & Commercial Decision-Making: Research on predictive analytics, machine learning strategies, and real-world business applications [Python, TensorFlow, PyTorch] 🤖📊

Updated 6 months ago

sigmf • Science 44%

The Signal Metadata Format Specification

Updated 5 months ago

https://github.com/amalan-constat/needs4bigdata • Science 49%

R package implementing subsampling methods to find informative samples from big data

Updated 6 months ago

sgc • Science 67%

Code and data used for the SGC

Updated 5 months ago

https://github.com/awslabs/amazon-s3-find-and-forget • Science 26%

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

Scientific Software
Updated 6 months ago

AquaFetch — Peer-reviewed • Science 93%

AquaFetch: A Unified Python Interface for Water Resource Dataset Acquisition and Harmonization - Published in JOSS (2025)

Economics
Scientific Software · Peer-reviewed
Updated 5 months ago

https://github.com/talariadb/talaria • Science 13%

TalariaDB is a distributed, highly available, and low latency time-series database for Presto

Updated 6 months ago

big-qa-architecture • Science 67%

BigQA Architecture: Big Data Architecture for Question Answering Systems

Updated 4 months ago

https://github.com/erictleung/sports-popularity-in-usa • Science 13%

🏀 Analysis of sports popularity in the USA

Updated 5 months ago

deepicedrain • Science 23%

Mapping and monitoring deep subglacial water activity in Antarctica using remote sensing and machine learning, with ICESat-2!

Updated 6 months ago

RangeExtractor • Science 31%

A performant way to extract subsections of arrays, under a tiling scheme. Meant for arrays with slow I/O.

Updated 6 months ago

marex • Science 49%

Marine Extremes detection, identification, and tracking/merging for Exascale Climate data

Scientific Software
Updated 6 months ago

remotePARTS — Peer-reviewed • Science 93%

remotePARTS: Spatiotemporal autoregression analyses for large data sets - Published in JOSS (2025)

Earth and Environmental Sciences (40%) Economics (40%)
Scientific Software · Peer-reviewed
Updated 6 months ago

dockerunifieduimainterface • Science 67%

A UIMA-based tool for scalable, uniform, distributed, platform-independent, and easily reusable application of Natural Language Processing (NLP) methods using Docker.