Scientific Software
Updated 10 months ago

OnlineStats.jl — Peer-reviewed • Rank 16.8 • Science 95%

OnlineStats.jl: A Julia package for statistics on data streams - Published in JOSS (2020)

Scientific Software
Updated 10 months ago

Openseize — Peer-reviewed • Rank 7.5 • Science 98%

Openseize: A digital signal processing package for large EEG datasets in Python - Published in JOSS (2023)

Mathematics
Scientific Software · Peer-reviewed
Updated 10 months ago

ch.cern.spark:spark-avro_2.12 • Rank 40.2 • Science 36%

Apache Spark - A unified analytics engine for large-scale data processing

Updated 10 months ago

pretzel • Rank 8.3 • Science 67%

JavaScript full-stack framework for Big Data visualisation and analysis

Updated 10 months ago

cython • Rank 36.7 • Science 36%

The most widely used Python to C compiler

Updated 10 months ago

reductstore • Rank 17.6 • Science 52%

High Performance Storage and Streaming Solution for Data Acquisition Systems

Updated 10 months ago

ustore • Rank 15.3 • Science 54%

Multi-modal database replacing MongoDB, Neo4J, and Elastic with one faster ACID solution, with NetworkX and Pandas interfaces and bindings for C99, C++17, Python 3, Java, and Go 🗄️

Updated 10 months ago

bigstatsr • Rank 17.8 • Science 49%

R package for statistical tools with big matrices stored on disk.

Updated 10 months ago

h2o • Rank 27.7 • Science 36%

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.

Updated 10 months ago

graphscope • Rank 21.3 • Science 36%

🔨 🍇 💻 🚀 GraphScope: A One-Stop Large-Scale Graph Computing System from Alibaba

Scientific Software
Updated 10 months ago

Critical care data processing tools — Peer-reviewed • Rank 6.3 • Science 46%

Critical care data processing tools - Published in JOSS (2017)

Updated 10 months ago

opentimes • Rank 7.6 • Science 44%

Free travel times between U.S. Census geographies

Updated 10 months ago

eland • Rank 20.0 • Science 26%

Python Client and Toolkit for DataFrames, Big Data, Machine Learning and ETL in Elasticsearch

Updated 10 months ago

https://github.com/cbg-ethz/pybda • Rank 7.5 • Science 36%

💻💻💻 A command-line tool for the analysis of big biological data sets on distributed HPC clusters.

Updated 10 months ago

workbench • Rank 15.3 • Science 26%

Workbench: An easy-to-use Python API for creating and deploying AWS SageMaker models

Updated 10 months ago

https://github.com/alleninstitute/vis • Rank 5.0 • Science 36%

TypeScript packages for building big-data visualization tools and components, with examples for a variety of common data types and formats

Updated 10 months ago

sgd • Rank 16.8 • Science 23%

An R package for large scale estimation with stochastic gradient descent

Updated 10 months ago

https://github.com/bsc-wdc/dislib • Rank 12.6 • Science 26%

The Distributed Computing Library for Python, implemented using the PyCOMPSs programming model for HPC.

Updated 10 months ago

subsemble • Rank 10.5 • Science 23%

subsemble: an R package for ensemble learning on subsets of data

Updated 10 months ago

https://github.com/big-data-lab-umbc/cybertraining • Rank 6.0 • Science 10%

Multidisciplinary Research and Education on Big Data + High-Performance Computing + Atmospheric Sciences at UMBC

Updated 10 months ago

https://github.com/aveek-saha/cricket-score-predictor • Rank 1.4 • Science 13%

A big data application to predict the outcome of a T20 cricket match.

Updated 10 months ago

dockerunifieduimainterface • Science 67%

A UIMA-based tool for the scaled, uniform, distributed, platform-independent and easily reusable use of Natural Language Processing (NLP) methods using Docker.

Updated 10 months ago

https://github.com/amalan-constat/needs4bigdata • Science 49%

R package implementing subsampling methods to find informative samples from big data

Updated 10 months ago

https://github.com/awslabs/amazon-s3-find-and-forget • Science 26%

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)

Scientific Software
Updated 10 months ago

AquaFetch — Peer-reviewed • Science 93%

AquaFetch: A Unified Python Interface for Water Resource Dataset Acquisition and Harmonization - Published in JOSS (2025)

Economics
Scientific Software · Peer-reviewed
Updated 10 months ago

sigmf • Science 44%

The Signal Metadata Format Specification

Updated 8 months ago

https://github.com/erictleung/sports-popularity-in-usa • Science 13%

🏀 Analysis of sports popularity in the USA

Updated 10 months ago

path_based_traffic_flow_prediction • Science 26%

Forecast future traffic flow on a road network.

Scientific Software
Updated 10 months ago

remotePARTS — Peer-reviewed • Science 93%

remotePARTS: Spatiotemporal autoregression analyses for large data sets - Published in JOSS (2025)

Earth and Environmental Sciences (40%) · Economics (40%)
Scientific Software · Peer-reviewed
Updated 10 months ago

ai-commercial-decisionmaking • Science 54%

AI-Driven Large Dataset Analysis & Commercial Decision-Making: Research on predictive analytics, machine learning strategies, and real-world business applications [Python, TensorFlow, PyTorch] 🤖📊

Updated 10 months ago

big-qa-architecture • Science 67%

BigQA Architecture: Big Data Architecture for Question Answering Systems

Updated 10 months ago

https://github.com/talariadb/talaria • Science 13%

TalariaDB is a distributed, highly available, and low latency time-series database for Presto

Updated 10 months ago

sgc • Science 67%

Code and data used for the SGC

Updated 10 months ago

deepicedrain • Science 23%

Mapping and monitoring deep subglacial water activity in Antarctica using remote sensing and machine learning, with ICESat-2!

Updated 10 months ago

RangeExtractor • Science 31%

A performant way to extract subsections of arrays, under a tiling scheme. Meant for arrays with slow I/O.

Updated 10 months ago

marex • Science 49%

Marine Extremes detection, identification, and tracking/merging for Exascale Climate data