Updated 9 months ago

cromwell • Rank 16.6 • Science 62%

Scientific workflow engine designed for simplicity and scalability. Trivially transition from one-off use cases to massive-scale production environments.

Updated 9 months ago

ch.cern.spark:spark-avro_2.12 • Rank 40.2 • Science 36%

Apache Spark - A unified analytics engine for large-scale data processing

Updated 9 months ago

io.joern:c2cpg_2.13 • Rank 17.5 • Science 54%

Open-source code analysis platform for C/C++/Java/Binary/JavaScript/Python/Kotlin based on code property graphs. Discord: https://discord.gg/vv4MH284Hc

Updated 9 months ago

com.linkedin.isolation-forest • Rank 12.3 • Science 44%

A distributed Spark/Scala implementation of the isolation forest algorithm for unsupervised outlier detection, featuring support for scalable training and ONNX export for easy cross-platform inference.

Updated 9 months ago

mep • Rank 2.2 • Science 54%

Project MEP: Meme Evolution Programme. A terraformed multi-language library for running statistical experiments on Twitter.

Updated 9 months ago

https://github.com/awslabs/deequ • Rank 16.5 • Science 36%

Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.

Updated 9 months ago

https://github.com/bluebrain/nexus • Rank 14.1 • Science 36%

Blue Brain Nexus - A knowledge graph for data-driven science

Updated 9 months ago

https://github.com/emergentorder/onnx-scala • Rank 6.3 • Science 26%

An ONNX (Open Neural Network eXchange) API and backend for typeful, functional deep learning and classical machine learning in Scala 3

Updated 8 months ago

https://github.com/lamastex/scadamale • Science 26%

Scalable Data Science and Distributed Machine Learning Course Book written by Raazesh Sainudiin and his WASP AI-Track PhD Students

Updated 9 months ago

py2eo • Science 44%

Experimental Translator of Python Programs to EO Programming Language

Updated 9 months ago

https://github.com/databrickslabs/tempo • Science 26%

API for manipulating time series on top of Apache Spark: lagged time values, rolling statistics (mean, sum, count, etc.), AS OF joins, downsampling, and interpolation

Updated 9 months ago

https://github.com/broadinstitute/wdl4s • Science 13%

Scala bindings for WDL

Updated 9 months ago

dotty-cps-async • Science 44%

Experimental CPS transformer for Dotty

Updated 9 months ago

https://github.com/benkeks/clave • Science 13%

Casual game about trapping monsters.

Updated 9 months ago

https://github.com/dataunitylab/jsonoid-discovery • Science 57%

Distributed JSON schema discovery

Updated 9 months ago

sustainability-analysis-tool • Science 57%

Implementation of an analysis tool for business process sustainability analyses