kglab
Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.
https://github.com/catalyst-cooperative/pudl-examples
Example Jupyter notebooks hosted on Kaggle that demonstrate how to work with US energy data from PUDL.
https://github.com/catalyst-cooperative/pudl-catalog
An Intake catalog for distributing open energy system data liberated by Catalyst Cooperative.
https://github.com/bigbio/pgatk-io
High-performance I/O library for proteogenomics
https://github.com/apecloud/myduckserver
Unified MySQL, Postgres & FlightSQL Server, Powered by DuckDB.
legalkit-pipeline
Publication pipeline for French legal codes on 🤗 Datasets from LegiFrance with concurrent upload and dynamic README.md.
hybridbackend
A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
https://github.com/rumbledb/rumble
⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more
https://github.com/crowdstrike/kafka-replicator
Kafka replicator is a tool used to mirror and back up Kafka topics across regions
https://github.com/awslabs/amazon-s3-find-and-forget
Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)