Updated 6 months ago

kglab • Rank 15.8 • Science 67%

Graph Data Science: an abstraction layer in Python for building knowledge graphs, integrated with popular graph libraries – atop Pandas, NetworkX, RAPIDS, RDFlib, pySHACL, PyVis, morph-kgc, pslpython, pyarrow, etc.

Updated 6 months ago

arrow • Rank 40.7 • Science 36%

Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics

Updated 5 months ago

nanoparquet • Science 26%

R package to read and write Parquet files

Updated 6 months ago

legalkit-pipeline • Science 44%

Publication pipeline for French legal codes on 🤗 Datasets from LegiFrance with concurrent upload and dynamic README.md.

Updated 6 months ago

hybridbackend • Science 54%

A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters

Updated 5 months ago

https://github.com/rumbledb/rumble • Science 36%

⛈️ RumbleDB 2.0.0 "Lemon Ironwood" 🌳 for Apache Spark | Run queries on your large-scale, messy JSON-like data (JSON, text, CSV, Parquet, ROOT, AVRO, SVM...) | No install required (just a jar to download) | Declarative Machine Learning and more

Updated 5 months ago

https://github.com/crowdstrike/kafka-replicator • Science 13%

Kafka replicator is a tool used to mirror and back up Kafka topics across regions

Updated 5 months ago

https://github.com/awslabs/amazon-s3-find-and-forget • Science 26%

Amazon S3 Find and Forget is a solution to handle data erasure requests from data lakes stored on Amazon S3, for example, pursuant to the European General Data Protection Regulation (GDPR)