LiberTEM
LiberTEM: Software platform for scalable multidimensional data processing in transmission electron microscopy - Published in JOSS (2020)
VIP
VIP: A Python package for high-contrast imaging - Published in JOSS (2023)
latentcor
latentcor: An R Package for estimating latent correlations from mixed data types - Published in JOSS (2021)
heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
collapse
Advanced and Fast Data Transformation in R
dantro
dantro is a Python package to handle, transform, and visualize hierarchically structured data. Docs @ https://dantro.readthedocs.io — NOTE: This repository is a READ-ONLY-MIRROR of the actual development repository; for open issues and MRs, see there:
https://github.com/modelscope/data-juicer
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
https://github.com/councildataproject/cdp-backend
Data storage utilities and processing pipelines used by CDP instances.
forte
Forte is a flexible and powerful ML workflow builder. This is part of the CASL project: http://casl-project.ai/
https://github.com/forieux/qmm
Python Quadratic Majorization-Minimization (MM) optimization algorithms for half-quadratic criteria. Inverse problems, image restoration, denoising, ...
https://github.com/aces/cbrain
CBRAIN is a flexible Ruby on Rails framework for accessing and processing large data on high-performance computing infrastructures.
https://github.com/johnkerl/miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
https://github.com/asyml/texar-pytorch
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
https://github.com/buchananja/dpyp
A convenience tool for small-scale data pipelines in Python
https://github.com/fgcz/pyfgcz
Move and Convert Mass Spectrometry Data using BioBeamer and FCC
Machine-Learning-for-Solar-Energy-Prediction
Predict the Power Production of a solar panel farm from Weather Measurements using Machine Learning
https://github.com/cloud-span/genomics05-data-processing-analysis
Data Processing & Analysis
https://github.com/graphbookai/graphbook
Visual AI development framework for training and inference of ML models, scaling pipelines, and automating workflows with Python. ⭐ Leave a star to support us!
https://github.com/dadananjesha/spark-streaming
Spark Streaming KPI Processing is a real-time data processing application built using Apache Spark Streaming
leef
Core of Data Processing Pipeline for the LEEF Experiment - See https://leef-uzh.github.io/LEEF/ for documentation - WORK IN PROGRESS
counting-ocean-particles
A set of simple codes for processing data on marine suspended particles collected with different sensors
https://github.com/cea-metrocarac/spectroview
SPECTROview: A Tool for Spectroscopic Data Processing and Visualization.
universitatespodcastdata
An R package for downloading, extracting, and analyzing interview transcripts from the Universitates podcast series. It provides tools for data processing, searching, and visualization
https://github.com/dmdequin/airbnb_price_predict
Machine Learning and NLP to predict Airbnb prices
https://github.com/OpenDCAI/DataFlow
Easy Data Preparation with the latest LLM-based Operators and Pipelines.
https://github.com/conqxeror/veloxx
Veloxx: A high-performance, lightweight Rust library for in-memory data processing and analytics. Features DataFrames, Series, CSV/JSON I/O, powerful transformations, aggregations, and statistical functions for efficient data science and engineering.