simstudy
simstudy: Illuminating research methods through data generation - Published in JOSS (2020)
StreamGen
StreamGen: a Python framework for generating streams of labeled data - Published in JOSS (2024)
Synthia
Synthia: multidimensional synthetic data generation in Python - Published in JOSS (2021)
grounded-segment-anything
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
data-augmentation-review
List of useful data augmentation resources. Here you will find some less common techniques, libraries, links to GitHub repos, papers, and more.
https://github.com/sdv-dev/deepecho
Synthetic Data Generation for mixed-type, multivariate time series.
edo
A library for generating artificial datasets through genetic evolution.
graphg
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation
https://github.com/sebhaan/tabpfgen
TabPFGen: Synthetic Tabular Data Generation with TabPFN
https://github.com/1x-technologies/wb-humanoid-mpc
Realtime Physics-Based Procedural Loco-Manipulation Planning and Control
spn4cir
[ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives
BenchmarkDataNLP.jl
BenchmarkDataNLP.jl: Synthetic Data Generation for NLP Benchmarking - Published in JOSS (2025)
Syclops
Syclops: A Modular Pipeline for Procedural Generation of Synthetic Data - Published in JOSS (2025)
https://github.com/markusjx/datagen
Random data generator based on JSON schemas