Synthia
Synthia: multidimensional synthetic data generation in Python - Published in JOSS (2021)
boxsers
Python package that provides a full range of functionality to process and analyze vibrational spectra (Raman, SERS, FTIR, etc.).
lrebench
[EMNLP 2022 Findings] Towards Realistic Low-resource Relation Extraction: A Benchmark with Empirical Baseline Study
augraphy
Augmentation pipeline for rendering synthetic paper printing, faxing, scanning and copy machine processes
data-augmentation-review
List of useful data augmentation resources. Here you will find some uncommon techniques, libraries, links to GitHub repos, papers, and more.
https://github.com/albumentations-team/albumentationsx
Next-generation Albumentations: dual-licensed for open-source and commercial use
targetran
Python library for data augmentation in object detection or image classification model training
https://github.com/albumentations-team/albucore
A high-performance image processing library designed to optimize and extend the Albumentations library with specialized functions for advanced image transformations. Perfect for developers working in computer vision who require efficient and scalable image augmentation.
SpecAugment
An implementation of SpecAugment, introduced by Google Brain, with TensorFlow & PyTorch
https://github.com/aryashah2k/nlp-data-augmentation
Implementing 5 Different Approaches To Augmenting Data For Natural Language Processing Tasks.
gandlf
A generalizable application framework for segmentation, regression, and classification using PyTorch
nfr
Neural Fuzzy Repair (NFR) is a data augmentation pipeline, which integrates fuzzy matches (i.e. similar translations) into neural machine translation.
aapi_code
A local application frontend and a backend server based on U-Net and Detectron2 as a solution to the auto annotation of pathology images (Columbia Data Science Institute Fall 2020 Capstone Project)
https://github.com/agamiko/neural-based-data-augmentation
Improving generalization via style transfer-based data augmentation: Novel regularization method
https://github.com/amazon-science/transformers-data-augmentation
Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper
icsfsurvey
Explore concepts like Self-Correct, Self-Refine, Self-Improve, Self-Contradict, Self-Play, and Self-Knowledge, alongside o1-like reasoning elevation🍓 and hallucination alleviation🍄.
https://github.com/amazon-science/mix-generation
MixGen: A New Multi-Modal Data Augmentation
https://github.com/cgcl-codes/transferattacksurrogates
The official code of the IEEE S&P 2024 paper "Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability". We study how to train surrogate models to boost transfer attacks.
https://github.com/aiot-mlsys-lab/deepaa
[ICLR 2022] "Deep AutoAugment" by Yu Zheng, Zhi Zhang, Shen Yan, Mi Zhang
p3forecast
A Personalized Privacy Preserving cloud workload prediction framework based on Federated Generative Adversarial Networks (GANs), which allows cloud providers with non-IID workload data to collaboratively train workload prediction models as preferred while protecting privacy.
deeptrack
DeepTrack2 is a modular Python library for generating, manipulating, and analyzing image data pipelines for machine learning and experimental imaging.