pyraug

Data Augmentation with Variational Autoencoders (TPAMI)

https://github.com/clementchadebec/pyraug

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.7%) to scientific vocabulary

Keywords

data-augmentation python variational-autoencoder
Last synced: 6 months ago

Repository

Data Augmentation with Variational Autoencoders (TPAMI)

Basic Info
  • Host: GitHub
  • Owner: clementchadebec
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 10.3 MB
Statistics
  • Stars: 140
  • Watchers: 3
  • Forks: 14
  • Open Issues: 2
  • Releases: 6
Topics
data-augmentation python variational-autoencoder
Created over 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md




Pyraug

This library provides a reliable way to perform data augmentation with Variational Autoencoders, even in challenging settings such as high-dimensional, low-sample-size data.

Installation

To install the latest release from pypi.org, run:

```bash
$ pip install pyraug
```

Alternatively, you can clone the GitHub repository to access the tests, tutorials and scripts:

```bash
$ git clone https://github.com/clementchadebec/pyraug.git
```

and install the library from source:

```bash
$ cd pyraug
$ pip install .
```

Augmenting your Data

In Pyraug, a typical augmentation process is divided into two distinct steps:

  1. Train a model using Pyraug's TrainingPipeline or the provided scripts/training.py script
  2. Generate new data from a trained model using Pyraug's GenerationPipeline or the provided scripts/generation.py script

Pyraug's built-in functions offer two straightforward ways to augment your data.

Using Pyraug's Pipelines

Pyraug provides two pipelines that may be used to either train a model on your own data or generate new data with a pretrained model.

note: These pipelines are independent of the choice of model and sampler. Hence, they can be used even if you want to access more advanced features, such as defining your own autoencoding architecture.

Launching a model training

To launch a model training, you only need to call a TrainingPipeline instance. In its most basic version, the TrainingPipeline can be built without any arguments; this will by default train a RHVAE model with the default autoencoding architecture and parameters.

```python
from pyraug.pipelines import TrainingPipeline

pipeline = TrainingPipeline()
pipeline(train_data=dataset_to_augment)
```

where dataset_to_augment is either a numpy.ndarray, a torch.Tensor or a path to a folder in which each file is a data sample (handled data formats are .pt, .nii, .nii.gz, .bmp, .jpg, .jpeg, .png).
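When passing a folder path, it can help to check beforehand which files will actually be picked up. The helper below is a sketch, not part of Pyraug's API; it only filters a folder by the handled suffixes listed above:

```python
from pathlib import Path

# Sketch only -- this helper is NOT part of Pyraug. It filters a folder
# down to the file types the pipelines are documented to handle.
HANDLED_SUFFIXES = (".pt", ".nii", ".nii.gz", ".bmp", ".jpg", ".jpeg", ".png")

def handled_files(folder):
    """Return the names of loadable files in `folder`, sorted."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.name.lower().endswith(HANDLED_SUFFIXES)
    )
```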

More generally, you can instantiate your own model and train it with the TrainingPipeline. For instance, if you want to instantiate a basic RHVAE run:

```python
from pyraug.models import RHVAE
from pyraug.models.rhvae import RHVAEConfig

model_config = RHVAEConfig(
    input_dim=int(input_dim)  # input_dim is the shape of a flattened input sample;
                              # needed if you did not provide your own architectures
)
model = RHVAE(model_config)
```

If you instantiate a model yourself as shown above and do not provide all the network architectures (encoder, decoder & metric if applicable), the ModelConfig instance will expect you to provide the input dimension of your data, which equals n_channels x height x width x .... Pyraug's VAE models indeed default to multilayer perceptron (MLP) networks that automatically adapt to the input data shape.
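The flattened input dimension is simply the product of the per-sample dimensions. A minimal sketch (the helper name is ours, not Pyraug's):

```python
# Sketch: input_dim for the model config is the flattened per-sample size,
# i.e. the product n_channels x height x width x ...
def flattened_input_dim(sample_shape):
    dim = 1
    for s in sample_shape:
        dim *= s
    return dim

# e.g. a single-channel 28x28 image: flattened_input_dim((1, 28, 28)) -> 784
```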

note: If your samples have different sizes, Pyraug will reshape them to the minimum size min_n_channels x min_height x min_width x ...
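That minimum size is the element-wise minimum over the sample shapes. The function below is an illustration of the rule stated in the note, not Pyraug's own code:

```python
# Assumption (not Pyraug's code): mirror the note above by computing the
# element-wise minimum shape over a collection of per-sample shapes.
def minimum_shape(shapes):
    """Element-wise minimum over a list of per-sample shapes."""
    return tuple(min(dims) for dims in zip(*shapes))

# e.g. minimum_shape([(3, 64, 64), (1, 32, 48)]) -> (1, 32, 48)
```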

Then the TrainingPipeline can be launched by running:

```python
from pyraug.pipelines import TrainingPipeline

pipe = TrainingPipeline(model=model)
pipe(train_data=dataset_to_augment)
```

At the end of training, the model weights (models.pt) and the model configuration (model_config.json) will be saved in a folder outputs/my_model/training_YYYY-MM-DD_hh-mm-ss/final_model.
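Because the run folders embed a sortable timestamp, the newest final_model directory can be located with a few lines of standard library code. This is a sketch under the naming convention described above, not a Pyraug function:

```python
from pathlib import Path

def latest_final_model(output_root="outputs/my_model"):
    """Return the final_model folder of the most recent training run, or None.

    Assumes the training_YYYY-MM-DD_hh-mm-ss naming described above, whose
    lexicographic order matches chronological order.
    """
    runs = sorted(Path(output_root).glob("training_*"))
    return runs[-1] / "final_model" if runs else None
```

The returned path can then be handed to the generation step that reloads a trained model.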

Important: For high dimensional data, we advise you to provide your own network architectures and potentially adapt the training and model parameters; see the documentation for more details.

Launching data generation

To launch the data generation process from a trained model, run the following.

```python
from pyraug.pipelines import GenerationPipeline
from pyraug.models import RHVAE

model = RHVAE.load_from_folder('path/to/your/trained/model')  # reload the trained model
pipe = GenerationPipeline(model=model)  # define the generation pipeline
pipe(samples_number=10)  # generate 10 new data points
```

The generated data is saved as .pt files in dummy_output_dir/generation_YYYY-MM-DD_hh-mm-ss. By default, each file stores a batch of at most 500 samples.

Retrieve generated data

Generated data can then be loaded easily by running:

```python
import torch

data = torch.load('path/to/generated_data.pt')
```
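Since generation may produce several batch files (at most 500 samples each), you may want to collect them in order before loading. The helper below is an assumption, not part of Pyraug's API:

```python
import glob
import os

# Assumption (not Pyraug's API): generated batches are plain .pt files in the
# generation folder; sorting by name gives a stable loading order.
def generated_batch_files(generation_dir):
    return sorted(glob.glob(os.path.join(generation_dir, "*.pt")))
```

Each file can then be loaded with torch.load and the batches concatenated, e.g. torch.cat([torch.load(f) for f in generated_batch_files('path/to/generation_dir')]).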

Using the provided scripts

Pyraug provides two scripts that let you augment your data directly from the command line.

note: To access the predefined scripts, first clone the Pyraug repository; the scripts are located in the scripts folder. For the time being, only RHVAE model training and generation are handled by the provided scripts. Other models will be added as they are implemented in pyraug.models.

Launching a model training:

To launch a model training, run

```bash
$ python scripts/training.py --path_to_train_data "path/to/your/data/folder"
```

The data must be located in path/to/your/data/folder, where each file is one input sample. Handled image types are .pt, .nii, .nii.gz, .bmp, .jpg, .jpeg, .png. Other types will be progressively added depending on usage.

At the end of training, the model weights (models.pt) and the model configuration (model_config.json) will be saved in a folder outputs/my_model_from_script/training_YYYY-MM-DD_hh-mm-ss/final_model.

Launching data generation

Then, to launch the data generation process from a trained model, you only need to run

```bash
$ python scripts/generation.py --num_samples 10 --path_to_model_folder 'path/to/your/trained/model/folder'
```

The generated data is stored in several .pt files in outputs/my_generated_data_from_script/generation_YYYY-MM-DD_hh_mm_ss. By default, each file stores a batch of 500 samples.

Important: In the simplest setup, the scripts use default configurations. You can easily override them as explained in the documentation; see the tutorials for a more in-depth example.

Retrieve generated data

Generated data can then be loaded easily by running:

```python
import torch

data = torch.load('path/to/generated_data.pt')
```

Getting your hands on the code

To help you understand how Pyraug works and how you can augment your data with this library, we also provide tutorials, which can be found in the examples folder.

Dealing with issues

If you experience any issues while running the code or would like to request new features, please open an issue on GitHub.

Citing

If you use this library, please consider citing us:

```bibtex
@article{chadebec2022data,
  title={Data augmentation in high dimensional low sample size setting using a geometry-based variational autoencoder},
  author={Chadebec, Cl{\'e}ment and Thibeau-Sutre, Elina and Burgos, Ninon and Allassonni{\`e}re, St{\'e}phanie},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2022},
  publisher={IEEE}
}
```

Credits

Logo: SaulLu

Owner

  • Login: clementchadebec
  • Kind: user
  • Company: INRIA

Citation (CITATION.cff)

cff-version: 1.2.0
date-released: 2022-06
message: "If you use this software, please cite it as below."
title: "Data augmentation in high dimensional low sample size setting using a geometry-based variational autoencoder"
url: "https://github.com/clementchadebec/pyraug"
authors:
- family-names: Chadebec
  given-names: Clément
- family-names: Thibeau-Sutre
  given-names: Elina
- family-names: Burgos
  given-names: Ninon
- family-names: Allassonnière
  given-names: Stéphanie
preferred-citation:
  type: article
  authors:
  - family-names: Chadebec
    given-names: Clément
  - family-names: Thibeau-Sutre
    given-names: Elina
  - family-names: Burgos
    given-names: Ninon
  - family-names: Allassonnière
    given-names: Stéphanie
  journal: "IEEE Transactions on Pattern Analysis and Machine Intelligence"
  publisher: IEEE
  title: "Data augmentation in high dimensional low sample size setting using a geometry-based variational autoencoder"
  year: 2022
  url: "https://arxiv.org/abs/2105.00026"
  address: "Online"

GitHub Events

Total
  • Watch event: 5
Last Year
  • Watch event: 5

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 177
  • Total Committers: 2
  • Avg Commits per committer: 88.5
  • Development Distribution Score (DDS): 0.288
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
clementchadebec c****c@o****r 126
clementchadebec 4****c 51
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 7
  • Total pull requests: 2
  • Average time to close issues: 12 days
  • Average time to close pull requests: 1 minute
  • Total issue authors: 7
  • Total pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mikel-hernandezj (1)
  • VivienvV (1)
  • ShenyuanLiang (1)
  • Virgiliok (1)
  • Yangqru (1)
  • sarahsftri (1)
  • emilythyrum (1)
Pull Request Authors
  • bottyBotz (1)
  • clementchadebec (1)
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 111 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 7
  • Total maintainers: 1
pypi.org: pyraug

Data Augmentation with VAE

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 111 Last month
Rankings
Stargazers count: 6.2%
Forks count: 9.6%
Dependent packages count: 10.0%
Average: 14.9%
Dependent repos count: 21.7%
Downloads: 26.9%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • Sphinx ==4.1.2
  • sphinx-rtd-theme ==0.5.2
  • sphinxcontrib-applehelp ==1.0.2
  • sphinxcontrib-bibtex ==2.3.0
  • sphinxcontrib-devhelp ==1.0.2
  • sphinxcontrib-htmlhelp ==2.0.0
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.3
  • sphinxcontrib-serializinghtml ==1.1.5
examples/requirements.txt pypi
  • matplotlib >=3.3.2
  • torchvision >=0.9.1
requirements.txt pypi
  • dataclasses >=0.6
  • dill >=0.3.3
  • nibabel >=3.2.1
  • numpy >=1.22
  • pillow >=8.3.2
  • pydantic >=1.8.2
  • torch >=1.8.1