ffcv-pl
[FFCV-PL] manage fast data loading with ffcv and pytorch lightning
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.2%) to scientific vocabulary
Keywords
Repository
[FFCV-PL] manage fast data loading with ffcv and pytorch lightning
Basic Info
Statistics
- Stars: 15
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
readme.md
FFCV Dataloader with Pytorch Lightning
FFCV is a fast dataloader for neural network training: https://github.com/libffcv/ffcv
This repository presents all the steps needed to install FFCV and configure it with pytorch-lightning.
The idea is to provide very generic methods and utils, while letting the user decide and configure everything.
Installation
Tested with:
Ubuntu 22.04.2 LTS
python 3.11
ffcv==1.0.2
pytorch==2.0.1
pytorch-lightning==2.0.4
Dependencies
You can install dependencies (FFCV, Pytorch) with the provided environment.yml file:
conda env create --file environment.yml
conda activate ffcv-pl
This should correctly create a conda environment named ffcv-pl.
Note: Modify the pytorch-cuda version to the one compatible with your system.
Note: Solving the environment can take quite a long time. I suggest using the libmamba solver to speed up the process.
If the above does not work, another option is manual installation:

create conda environment:
conda create --name ffcv-pl
conda activate ffcv-pl

install pytorch according to the official website:
```
# in my environment the command is the following
conda install pytorch torchvision torchaudio pytorch-cuda=[your-version] -c pytorch -c nvidia
```

install ffcv dependencies and pytorch-lightning:
```
# can take some time for solving, but should not create conflicts
conda install cupy pkg-config libjpeg-turbo">=2.1.4" opencv numba pytorch-lightning">=2.0.0" -c pytorch -c conda-forge
```

install ffcv:
pip install ffcv

For further help, check out the FFCV installation guidelines: ffcv official page
Package
Once dependencies are installed, it is safe to install the package:
pip install ffcv_pl
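As a quick post-install sanity check, you can try importing the main entry points and printing the installed versions. This is only a minimal sketch: the module paths are taken from the examples later in this README (reconstructed with underscores) and may differ between package versions.

```python
# Minimal post-install smoke test (module paths assumed from the examples below).
from importlib.metadata import version

from ffcv_pl.generate_dataset import create_beton_wrapper   # dataset creation helper
from ffcv_pl.data_loading import FFCVDataModule             # Lightning datamodule wrapper
from ffcv_pl.ffcv_utils.utils import FFCVPipelineManager    # pipeline configuration object

print("ffcv:", version("ffcv"))
print("ffcv-pl:", version("ffcv-pl"))
print("pytorch-lightning:", version("pytorch-lightning"))
```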
Dataset Creation
You need to save your dataset in ffcv format (.beton).
Official FFCV docs.
This package provides the create_beton_wrapper method, which lets you easily create
a .beton dataset from a torch dataset.
Example from the dataset_creation.py script:
```
from ffcv.fields import RGBImageField
from ffcv_pl.generate_dataset import create_beton_wrapper
from torch.utils.data.dataset import Dataset
import numpy as np
from PIL import Image


class ToyImageLabelDataset(Dataset):
    def __init__(self, n_samples: int):
        self.samples = [Image.fromarray((np.random.rand(32, 32, 3) * 255).astype('uint8')).convert('RGB')
                        for _ in range(n_samples)]

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return (self.samples[idx], int(idx))


def main():
    # 1. Instantiate the torch dataset that you want to convert.
    # Important: the __getitem__ method must return tuples! (This depends on the FFCV library)
    image_label_dataset = ToyImageLabelDataset(n_samples=256)

    # 2. Optional: create Field objects.
    # Here only RGBImageField is overwritten, leaving the default IntField.
    fields = (RGBImageField(write_mode='jpg', max_resolution=32), None)

    # 3. Call the method, and it will automatically create the .beton dataset for you.
    create_beton_wrapper(image_label_dataset, "./data/image_label.beton", fields)


if __name__ == '__main__':
    main()
```
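Before wiring the file into Lightning, you can verify it by iterating over it with FFCV's own Loader. The sketch below uses plain FFCV (not ffcv-pl), relies on FFCV's default per-field pipelines, and assumes the fields come back in the same order as the dataset tuple (image, then label).

```python
# Quick check of the generated .beton file using FFCV's Loader directly (a sketch).
from ffcv.loader import Loader, OrderOption

loader = Loader("./data/image_label.beton",
                batch_size=8,
                num_workers=2,
                order=OrderOption.SEQUENTIAL)

# one batch per field, decoded with the default pipelines (arrays or tensors)
images, labels = next(iter(loader))
print(images.shape, labels.shape)
```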
Dataloader and Datamodule
The goal is to merge the PL DataModule with the FFCV Loader object.
Official FFCV Loader docs.
Official Pytorch-Lightning DataModule docs.
In main.py, a complete example of how to use the FFCVDataModule and train a
Lightning model is given.
The main steps to follow are:
1. Create an FFCVPipelineManager object, which needs the path to a previously created .beton file,
   a list of operations to perform on each item returned by your dataset, and an ordering option for loading.
2. Create the FFCVDataModule object, which is a LightningDataModule backed by the FFCV Loader.
3. Pass the data module to the Pytorch Lightning trainer, and run!
Suggestion: read the FFCV performance guide to better understand which options fit your needs.
Complete Example from the main.py script:
```
import pytorch_lightning as pl
import torch

from ffcv.fields.basics import IntDecoder
from ffcv.fields.rgb_image import RandomResizedCropRGBImageDecoder, CenterCropRGBImageDecoder
from ffcv.loader import OrderOption
from ffcv.transforms import ToTensor, ToTorchImage
from pytorch_lightning.strategies.ddp import DDPStrategy
from torch import nn
from torch.optim import Adam
from torchvision.transforms import RandomHorizontalFlip

from ffcv_pl.data_loading import FFCVDataModule
from ffcv_pl.ffcv_utils.augmentations import DivideImage255
from ffcv_pl.ffcv_utils.utils import FFCVPipelineManager


# define the LightningModule
class LitAutoEncoder(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(32 * 32 * 3, 64), nn.ReLU(), nn.Linear(64, 3))
        self.decoder = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 32 * 32 * 3))

    def training_step(self, batch, batch_idx):
        x = batch[0]
        b, c, h, w = x.shape
        x = x.reshape(b, -1)
        z = self.encoder(x)
        x_hat = self.decoder(z)
        loss = nn.functional.mse_loss(x_hat, x)
        # Logging to TensorBoard by default
        self.log("train_loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        pass

    def configure_optimizers(self):
        optimizer = Adam(self.parameters(), lr=1e-3)
        return optimizer


def main():
    seed = 1234
    pl.seed_everything(seed, workers=True)

    batch_size = 16
    gpus = 2
    nodes = 1
    workers = 8

    # image label dataset
    train_manager = FFCVPipelineManager("./data/image_label.beton",  # previously defined using dataset_creation.py
                                        pipeline_transforms=[
                                            # image pipeline
                                            [RandomResizedCropRGBImageDecoder((32, 32)),
                                             ToTensor(),
                                             ToTorchImage(),
                                             DivideImage255(dtype=torch.float32),
                                             RandomHorizontalFlip(p=0.5)],
                                            # label (int) pipeline
                                            [IntDecoder(),
                                             ToTensor()]
                                        ],
                                        ordering=OrderOption.RANDOM)  # random ordering for training

    val_manager = FFCVPipelineManager("./data/image_label.beton",
                                      pipeline_transforms=[
                                          # image pipeline (different from train)
                                          [CenterCropRGBImageDecoder((32, 32), ratio=1.),
                                           ToTensor(),
                                           ToTorchImage(),
                                           DivideImage255(dtype=torch.float32)],
                                          # label (int) pipeline
                                          None  # if None, uses default
                                      ],
                                      ordering=OrderOption.SEQUENTIAL)  # sequential ordering for validation

    # datamodule creation
    # ignore test and predict steps, since managers are not defined.
    data_module = FFCVDataModule(batch_size, workers, train_manager=train_manager, val_manager=val_manager,
                                 is_dist=True, seed=seed)

    # define model
    model = LitAutoEncoder()

    # trainer
    trainer = pl.Trainer(strategy=DDPStrategy(find_unused_parameters=False), deterministic=True,
                         accelerator='gpu', devices=gpus, num_nodes=nodes, max_epochs=5, logger=False)

    # start training!
    trainer.fit(model, data_module)


if __name__ == '__main__':
    main()
```
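Outside of a full Trainer run, the datamodule can also be sanity-checked on its own. The sketch below is an assumption-laden example, not part of main.py: it reuses the .beton file and a pipeline like the ones above, assumes a single-process setup (is_dist=False), passes batch size and workers positionally as in the example, and calls the standard LightningDataModule hooks.

```python
# Standalone, single-process sanity check of the datamodule (a sketch, assumptions noted above).
import torch
from ffcv.fields.rgb_image import CenterCropRGBImageDecoder
from ffcv.loader import OrderOption
from ffcv.transforms import ToTensor, ToTorchImage

from ffcv_pl.data_loading import FFCVDataModule
from ffcv_pl.ffcv_utils.augmentations import DivideImage255
from ffcv_pl.ffcv_utils.utils import FFCVPipelineManager

train_manager = FFCVPipelineManager("./data/image_label.beton",
                                    pipeline_transforms=[
                                        [CenterCropRGBImageDecoder((32, 32), ratio=1.),
                                         ToTensor(),
                                         ToTorchImage(),
                                         DivideImage255(dtype=torch.float32)],
                                        None  # default label pipeline
                                    ],
                                    ordering=OrderOption.SEQUENTIAL)

data_module = FFCVDataModule(16, 4, train_manager=train_manager, is_dist=False, seed=1234)
data_module.setup(stage="fit")  # standard LightningDataModule hook

images, labels = next(iter(data_module.train_dataloader()))
print(images.shape, images.dtype, labels.shape)
```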
Code Citations
Pytorch-Lightning:
@software{Falcon_PyTorch_Lightning_2019,
  author = {Falcon, William and {The PyTorch Lightning team}},
  doi = {10.5281/zenodo.3828935},
  license = {Apache-2.0},
  month = mar,
  title = {{PyTorch Lightning}},
  url = {https://github.com/Lightning-AI/lightning},
  version = {1.4},
  year = {2019}
}
FFCV:
@misc{leclerc2022ffcv,
  author = {Guillaume Leclerc and Andrew Ilyas and Logan Engstrom and Sung Min Park and Hadi Salman and Aleksander Madry},
  title = {{FFCV}: Accelerating Training by Removing Data Bottlenecks},
  year = {2022},
  howpublished = {\url{https://github.com/libffcv/ffcv/}},
  note = {commit 2544abdcc9ce77db12fecfcf9135496c648a7cd5}
}
Owner
- Name: Dario Serez
- Login: SerezD
- Kind: user
- Location: Genoa, Italy
- Company: Italian Institute of Technology (IIT)
- Repositories: 3
- Profile: https://github.com/SerezD
Ph.D. student at "Istituto Italiano di Tecnologia" - PAVIS research line, Genoa, Italy
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Serez"
    given-names: "Dario"
title: "FFCV Pytorch Lightning"
version: 0.2.0
date-released: 2023-05-18
url: "https://github.com/SerezD/ffcv_pytorch_lightning"
GitHub Events
Total
- Watch event: 4
Last Year
- Watch event: 4
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| dserez | s****7@g****m | 27 |
| Dario Serez | 6****D | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 4 minutes
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- SerezD (3)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 29 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 13
- Total maintainers: 1
pypi.org: ffcv-pl
manage fast data loading with ffcv and pytorch lightning
- Homepage: https://github.com/SerezD/ffcv_pytorch_lightning
- Documentation: https://ffcv-pl.readthedocs.io/
- License: MIT
- Latest release: 0.3.2 (published over 2 years ago)
Rankings
Maintainers (1)
Dependencies
- cupy
- libjpeg-turbo >=2.1.4
- numba
- opencv
- pip
- pkg-config
- pytorch >=2.0.0
- pytorch-cuda 11.8.*
- pytorch-lightning >=2.0.0
- torchaudio >=2.0.1
- torchvision >=0.15.1
- PyTurboJPEG *
- cupy-cuda11x *
- ffcv >=1.0.0
- numba *
- opencv-python *
- pkgconfig *
- pytorch-lightning >=2.0.0
- torch *
- torchaudio *
- torchvision *