torchmanager

A generic deep learning training/testing framework for PyTorch

https://github.com/kisonho/torchmanager

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    2 of 4 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.7%) to scientific vocabulary

Keywords

deep-learning python pytorch torchmanager
Last synced: 6 months ago

Repository

A generic deep learning training/testing framework for PyTorch

Basic Info
  • Host: GitHub
  • Owner: kisonho
  • License: BSD-2-Clause
  • Language: Python
  • Default Branch: main
  • Size: 2.53 MB
Statistics
  • Stars: 12
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 34
Topics
deep-learning python pytorch torchmanager
Created about 4 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

torchmanager

A generic deep learning training/testing framework for PyTorch

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10381715.svg)](https://doi.org/10.5281/zenodo.10381715)

To use this framework, simply initialize a Manager object. The Manager class provides a generic training/testing loop for PyTorch models. It also provides some useful callbacks to use during training/testing.
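Each step is covered in detail in the sections below; as a quick orientation, a minimal sketch of the typical workflow (with the model and dataset definitions elided) might look like this:

```python
import torch
import torchmanager
from torchmanager.data import Dataset

model: torch.nn.Module = ...       # any PyTorch model
training_dataset: Dataset = ...    # see "Torchmanager Dataset" below
val_dataset: Dataset = ...
testing_dataset: Dataset = ...

# assemble the manager from an optimizer, a loss function, and metrics
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = torchmanager.losses.CrossEntropy()
metrics = {'accuracy': torchmanager.metrics.SparseCategoricalAccuracy()}
manager = torchmanager.Manager(model, optimizer, loss_fn=loss_fn, metrics=metrics)

# train, then evaluate
model = manager.fit(training_dataset, epochs=10, val_dataset=val_dataset)
manager.test(testing_dataset)
```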

Prerequisites

  • Python 3.10+
  • PyTorch
  • Packaging
  • tqdm
  • PyYAML (optional, for YAML configs)
  • scipy (optional, for the FID metric)
  • tensorboard (optional, for TensorBoard recording)

Installation

  • PyPI: pip install torchmanager
  • Conda: conda install torchmanager -c conda-forge

Start from Configurations

The Configs class is designed to be inherited to define necessary configurations. It also provides a method to get configurations from terminal arguments.

```python
import argparse
from typing import Union

from torchmanager.configs import Configs as _Configs

# define necessary configurations
class Configs(_Configs):
    epochs: int
    lr: float
    ...

    @staticmethod
    def get_arguments(parser: Union[argparse.ArgumentParser, argparse._ArgumentGroup] = argparse.ArgumentParser()) -> Union[argparse.ArgumentParser, argparse._ArgumentGroup]:
        '''Add arguments to argument parser'''
        ...

    def show_settings(self) -> None:
        '''Display current configurations'''
        ...

# get configs from terminal arguments
configs = Configs.from_arguments()
```
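For illustration only, a concrete subclass might look like the sketch below; the flags (--epochs, --lr) and their defaults are hypothetical, not part of the framework:

```python
import argparse
from typing import Union

from torchmanager.configs import Configs as _Configs

class TrainingConfigs(_Configs):
    epochs: int
    lr: float

    @staticmethod
    def get_arguments(parser: Union[argparse.ArgumentParser, argparse._ArgumentGroup] = argparse.ArgumentParser()) -> Union[argparse.ArgumentParser, argparse._ArgumentGroup]:
        # hypothetical flags for the two fields above
        parser.add_argument('--epochs', type=int, default=100, help='number of training epochs')
        parser.add_argument('--lr', type=float, default=1e-3, help='learning rate')
        return parser

    def show_settings(self) -> None:
        print(f'epochs={self.epochs}, lr={self.lr}')

# e.g. `python train.py --epochs 10 --lr 0.001`
configs = TrainingConfigs.from_arguments()
```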

Torchmanager Dataset

The data.Dataset class is designed to be inherited to define a dataset. It is a combination of torch.utils.data.Dataset and torch.utils.data.DataLoader with easier usage.

```python
import torch
from torchmanager.data import Dataset

# define dataset
class CustomDataset(Dataset):
    def __init__(self, ...):
        ...

    @property
    def unbatched_len(self) -> int:
        '''The total length of the data without batching'''
        ...

    def __getitem__(self, index: int) -> tuple[torch.Tensor, torch.Tensor]:
        '''Returns a single pair of unbatched data; the iterator batches the data automatically with `torch.utils.data.DataLoader`'''
        ...

# initialize datasets
training_dataset = CustomDataset(...)
val_dataset = CustomDataset(...)
testing_dataset = CustomDataset(...)
```
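As a toy illustration, an in-memory dataset over paired tensors could look like the sketch below; the base-class constructor arguments (here batch_size) are an assumption, so check torchmanager.data.Dataset for the actual signature:

```python
import torch
from torchmanager.data import Dataset

class TensorPairDataset(Dataset):
    '''A toy in-memory dataset over paired tensors.'''

    def __init__(self, x: torch.Tensor, y: torch.Tensor, batch_size: int = 32) -> None:
        super().__init__(batch_size=batch_size)  # assumed base-class signature
        self.x = x
        self.y = y

    @property
    def unbatched_len(self) -> int:
        return int(self.x.shape[0])

    def __getitem__(self, index: int) -> tuple[torch.Tensor, torch.Tensor]:
        return self.x[index], self.y[index]

# e.g. 1000 random feature vectors with integer class labels
training_dataset = TensorPairDataset(torch.randn(1000, 16), torch.randint(0, 10, (1000,)))
```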

The Manager

The Manager class is the core of the framework. It provides a generic training/testing pipeline for PyTorch models. The Manager class is designed to be inherited to manage the training/testing algorithm. There are also some useful callbacks to use during training/testing.

  1. Initialize the manager with the target model, optimizer, loss function, and metrics:

```python
import torch, torchmanager

# define model
class PytorchModel(torch.nn.Module):
    ...

# initialize model, optimizer, loss function, and metrics
model = PytorchModel(...)
optimizer = torch.optim.SGD(model.parameters(), lr=configs.lr)
loss_fn = torchmanager.losses.CrossEntropy()
metrics = {'accuracy': torchmanager.metrics.SparseCategoricalAccuracy()}

# initialize manager
manager = torchmanager.Manager(model, optimizer, loss_fn=loss_fn, metrics=metrics)
```

  • Multiple losses can be used by passing a dictionary to loss_fn:

```python
loss_fn = {
    'loss1': torchmanager.losses.CrossEntropy(),
    'loss2': torchmanager.losses.Dice(),
    ...
}  # total_loss = loss1 + loss2
```

  • Use weight for constant weight coefficients to control the balance between multiple losses:

```python
# define weights
w1: float = ...
w2: float = ...

loss_fn = {
    'loss1': torchmanager.losses.CrossEntropy(weight=w1),
    'loss2': torchmanager.losses.Dice(weight=w2),
    ...
}  # total_loss = w1 * loss1 + w2 * loss2
```

  • Use target to route each loss to a named model output:

```python
from typing import TypedDict

class ModelOutputDict(TypedDict):
    output1: torch.Tensor
    output2: torch.Tensor

LabelDict = ModelOutputDict  # optional: the label can also be a plain torch.Tensor compared against each target

loss_fn = {
    'loss1': torchmanager.losses.CrossEntropy(target="output1"),
    'loss2': torchmanager.losses.Dice(target="output2"),
    ...
}
# total_loss = loss1(y['output1'], label['output1']) + loss2(y['output2'], label['output2'])
#     if type(label) is LabelDict
# else total_loss = loss1(y['output1'], label) + loss2(y['output2'], label)
```

  2. Train the model with the fit method:

```python
show_verbose: bool = ...  # show progress bar information during training/testing
manager.fit(training_dataset, epochs=configs.epochs, val_dataset=val_dataset, show_verbose=show_verbose)
```

  • There are also some other callbacks to use:

```python
tensorboard_callback = torchmanager.callbacks.TensorBoard('logs')  # tensorboard dependency required
last_ckpt_callback = torchmanager.callbacks.LastCheckpoint(manager, 'last.model')
model = manager.fit(..., callbacks_list=[tensorboard_callback, last_ckpt_callback])
```

  3. Test the model with the test method:

```python
manager.test(testing_dataset, show_verbose=show_verbose)
```

  4. Save the final trained PyTorch model:

```python
torch.save(model, "model.pth")  # the saved PyTorch model can be loaded on its own, without torchmanager
```

Device selection during training/testing

Torchmanager automatically identifies available devices for training and testing: if CUDA or MPS is available, it is used first. To use multiple GPUs, set the use_multi_gpus flag to True. To specify a different device for training or testing, pass the device to the fit or test method, respectively (see the sketch after the examples below). When use_multi_gpus is False, the first available or specified device is used.

  1. Multi-GPU (CUDA) training/testing:

```python
# train on multiple GPUs
model = manager.fit(..., use_multi_gpus=True)

# test on multiple GPUs
manager.test(..., use_multi_gpus=True)
```

  2. Use only specified GPUs for training/testing:

```python
# specify devices to use
gpus: list[torch.device] | torch.device = ...  # Notice: device IDs must be specified

# train on the specified GPUs
model = manager.fit(..., use_multi_gpus=True, devices=gpus)  # Notice: use_multi_gpus must be set to True to use all specified GPUs; otherwise only the first will be used.

# test on the specified GPUs
manager.test(..., use_multi_gpus=True, devices=gpus)
```
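To pin a run to one specific device, a single torch.device can be passed instead of a list, as the type annotation above suggests; the sketch below is an illustration of that case, with the device id chosen arbitrarily:

```python
# single-device run: with use_multi_gpus left as False, only the given device is used
device = torch.device('cuda:1')  # hypothetical device id
model = manager.fit(..., use_multi_gpus=False, devices=device)
manager.test(..., use_multi_gpus=False, devices=device)
```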

Customize training/testing algorithm

Inherit the Manager (TrainingManager) class to manage the training/testing algorithm when the default training/testing loop should be kept. To customize the training/testing algorithm, simply override the train_step and/or test_step methods.

```python
from typing import Any

from torchmanager import Manager

class CustomManager(Manager):
    ...

    def train_step(self, x_train: Any, y_train: Any) -> dict[str, float]:
        ...  # code before the default training step
        summary = super().train_step(x_train, y_train)
        ...  # code after the default training step
        return summary

    def test_step(self, x_test: Any, y_test: Any) -> dict[str, float]:
        ...  # code before the default testing step
        summary = super().test_step(x_test, y_test)
        ...  # code after the default testing step
        return summary
```

Inherit the TestingManager class to manage the testing algorithm without a training algorithm when the default testing loop should be kept. To customize the testing algorithm, simply override the test_step method.

```python
class CustomManager(TestingManager):
    ...

    def test_step(self, x_test: Any, y_test: Any) -> dict[str, float]:
        ...  # code before the default testing step
        summary = super().test_step(x_test, y_test)
        ...  # code after the default testing step
        return summary
```

Inherit the BasicTrainingManager class to implement the training algorithm in the train_step method and the testing algorithm in the test_step method.

```python
class CustomManager(BasicTrainingManager):
    ...

    def train_step(self, x_train: Any, y_train: Any) -> dict[str, float]:
        ...  # code for one training iteration
        summary: dict[str, float] = ...  # set training summary
        return summary

    def test_step(self, x_test: Any, y_test: Any) -> dict[str, float]:
        ...  # code for one testing iteration
        summary: dict[str, float] = ...  # set testing summary
        return summary
```
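As one hypothetical end-to-end example, a fully manual classification step might be sketched as below; the attribute names (self.model, self.optimizer) and the plain criterion attribute are assumptions for this sketch rather than documented torchmanager API:

```python
import torch
from typing import Any

class ManualClassificationManager(BasicTrainingManager):
    # loss kept as a plain attribute for this sketch (assumed, not library API)
    criterion = torch.nn.CrossEntropyLoss()

    def train_step(self, x_train: Any, y_train: Any) -> dict[str, float]:
        # standard PyTorch iteration: forward, loss, backward, update
        self.optimizer.zero_grad()
        y = self.model(x_train)
        loss = self.criterion(y, y_train)
        loss.backward()
        self.optimizer.step()
        return {'loss': float(loss.item())}

    def test_step(self, x_test: Any, y_test: Any) -> dict[str, float]:
        # evaluation without gradients
        with torch.no_grad():
            y = self.model(x_test)
            loss = self.criterion(y, y_test)
            accuracy = (y.argmax(dim=1) == y_test).float().mean()
        return {'loss': float(loss.item()), 'accuracy': float(accuracy.item())}
```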

Inherit the BasicTestingManager class to implement the testing algorithm in the test_step method, without a training algorithm.

```python
class CustomManager(BasicTestingManager):
    ...

    def test_step(self, x_test: Any, y_test: Any) -> dict[str, float]:
        ...  # code for one testing iteration
        summary: dict[str, float] = ...  # set testing summary
        return summary
```

The saved experiment information

The Experiment class is designed to be used as a single callback that saves experiment information. It combines torchmanager.callbacks.TensorBoard, torchmanager.callbacks.LastCheckpoint, and torchmanager.callbacks.BestCheckpoint with easier usage.

```python
...
exp_callback = torchmanager.callbacks.Experiment('test.exp', manager)  # tensorboard dependency required
model = manager.fit(..., callbacks_list=[exp_callback])
```

The information, including full training logs and checkpoints, will be saved in the following structure:

```
experiments
└── <experiment name>.exp
    ├── checkpoints
    │   ├── best-<metric name>.model
    │   └── last.model
    ├── data
    │   └── <TensorBoard data file>
    ├── <experiment name>.cfg
    └── <experiment name>.log
```

Please cite this work if you find it useful

```bibtex
@software{he_2023_10381715,
  author    = {He, Qisheng and Dong, Ming},
  title     = {{TorchManager: A generic deep learning training/testing framework for PyTorch}},
  month     = dec,
  year      = 2023,
  publisher = {Zenodo},
  version   = 1,
  doi       = {10.5281/zenodo.10381715},
  url       = {https://doi.org/10.5281/zenodo.10381715}
}
```

Also check out our projects implemented with torchmanager

  • A-Bridge (SDE-BBDM) - Score-Based Image-to-Image Brownian Bridge
  • MAG-MS/MAGNET - Modality-Agnostic Learning for Medical Image Segmentation Using Multi-modality Self-distillation
  • tlt - Transferring Lottery Tickets in Computer Vision Models: a Dynamic Pruning Approach

Owner

  • Name: Qisheng Robert He
  • Login: kisonho
  • Kind: user
  • Location: Detroit
  • Company: Wayne State University

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this framework, please cite it as below."
authors:
- family-names: "He"
  given-names: "Qisheng"
- family-names: "Dong"
  given-names: "Ming"
title: "TorchManager: A generic deep learning training/testing framework for PyTorch"
version: 1
doi: 10.5281/zenodo.10381715
date-released: 2022-02-22
url: "https://doi.org/10.5281/zenodo.10381715"

GitHub Events

Total
  • Release event: 12
  • Watch event: 2
  • Delete event: 4
  • Push event: 111
  • Create event: 16
Last Year
  • Release event: 12
  • Watch event: 2
  • Delete event: 4
  • Push event: 111
  • Create event: 16

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 694
  • Total Committers: 4
  • Avg Commits per committer: 173.5
  • Development Distribution Score (DDS): 0.441
Past Year
  • Commits: 159
  • Committers: 2
  • Avg Commits per committer: 79.5
  • Development Distribution Score (DDS): 0.025
Top Committers
Name Email Commits
Qisheng He R****o@w****u 388
Qisheng He Q****e@w****u 267
Qisheng He Q****e@o****m 38
Kison Ho Q****o@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 6
  • Total pull requests: 2
  • Average time to close issues: 3 months
  • Average time to close pull requests: 1 minute
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kisonho (2)
Pull Request Authors
  • kisonho (4)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi: 114 last month
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 76
  • Total maintainers: 1
pypi.org: torchmanager

PyTorch Training Manager v1.4.1

  • Versions: 49
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 66 Last month
Rankings
Dependent packages count: 4.8%
Downloads: 14.9%
Average: 18.5%
Stargazers count: 21.6%
Dependent repos count: 21.6%
Forks count: 29.8%
Maintainers (1)
Last synced: 6 months ago
pypi.org: torchmanager-nightly

PyTorch Training Manager v1.3 (Alpha 1)

  • Versions: 27
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 48 Last month
Rankings
Dependent packages count: 6.6%
Downloads: 12.6%
Average: 20.7%
Stargazers count: 23.3%
Forks count: 30.5%
Dependent repos count: 30.6%
Maintainers (1)
Last synced: 11 months ago

Dependencies

setup.py pypi
  • torch >=1.8.2
  • tqdm *
torchmanager.egg-info/requires.txt pypi
  • torch >=1.8.2
  • tqdm *