nimble

Lightweight and Parallel Deep Learning Framework

https://github.com/snuspl/nimble

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.1%) to scientific vocabulary

Keywords

deep-learning framework gpu-task-scheduling inference parallel training
Last synced: 4 months ago

Repository

Lightweight and Parallel Deep Learning Framework

Basic Info
  • Host: GitHub
  • Owner: snuspl
  • License: other
  • Language: C++
  • Default Branch: main_pytorch_v1.7.1
  • Homepage:
  • Size: 169 MB
Statistics
  • Stars: 256
  • Watchers: 9
  • Forks: 33
  • Open Issues: 18
  • Releases: 0
Topics
deep-learning framework gpu-task-scheduling inference parallel training
Created about 5 years ago · Last pushed about 3 years ago
Metadata Files
Readme · Contributing · License · Code of conduct · Citation · Codeowners

README.md

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Nimble is a deep learning execution engine that accelerates model inference and training by running GPU tasks (i.e., GPU kernels and memory operations) in parallel with minimal scheduling overhead. Given a PyTorch DL model, Nimble automatically generates a GPU task schedule, which employs an optimal parallelization strategy for the model. The schedule is wrapped in a Nimble object and can be seamlessly applied to PyTorch programs. Nimble improves the speed of inference and training by up to 22.34× and 3.61× compared to PyTorch, respectively. Moreover, Nimble outperforms TensorRT by up to 2.81×.
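The parallelism Nimble exploits can be pictured with plain PyTorch CUDA streams. The snippet below is only a conceptual sketch, not the Nimble API: the tensors, streams, and "branches" are made-up placeholders, and Nimble's contribution is to derive and run such an overlapping schedule automatically rather than by hand.

```python
import torch

# Two independent matmuls placed on separate CUDA streams so their kernels
# can overlap on the GPU. Everything here is an illustrative placeholder.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

s1 = torch.cuda.Stream()
s2 = torch.cuda.Stream()

# Make sure the side streams see the tensors produced on the default stream.
s1.wait_stream(torch.cuda.current_stream())
s2.wait_stream(torch.cuda.current_stream())

with torch.cuda.stream(s1):
    x = a @ a  # branch 1, launched on stream s1
with torch.cuda.stream(s2):
    y = b @ b  # branch 2, launched on stream s2, may run concurrently with s1

torch.cuda.synchronize()  # wait for both streams before consuming the results
out = x + y
```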

  • Speedup in Inference (ImageNet models)
    [Figure] Inference performance comparison on an NVIDIA V100 GPU.
  • Speedup in Training (CIFAR-10 models)
    [Figures: Batch 32 / Batch 64 / Batch 128] Training performance comparison on an NVIDIA V100 GPU.

Version

This version of Nimble is built on top of PyTorch v1.7.1 with CUDA 11.0. If you want the older version of Nimble used for the experiments in the paper, check out the main_pytorch_v1.4.1 branch.

Install Nimble

Please refer to the installation instructions to install Nimble from source.

Use Nimble

Nimble supports both inference and training of neural networks.

Model Inference

```python
import torch
import torchvision

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50()
model = model.cuda()
model.eval()

# Prepare a dummy input
input_shape = [1, 3, 224, 224]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=False)

# Execute the object
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)
```
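
As a quick sanity check, the prepared Nimble object can be timed against the eager PyTorch model with CUDA events. This is a hedged sketch: `time_cuda`, the warm-up count, and the iteration count are illustrative choices and not part of the Nimble API.

```python
# Hypothetical timing helper using CUDA events (not part of the Nimble API).
def time_cuda(fn, iters=100, warmup=10):
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):        # warm-up so lazy initialization is excluded
        fn()
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # average latency in milliseconds

with torch.no_grad():
    eager_ms = time_cuda(lambda: model(rand_input))
    nimble_ms = time_cuda(lambda: nimble_model(rand_input))
print(f"eager: {eager_ms:.3f} ms  nimble: {nimble_ms:.3f} ms")
```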

Model Training

```python
import torch
import torchvision

BATCH = 32

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50(num_classes=10)
model = model.cuda()
model.train()

# Define a loss function and an optimizer
loss_fn = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Prepare a dummy input
input_shape = [BATCH, 3, 32, 32]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=True)

# Execute the forward pass
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)

# Compute loss
label = torch.zeros(BATCH, dtype=torch.long).cuda()
loss = loss_fn(output, label)

# Execute the backward pass
loss.backward()

# Perform an optimization step
optimizer.step()
```
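
The example above runs a single step on a dummy batch. Below is a hedged sketch of how the prepared object might be reused in a loop over real data; the random TensorDataset stands in for CIFAR-10, and `drop_last=True` reflects the assumption that every batch must match the shape passed to `prepare()`.

```python
# Hedged sketch of a training loop reusing the prepared Nimble object.
# The random TensorDataset is a placeholder for CIFAR-10; drop_last=True keeps
# every batch at the exact shape used in prepare() (assumed to be required).
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(1024, 3, 32, 32),
                        torch.randint(0, 10, (1024,)))
loader = DataLoader(dataset, batch_size=BATCH, shuffle=True, drop_last=True)

for inputs, labels in loader:
    inputs, labels = inputs.cuda(), labels.cuda()
    optimizer.zero_grad()
    output = nimble_model(inputs)   # forward pass through the Nimble object
    loss = loss_fn(output, labels)
    loss.backward()                 # backward pass
    optimizer.step()                # parameter update
```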

Reproduce Evaluation Results

Please refer to the evaluation instructions to reproduce the evaluation results.

Publication

Woosuk Kwon*, Gyeong-In Yu*, Eunji Jeong, and Byung-Gon Chun (*equal contribution), Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning, 34th Conference on Neural Information Processing Systems (NeurIPS), Spotlight, December 2020.

Citation

```bibtex
@inproceedings{kwon2020nimble,
  title={Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning},
  author={Kwon, Woosuk and Yu, Gyeong-In and Jeong, Eunji and Chun, Byung-Gon},
  booktitle={NeurIPS},
  year={2020}
}
```

Troubleshooting

Create an issue for questions and bug reports.

Contribution

We welcome your contributions to Nimble! We aim to build an open-source project driven by the open-source community. For general discussions about development, please subscribe to nimble-discuss@googlegroups.com.

License

BSD 3-Clause License

Owner

  • Name: SNU Software Platform Lab
  • Login: snuspl
  • Kind: organization

Citation (CITATION)

@incollection{NEURIPS2019_9015,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {8024--8035},
year = {2019},
publisher = {Curran Associates, Inc.},
url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}

GitHub Events

Total
  • Watch event: 6
Last Year
  • Watch event: 6

Dependencies

ios/LibTorch.podspec cocoapods
  • LibTorch >= 0
ios/TestApp/Podfile cocoapods
  • LibTorch >= 0
.circleci/docker/android/build.gradle maven
  • androidx.appcompat:appcompat 1.0.0 implementation
  • com.android.support:appcompat-v7 28.0.0 implementation
  • com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
  • com.facebook.soloader:nativeloader 0.8.0 implementation
  • com.google.code.findbugs:jsr305 3.0.1 implementation
android/pytorch_android/host/build.gradle maven
  • com.google.code.findbugs:jsr305 3.0.1 compileOnly
  • com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
  • com.facebook.soloader:nativeloader 0.8.0 implementation
  • junit:junit 4.12 testImplementation
android/test_app/app/build.gradle maven
  • com.android.support:appcompat-v7 28.0.0 implementation
  • com.facebook.soloader:nativeloader 0.8.0 implementation
.circleci/ecr_gc_docker/requirements.txt pypi
  • boto3 *
  • pytz *
  • requests *
caffe2/requirements.txt pypi
  • enum34 *
  • numpy *
  • pyyaml *
  • requests *
  • typing *
docs/cpp/requirements.txt pypi
  • breathe ==4.19.2
  • bs4 *
  • exhale ==0.2.3
  • lxml *
  • six *
  • sphinx ==3.1.2
docs/requirements.txt pypi
  • matplotlib *
  • sphinx ==2.4.4
  • sphinxcontrib.katex *
  • tensorboard *
requirements.txt pypi
  • dataclasses *
  • future *
  • numpy *
  • pyyaml *
  • requests *
  • setuptools *
  • six *
  • typing_extensions *
setup.py pypi
  • dataclasses *
  • typing_extensions *
ios/TestApp/Gemfile rubygems
  • fastlane >= 0
ios/TestApp/Gemfile.lock rubygems
  • CFPropertyList 3.0.2
  • addressable 2.7.0
  • atomos 0.1.3
  • babosa 1.0.3
  • claide 1.0.3
  • colored 1.2
  • colored2 3.1.2
  • commander-fastlane 4.4.6
  • declarative 0.0.10
  • declarative-option 0.1.0
  • digest-crc 0.4.1
  • domain_name 0.5.20190701
  • dotenv 2.7.5
  • emoji_regex 1.0.1
  • excon 0.71.1
  • faraday 0.17.3
  • faraday-cookie_jar 0.0.6
  • faraday_middleware 0.13.1
  • fastimage 2.1.7
  • fastlane 2.140.0
  • gh_inspector 1.1.3
  • google-api-client 0.36.4
  • google-cloud-core 1.5.0
  • google-cloud-env 1.3.0
  • google-cloud-errors 1.0.0
  • google-cloud-storage 1.25.1
  • googleauth 0.10.0
  • highline 1.7.10
  • http-cookie 1.0.3
  • httpclient 2.8.3
  • json 2.3.0
  • jwt 2.1.0
  • memoist 0.16.2
  • mini_magick 4.10.1
  • mini_mime 1.0.2
  • multi_json 1.14.1
  • multi_xml 0.6.0
  • multipart-post 2.0.0
  • nanaimo 0.2.6
  • naturally 2.2.0
  • os 1.0.1
  • plist 3.5.0
  • public_suffix 2.0.5
  • representable 3.0.4
  • retriable 3.1.2
  • rouge 2.0.7
  • rubyzip 1.3.0
  • security 0.1.3
  • signet 0.12.0
  • simctl 1.6.7
  • slack-notifier 2.3.2
  • terminal-notifier 2.0.0
  • terminal-table 1.8.0
  • tty-cursor 0.7.0
  • tty-screen 0.7.0
  • tty-spinner 0.9.2
  • uber 0.1.0
  • unf 0.1.4
  • unf_ext 0.0.7.6
  • unicode-display_width 1.6.0
  • word_wrap 1.0.0
  • xcodeproj 1.14.0
  • xcpretty 0.3.0
  • xcpretty-travis-formatter 1.0.0
.github/workflows/clang_format.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
.github/workflows/jit_triage.yml actions
  • actions/github-script v2 composite
.github/workflows/lint.yml actions
  • actions/checkout v1 composite
  • actions/setup-python v1 composite
  • pytorch/add-annotations-github-action master composite
  • suo/add-annotations-github-action master composite
.circleci/docker/centos-rocm/Dockerfile docker
  • centos ${CENTOS_VERSION} build
.circleci/docker/ubuntu/Dockerfile docker
  • ubuntu ${UBUNTU_VERSION} build
.circleci/docker/ubuntu-cuda/Dockerfile docker
  • nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} build
.circleci/docker/ubuntu-rocm/Dockerfile docker
  • ubuntu ${UBUNTU_VERSION} build
.circleci/ecr_gc_docker/Dockerfile docker
  • ubuntu 16.04 build
Dockerfile docker
  • ${BASE_IMAGE} latest build
  • conda latest build
  • dev-base latest build
  • official latest build
caffe2/contrib/docker-ubuntu-14.04/Dockerfile docker
  • ubuntu 14.04 build
docker/caffe2/jenkins/centos/Dockerfile docker
  • centos ${CENTOS_VERSION} build
docker/caffe2/jenkins/centos-cuda/Dockerfile docker
  • nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-centos${CENTOS_VERSION} build
docker/caffe2/jenkins/centos-rocm/Dockerfile docker
  • centos ${CENTOS_VERSION} build
docker/caffe2/jenkins/ubuntu/Dockerfile docker
  • ubuntu ${UBUNTU_VERSION} build
docker/caffe2/jenkins/ubuntu-cuda/Dockerfile docker
  • nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} build
docker/caffe2/jenkins/ubuntu-rocm/Dockerfile docker
  • ubuntu ${UBUNTU_VERSION} build
docker/caffe2/ubuntu-14.04-cpu-all-options/Dockerfile docker
  • caffe2ai/caffe2 c2v0.8.1.cpu.min.ubuntu14.04 build
docker/caffe2/ubuntu-14.04-cpu-minimal/Dockerfile docker
  • ubuntu 14.04 build
docker/caffe2/ubuntu-16.04-cpu-all-options/Dockerfile docker
  • caffe2ai/caffe2 c2v0.8.1.cpu.min.ubuntu16.04 build
docker/caffe2/ubuntu-16.04-cpu-minimal/Dockerfile docker
  • ubuntu 16.04 build
docker/caffe2/ubuntu-16.04-gpu-tutorial/Dockerfile docker
  • caffe2ai/caffe2 latest build
docker/pytorch/Dockerfile docker
  • nvidia/cuda 10.1-cudnn7-devel-ubuntu16.04 build