nimble

Lightweight and Parallel Deep Learning Framework

https://github.com/snuspl/nimble

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.1%) to scientific vocabulary

Keywords

deep-learning framework gpu-task-scheduling inference parallel training


Repository

Lightweight and Parallel Deep Learning Framework

Basic Info

Host: GitHub
Owner: snuspl
License: other
Language: C++
Default Branch: main_pytorch_v1.7.1
Homepage:
Size: 169 MB

Statistics

Stars: 256
Watchers: 9
Forks: 33
Open Issues: 18
Releases: 0

Topics

deep-learning framework gpu-task-scheduling inference parallel training

Created about 5 years ago · Last pushed about 3 years ago

Metadata Files

Readme Contributing License Code of conduct Citation Codeowners

Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning

Nimble is a deep learning execution engine that accelerates model inference and training by running GPU tasks (i.e., GPU kernels and memory operations) in parallel with minimal scheduling overhead. Given a PyTorch DL model, Nimble automatically generates a GPU task schedule, which employs an optimal parallelization strategy for the model. The schedule is wrapped in a Nimble object and can be seamlessly applied to PyTorch programs. Nimble improves the speed of inference and training by up to 22.34× and 3.61× compared to PyTorch, respectively. Moreover, Nimble outperforms TensorRT by up to 2.81×.
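To make the idea of a GPU task schedule more concrete, here is a minimal pure-Python sketch of assigning the operators of a computation graph to streams so that independent branches can run concurrently. It is a simplified greedy illustration, not Nimble's actual scheduling algorithm; the function name `assign_streams` and the toy graph are hypothetical and introduced only for this example.

```python
from collections import deque


def assign_streams(graph):
    """Greedily assign each operator in a DAG to a stream index.

    `graph` maps every operator name to the list of operators that consume
    its output. Every operator, including sinks, must appear as a key.
    Hypothetical helper for illustration; not part of Nimble's API.
    """
    # Build the predecessor lists from the successor lists.
    preds = {node: [] for node in graph}
    for node, successors in graph.items():
        for succ in successors:
            preds[succ].append(node)

    indegree = {node: len(p) for node, p in preds.items()}
    ready = deque(node for node in graph if indegree[node] == 0)

    stream_of = {}
    handed_off = set()  # operators whose stream was already inherited
    next_stream = 0

    while ready:
        node = ready.popleft()
        # Continue on the stream of a predecessor that has not yet passed
        # its stream on; otherwise open a new stream for this branch.
        for pred in preds[node]:
            if pred not in handed_off:
                stream_of[node] = stream_of[pred]
                handed_off.add(pred)
                break
        else:
            stream_of[node] = next_stream
            next_stream += 1

        # Release successors whose inputs are now all scheduled.
        for succ in graph[node]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    return stream_of


# A toy block with two parallel branches that are merged again.
block = {
    "input": ["conv_a", "conv_b"],
    "conv_a": ["concat"],
    "conv_b": ["concat"],
    "concat": [],
}
print(assign_streams(block))
```

In this sketch the two convolution branches land on different streams, so a GPU could overlap them. A real schedule would also need a synchronization point wherever an operator depends on work from another stream (here, `concat` waiting on `conv_b`).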

Speedup in Inference (ImageNet models)

Inference performance comparison on an NVIDIA V100 GPU.

Speedup in Training (CIFAR-10 models)

Training performance comparison on an NVIDIA V100 GPU (panels: Batch 32, Batch 64, Batch 128).

Version

This version of Nimble is built on top of PyTorch v1.7.1 with CUDA 11.0. To see the older version of Nimble that we used for the experiments in the paper, check out the main_pytorch_v1.4.1 branch.

Install Nimble

Please refer to the installation instructions to install Nimble from source.

Use Nimble

Nimble supports both inference and training of neural networks.

Model Inference

```python
import torch
import torchvision

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50()
model = model.cuda()
model.eval()

# Prepare a dummy input
input_shape = [1, 3, 224, 224]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=False)

# Execute the object
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)
```
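To compare inference latency with and without Nimble on your own hardware, a small timing harness along the following lines may help. This is a sketch under stated assumptions: `measure_latency` and `speedup` are hypothetical helpers, not part of Nimble or PyTorch, and the optional `sync` hook is where a GPU synchronization call such as `torch.cuda.synchronize` would be passed so that asynchronous kernels are included in the measurement.

```python
import statistics
import time


def measure_latency(fn, warmup=10, iters=100, sync=None):
    """Return the median wall-clock latency of fn() in milliseconds.

    Hypothetical helper for illustration. `sync`, if given, is called after
    every invocation of `fn` so that pending asynchronous work is finished
    before the clock is read.
    """
    def run_once():
        fn()
        if sync is not None:
            sync()

    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        run_once()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup(baseline_ms, optimized_ms):
    """Ratio of baseline latency to optimized latency."""
    return baseline_ms / optimized_ms


# CPU-only stand-in workload so the sketch runs anywhere.
latency_ms = measure_latency(lambda: sum(range(10_000)), warmup=2, iters=20)
print(f"median latency: {latency_ms:.3f} ms")

# With a GPU build, the calls would look roughly like this (assumption):
#   base = measure_latency(lambda: model(rand_input), sync=torch.cuda.synchronize)
#   fast = measure_latency(lambda: nimble_model(rand_input), sync=torch.cuda.synchronize)
#   print(speedup(base, fast))
```

The median is used rather than the mean so that occasional outliers, such as a slow first iteration, have less influence on the reported number.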

Model Training

```python
import torch
import torchvision

BATCH = 32

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50(num_classes=10)
model = model.cuda()
model.train()

# Define a loss function and an optimizer
loss_fn = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Prepare a dummy input
input_shape = [BATCH, 3, 32, 32]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=True)

# Execute the forward pass
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)

# Compute loss
label = torch.zeros(BATCH, dtype=torch.long).cuda()
loss = loss_fn(output, label)

# Execute the backward pass
loss.backward()

# Perform an optimization step
optimizer.step()
```

Reproduce Evaluation Results

Please refer to the evaluation instructions to reproduce the evaluation results.

Publication

Woosuk Kwon*, Gyeong-In Yu*, Eunji Jeong, and Byung-Gon Chun (* equal contribution), Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning, 34th Conference on Neural Information Processing Systems (NeurIPS), Spotlight, December 2020.

Citation

```bibtex
@inproceedings{kwon2020nimble,
  title={Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning},
  author={Kwon, Woosuk and Yu, Gyeong-In and Jeong, Eunji and Chun, Byung-Gon},
  booktitle={NeurIPS},
  year={2020}
}
```

Troubleshooting

Create an issue for questions and bug reports.

Contribution

We welcome your contributions to Nimble! We aim to build an open-source project that the open-source community actively contributes to. For general discussions about development, please subscribe to nimble-discuss@googlegroups.com.

License

BSD 3-clause license

Owner

Name: SNU Software Platform Lab
Login: snuspl
Kind: organization

Repositories: 20
Profile: https://github.com/snuspl

Citation (CITATION)

@incollection{NEURIPS2019_9015,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {8024--8035},
year = {2019},
publisher = {Curran Associates, Inc.},
url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}

GitHub Events

Total

Watch event: 6

Last Year

Watch event: 6

Dependencies

ios/LibTorch.podspec cocoapods

LibTorch >= 0

ios/TestApp/Podfile cocoapods

LibTorch >= 0

.circleci/docker/android/build.gradle maven

androidx.appcompat:appcompat 1.0.0 implementation
com.android.support:appcompat-v7 28.0.0 implementation
com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
com.facebook.soloader:nativeloader 0.8.0 implementation
com.google.code.findbugs:jsr305 3.0.1 implementation

android/pytorch_android/host/build.gradle maven

com.google.code.findbugs:jsr305 3.0.1 compileOnly
com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
com.facebook.soloader:nativeloader 0.8.0 implementation
junit:junit 4.12 testImplementation

android/test_app/app/build.gradle maven

com.android.support:appcompat-v7 28.0.0 implementation
com.facebook.soloader:nativeloader 0.8.0 implementation

.circleci/ecr_gc_docker/requirements.txt pypi

boto3 *
pytz *
requests *

caffe2/requirements.txt pypi

enum34 *
numpy *
pyyaml *
requests *
typing *

docs/cpp/requirements.txt pypi

breathe ==4.19.2
bs4 *
exhale ==0.2.3
lxml *
six *
sphinx ==3.1.2

docs/requirements.txt pypi

matplotlib *
sphinx ==2.4.4
sphinxcontrib.katex *
tensorboard *

requirements.txt pypi

dataclasses *
future *
numpy *
pyyaml *
requests *
setuptools *
six *
typing_extensions *

setup.py pypi

dataclasses *
typing_extensions *

ios/TestApp/Gemfile rubygems

fastlane >= 0

ios/TestApp/Gemfile.lock rubygems

CFPropertyList 3.0.2
addressable 2.7.0
atomos 0.1.3
babosa 1.0.3
claide 1.0.3
colored 1.2
colored2 3.1.2
commander-fastlane 4.4.6
declarative 0.0.10
declarative-option 0.1.0
digest-crc 0.4.1
domain_name 0.5.20190701
dotenv 2.7.5
emoji_regex 1.0.1
excon 0.71.1
faraday 0.17.3
faraday-cookie_jar 0.0.6
faraday_middleware 0.13.1
fastimage 2.1.7
fastlane 2.140.0
gh_inspector 1.1.3
google-api-client 0.36.4
google-cloud-core 1.5.0
google-cloud-env 1.3.0
google-cloud-errors 1.0.0
google-cloud-storage 1.25.1
googleauth 0.10.0
highline 1.7.10
http-cookie 1.0.3
httpclient 2.8.3
json 2.3.0
jwt 2.1.0
memoist 0.16.2
mini_magick 4.10.1
mini_mime 1.0.2
multi_json 1.14.1
multi_xml 0.6.0
multipart-post 2.0.0
nanaimo 0.2.6
naturally 2.2.0
os 1.0.1
plist 3.5.0
public_suffix 2.0.5
representable 3.0.4
retriable 3.1.2
rouge 2.0.7
rubyzip 1.3.0
security 0.1.3
signet 0.12.0
simctl 1.6.7
slack-notifier 2.3.2
terminal-notifier 2.0.0
terminal-table 1.8.0
tty-cursor 0.7.0
tty-screen 0.7.0
tty-spinner 0.9.2
uber 0.1.0
unf 0.1.4
unf_ext 0.0.7.6
unicode-display_width 1.6.0
word_wrap 1.0.0
xcodeproj 1.14.0
xcpretty 0.3.0
xcpretty-travis-formatter 1.0.0

.github/workflows/clang_format.yml actions

actions/checkout v1 composite
actions/setup-python v1 composite

.github/workflows/jit_triage.yml actions

actions/github-script v2 composite

.github/workflows/lint.yml actions

actions/checkout v1 composite
actions/setup-python v1 composite
pytorch/add-annotations-github-action master composite
suo/add-annotations-github-action master composite

.circleci/docker/centos-rocm/Dockerfile docker

centos ${CENTOS_VERSION} build

.circleci/docker/ubuntu/Dockerfile docker

ubuntu ${UBUNTU_VERSION} build

.circleci/docker/ubuntu-cuda/Dockerfile docker

nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} build

.circleci/docker/ubuntu-rocm/Dockerfile docker

ubuntu ${UBUNTU_VERSION} build

.circleci/ecr_gc_docker/Dockerfile docker

ubuntu 16.04 build

Dockerfile docker

${BASE_IMAGE} latest build
conda latest build
dev-base latest build
official latest build

caffe2/contrib/docker-ubuntu-14.04/Dockerfile docker

ubuntu 14.04 build

docker/caffe2/jenkins/centos/Dockerfile docker

centos ${CENTOS_VERSION} build

docker/caffe2/jenkins/centos-cuda/Dockerfile docker

nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-centos${CENTOS_VERSION} build

docker/caffe2/jenkins/centos-rocm/Dockerfile docker

centos ${CENTOS_VERSION} build

docker/caffe2/jenkins/ubuntu/Dockerfile docker

ubuntu ${UBUNTU_VERSION} build

docker/caffe2/jenkins/ubuntu-cuda/Dockerfile docker

nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} build

docker/caffe2/jenkins/ubuntu-rocm/Dockerfile docker

ubuntu ${UBUNTU_VERSION} build

docker/caffe2/ubuntu-14.04-cpu-all-options/Dockerfile docker

caffe2ai/caffe2 c2v0.8.1.cpu.min.ubuntu14.04 build

docker/caffe2/ubuntu-14.04-cpu-minimal/Dockerfile docker

ubuntu 14.04 build

docker/caffe2/ubuntu-16.04-cpu-all-options/Dockerfile docker

caffe2ai/caffe2 c2v0.8.1.cpu.min.ubuntu16.04 build

docker/caffe2/ubuntu-16.04-cpu-minimal/Dockerfile docker

ubuntu 16.04 build

docker/caffe2/ubuntu-16.04-gpu-tutorial/Dockerfile docker

caffe2ai/caffe2 latest build

docker/pytorch/Dockerfile docker

nvidia/cuda 10.1-cudnn7-devel-ubuntu16.04 build
