Science Score: 18.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.1%) to scientific vocabulary
Keywords
Repository
Lightweight and Parallel Deep Learning Framework
Basic Info
Statistics
- Stars: 256
- Watchers: 9
- Forks: 33
- Open Issues: 18
- Releases: 0
Topics
Metadata Files
README.md
Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning
Nimble is a deep learning execution engine that accelerates model inference and training by running GPU tasks (i.e., GPU kernels and memory operations) in parallel with minimal scheduling overhead.
Given a PyTorch DL model, Nimble automatically generates a GPU task schedule, which employs an optimal parallelization strategy for the model.
The schedule is wrapped in a Nimble object and can be seamlessly applied to PyTorch programs.
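To make the idea of a GPU task schedule more concrete, the following is a minimal, illustrative sketch of assigning the operators of a computation graph to parallel streams. It is not Nimble's actual algorithm: the function name `assign_streams`, the operator names, and the greedy chain heuristic are all assumptions made for illustration, and plain integers stand in for CUDA streams.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def assign_streams(deps):
    """Assign each operator to a stream index (hypothetical helper).

    `deps` maps an operator name to the list of operators it depends on.
    An operator reuses the stream of a predecessor whose stream has not
    already been handed to another successor; otherwise it opens a new
    stream. Operators on different streams could run concurrently.
    """
    order = TopologicalSorter(deps).static_order()
    stream_of = {}
    extended = set()  # predecessors whose stream was already reused
    next_stream = 0
    for op in order:
        for pred in deps.get(op, ()):
            if pred not in extended:
                stream_of[op] = stream_of[pred]
                extended.add(pred)
                break
        else:
            # No reusable predecessor stream: open a new one.
            stream_of[op] = next_stream
            next_stream += 1
    return stream_of


# A diamond-shaped graph: two independent convolutions feed one addition.
graph = {
    "input": [],
    "conv_a": ["input"],
    "conv_b": ["input"],
    "add": ["conv_a", "conv_b"],
}
streams = assign_streams(graph)
print(streams)
```

In this toy graph the two convolutions land on different streams, so they could overlap on the GPU, while `add` continues on the stream of its first listed predecessor. A real schedule would also need synchronization between streams wherever an operator depends on work from another stream, which this sketch leaves out.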
Nimble improves the speed of inference and training by up to 22.34× and 3.61× compared to PyTorch, respectively. Moreover, Nimble outperforms TensorRT by up to 2.81×.
- Speedup in Inference (ImageNet models)
Inference performance comparison on an NVIDIA V100 GPU.
- Speedup in Training (CIFAR-10 models)
[Figures omitted: training speedup charts for batch sizes 32, 64, and 128.]
Training performance comparison on an NVIDIA V100 GPU.
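Speedup figures like the ones above are ratios of measured latencies. The sketch below shows one way such a ratio could be measured. It is not the authors' evaluation script; the helper names `measure_latency_ms` and `speedup`, and the warm-up and iteration counts, are assumptions chosen for illustration.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, iters=50, sync=None):
    """Return the median wall-clock latency of fn() in milliseconds.

    `sync` is an optional callable that blocks until queued device work
    has finished. GPU kernels are launched asynchronously, so without it
    the timer would mostly measure launch overhead.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for the device before stopping the clock
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def speedup(baseline_ms, candidate_ms):
    """Ratio of baseline latency to candidate latency (higher is better)."""
    return baseline_ms / candidate_ms
```

With PyTorch on a GPU, one would typically pass `sync=torch.cuda.synchronize` and wrap the model call in a lambda, once for the original module and once for the Nimble object, then compare the two medians with `speedup`.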
Version
This version of Nimble is built on top of PyTorch v1.7.1 with CUDA 11.0. To see the older version of Nimble that we used for the experiments in the paper, please check out main_pytorch_v1.4.1.
Install Nimble
Please refer to the installation instructions to install Nimble from source.
Use Nimble
Nimble supports both inference and training of neural networks.
Model Inference
```python
import torch
import torchvision

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50()
model = model.cuda()
model.eval()

# Prepare a dummy input
input_shape = [1, 3, 224, 224]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=False)

# Execute the object
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)
```
Model Training
```python
import torch
import torchvision

BATCH = 32

# Instantiate a PyTorch Module and move it to a GPU
model = torchvision.models.resnet50(num_classes=10)
model = model.cuda()
model.train()

# Define a loss function and an optimizer
loss_fn = torch.nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Prepare a dummy input
input_shape = [BATCH, 3, 32, 32]
dummy_input = torch.randn(*input_shape).cuda()

# Create a Nimble object
nimble_model = torch.cuda.Nimble(model)
nimble_model.prepare(dummy_input, training=True)

# Execute the forward pass
rand_input = torch.rand(*input_shape).cuda()
output = nimble_model(rand_input)

# Compute loss
label = torch.zeros(BATCH, dtype=torch.long).cuda()
loss = loss_fn(output, label)

# Execute the backward pass
loss.backward()

# Perform an optimization step
optimizer.step()
```
Reproduce Evaluation Results
Please refer to the evaluation instructions to reproduce the evaluation results.
Publication
Woosuk Kwon, Gyeong-In Yu, Eunji Jeong, and Byung-Gon Chun (* equal contribution), Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning, 34th Conference on Neural Information Processing Systems (NeurIPS), Spotlight, December 2020.
Citation
```bibtex
@inproceedings{kwon2020nimble,
  title={Nimble: Lightweight and Parallel GPU Task Scheduling for Deep Learning},
  author={Kwon, Woosuk and Yu, Gyeong-In and Jeong, Eunji and Chun, Byung-Gon},
  booktitle={NeurIPS},
  year={2020}
}
```
Troubleshooting
Create an issue for questions and bug reports.
Contribution
We welcome your contributions to Nimble! We aim to build an open-source project that is driven by contributions from the open-source community. For general discussions about development, please subscribe to nimble-discuss@googlegroups.com.
License
Owner
- Name: SNU Software Platform Lab
- Login: snuspl
- Kind: organization
- Repositories: 20
- Profile: https://github.com/snuspl
Citation (CITATION)
@incollection{NEURIPS2019_9015,
title = {PyTorch: An Imperative Style, High-Performance Deep Learning Library},
author = {Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu and Bai, Junjie and Chintala, Soumith},
booktitle = {Advances in Neural Information Processing Systems 32},
editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
pages = {8024--8035},
year = {2019},
publisher = {Curran Associates, Inc.},
url = {http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf}
}
GitHub Events
Total
- Watch event: 6
Last Year
- Watch event: 6
Dependencies
- LibTorch >= 0
- LibTorch >= 0
- androidx.appcompat:appcompat 1.0.0 implementation
- com.android.support:appcompat-v7 28.0.0 implementation
- com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- com.google.code.findbugs:jsr305 3.0.1 implementation
- com.google.code.findbugs:jsr305 3.0.1 compileOnly
- com.facebook.fbjni:fbjni-java-only 0.0.3 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- junit:junit 4.12 testImplementation
- com.android.support:appcompat-v7 28.0.0 implementation
- com.facebook.soloader:nativeloader 0.8.0 implementation
- boto3 *
- pytz *
- requests *
- enum34 *
- numpy *
- pyyaml *
- requests *
- typing *
- breathe ==4.19.2
- bs4 *
- exhale ==0.2.3
- lxml *
- six *
- sphinx ==3.1.2
- matplotlib *
- sphinx ==2.4.4
- sphinxcontrib.katex *
- tensorboard *
- dataclasses *
- future *
- numpy *
- pyyaml *
- requests *
- setuptools *
- six *
- typing_extensions *
- dataclasses *
- typing_extensions *
- fastlane >= 0
- CFPropertyList 3.0.2
- addressable 2.7.0
- atomos 0.1.3
- babosa 1.0.3
- claide 1.0.3
- colored 1.2
- colored2 3.1.2
- commander-fastlane 4.4.6
- declarative 0.0.10
- declarative-option 0.1.0
- digest-crc 0.4.1
- domain_name 0.5.20190701
- dotenv 2.7.5
- emoji_regex 1.0.1
- excon 0.71.1
- faraday 0.17.3
- faraday-cookie_jar 0.0.6
- faraday_middleware 0.13.1
- fastimage 2.1.7
- fastlane 2.140.0
- gh_inspector 1.1.3
- google-api-client 0.36.4
- google-cloud-core 1.5.0
- google-cloud-env 1.3.0
- google-cloud-errors 1.0.0
- google-cloud-storage 1.25.1
- googleauth 0.10.0
- highline 1.7.10
- http-cookie 1.0.3
- httpclient 2.8.3
- json 2.3.0
- jwt 2.1.0
- memoist 0.16.2
- mini_magick 4.10.1
- mini_mime 1.0.2
- multi_json 1.14.1
- multi_xml 0.6.0
- multipart-post 2.0.0
- nanaimo 0.2.6
- naturally 2.2.0
- os 1.0.1
- plist 3.5.0
- public_suffix 2.0.5
- representable 3.0.4
- retriable 3.1.2
- rouge 2.0.7
- rubyzip 1.3.0
- security 0.1.3
- signet 0.12.0
- simctl 1.6.7
- slack-notifier 2.3.2
- terminal-notifier 2.0.0
- terminal-table 1.8.0
- tty-cursor 0.7.0
- tty-screen 0.7.0
- tty-spinner 0.9.2
- uber 0.1.0
- unf 0.1.4
- unf_ext 0.0.7.6
- unicode-display_width 1.6.0
- word_wrap 1.0.0
- xcodeproj 1.14.0
- xcpretty 0.3.0
- xcpretty-travis-formatter 1.0.0
- actions/checkout v1 composite
- actions/setup-python v1 composite
- actions/github-script v2 composite
- actions/checkout v1 composite
- actions/setup-python v1 composite
- pytorch/add-annotations-github-action master composite
- suo/add-annotations-github-action master composite
- centos ${CENTOS_VERSION} build
- ubuntu ${UBUNTU_VERSION} build
- nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} build
- ubuntu ${UBUNTU_VERSION} build
- ubuntu 16.04 build
- ${BASE_IMAGE} latest build
- conda latest build
- dev-base latest build
- official latest build
- ubuntu 14.04 build
- centos ${CENTOS_VERSION} build
- nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-centos${CENTOS_VERSION} build
- centos ${CENTOS_VERSION} build
- ubuntu ${UBUNTU_VERSION} build
- nvidia/cuda ${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${UBUNTU_VERSION} build
- ubuntu ${UBUNTU_VERSION} build
- caffe2ai/caffe2 c2v0.8.1.cpu.min.ubuntu14.04 build
- ubuntu 14.04 build
- caffe2ai/caffe2 c2v0.8.1.cpu.min.ubuntu16.04 build
- ubuntu 16.04 build
- caffe2ai/caffe2 latest build
- nvidia/cuda 10.1-cudnn7-devel-ubuntu16.04 build