https://github.com/awslabs/aws-virtual-gpu-device-plugin
AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity, 12.2%)
Keywords
Repository
AWS virtual GPU device plugin provides the capability to use smaller virtual GPUs for your machine learning inference workloads
Basic Info
- Host: GitHub
- Owner: awslabs
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: master
- Homepage: https://aws.amazon.com/blogs/opensource/virtual-gpu-device-plugin-for-inference-workload-in-kubernetes/
- Size: 1.43 MB
Statistics
- Stars: 204
- Watchers: 49
- Forks: 31
- Open Issues: 15
- Releases: 2
Topics
Metadata Files
README.md
Virtual GPU device plugin for Kubernetes
The virtual GPU device plugin for Kubernetes is a DaemonSet that allows you to automatically:
- Expose an arbitrary number of virtual GPUs on the GPU nodes of your cluster.
- Run ML serving containers backed by accelerators with low latency and low cost in your Kubernetes cluster.

This repository contains the AWS virtual GPU implementation of the Kubernetes device plugin.
Prerequisites
The prerequisites for running the virtual device plugin are:
- NVIDIA drivers ~= 361.93
- nvidia-docker version > 2.0 (see how to install it and its prerequisites)
- Docker configured with nvidia as the default runtime
- Kubernetes version >= 1.10
Limitations
- This solution is built on top of the Volta Multi-Process Service (MPS). You can use it only on instance types with a Tesla V100 or newer GPU (currently only Amazon EC2 P3 instances and Amazon EC2 G4 instances).
- By default, the virtual GPU device plugin sets the GPU compute mode to EXCLUSIVE_PROCESS, which means the GPU is assigned to the MPS process; individual process threads submit work to the GPU concurrently via the MPS server. The GPU cannot be used for any other purpose.
- The virtual GPU device plugin only works on single physical GPU instances such as p3.2xlarge if you request more than 1 k8s.amazonaws.com/vgpu in your workloads.
- The virtual GPU device plugin cannot be used together with the NVIDIA device plugin. You can label nodes and use a node selector to install the virtual GPU device plugin only on selected nodes.
High Level Design

Quick Start
Label GPU node groups
```bash
kubectl label node <your_k8s_node_name> k8s.amazonaws.com/accelerator=vgpu
```
Enabling virtual GPU Support in Kubernetes
Update node selector label in the manifest file to match with labels of your GPU node group, then apply it to Kubernetes.
```shell
$ kubectl create -f https://raw.githubusercontent.com/awslabs/aws-virtual-gpu-device-plugin/v0.1.1/manifests/device-plugin.yml
```
Running GPU Jobs
Virtual NVIDIA GPUs can now be consumed via container level resource requirements using the resource name k8s.amazonaws.com/vgpu:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: resnet-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: resnet-server
  template:
    metadata:
      labels:
        app: resnet-server
    spec:
      # hostIPC is required for MPS communication
      hostIPC: true
      containers:
      - name: resnet-container
        image: seedjeffwan/tensorflow-serving-gpu:resnet
        args:
        # Make sure the limit matches the vGPU count so the tf-serving process does not occupy all of the GPU memory
        - --per_process_gpu_memory_fraction=0.2
        env:
        - name: MODEL_NAME
          value: resnet
        ports:
        - containerPort: 8501
        # Use the virtual GPU resource here
        resources:
          limits:
            k8s.amazonaws.com/vgpu: 1
        volumeMounts:
        - name: nvidia-mps
          mountPath: /tmp/nvidia-mps
      volumes:
      - name: nvidia-mps
        hostPath:
          path: /tmp/nvidia-mps
```
WARNING: if you don't request GPUs when using the device plugin, all the GPUs on the machine will be exposed inside your container.
Check the full example here
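The --per_process_gpu_memory_fraction value in the example above caps how much GPU memory each serving process may claim. A minimal sketch of how one might derive it from the vGPU request, assuming a hypothetical VGPU_PER_GPU of 10 (this constant is an assumption; set it to however many vGPUs the plugin exposes per physical GPU in your deployment):

```python
# Hypothetical helper: derive TensorFlow's per-process GPU memory
# fraction from a pod's vGPU request. VGPU_PER_GPU is an assumption;
# match it to the number of vGPUs the plugin exposes per physical GPU.
VGPU_PER_GPU = 10

def memory_fraction(vgpu_request: int) -> float:
    """Fraction of physical GPU memory one serving process may use."""
    if not 1 <= vgpu_request <= VGPU_PER_GPU:
        raise ValueError("vGPU request must be between 1 and VGPU_PER_GPU")
    return vgpu_request / VGPU_PER_GPU

# A pod requesting 2 of 10 vGPUs would pass
# --per_process_gpu_memory_fraction=0.2 to tensorflow_model_server.
print(memory_fraction(2))
```

Keeping the fraction aligned with the vGPU request is what prevents one serving process from starving the other MPS clients sharing the same physical GPU.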
Development
Please check Development for more details.
Credits
The project idea comes from @RenaudWasTaken's comment in kubernetes/kubernetes#52757 and Alibaba's solution from @cheyang, GPU Sharing Scheduler Extender Now Supports Fine-Grained Kubernetes Clusters.
Reference
AWS:
- 28 Nov 2018 - Amazon Elastic Inference – GPU-Powered Deep Learning Inference Acceleration
- 2 Dec 2018 - Amazon Elastic Inference - Reduce Deep Learning inference costs by 75%
- 30 Jul 2019 - Running Amazon Elastic Inference Workloads on Amazon ECS
- 06 Sep 2019 - Optimizing TensorFlow model serving with Kubernetes and Amazon Elastic Inference
- 03 Dec 2019 - Introducing Amazon EC2 Inf1 Instances, high performance and the lowest cost machine learning inference in the cloud
Community:
- Nvidia Turing GPU Architecture
- Nvidia Tesla V100 GPU Architecture
- Is sharing GPU to multiple containers feasible?
- Fractional GPUs: Software-based Compute and Memory Bandwidth Reservation for GPUs
- GPU Sharing Scheduler Extender Now Supports Fine-Grained Kubernetes Clusters
- GPU Sharing for Machine Learning Workload on Kubernetes - Henry Zhang & Yang Yu, VMware
- Deep Learning inference cost optimization practice on Kubernetes - Tencent
- Gaia Scheduler: A Kubernetes-Based Scheduler Framework
License
This project is licensed under the Apache-2.0 License.
Owner
- Name: Amazon Web Services - Labs
- Login: awslabs
- Kind: organization
- Location: Seattle, WA
- Website: http://amazon.com/aws/
- Repositories: 914
- Profile: https://github.com/awslabs
AWS Labs
GitHub Events
Total
- Watch event: 4
- Fork event: 1
Last Year
- Watch event: 4
- Fork event: 1
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 20
- Total pull requests: 14
- Average time to close issues: 3 months
- Average time to close pull requests: 3 months
- Total issue authors: 19
- Total pull request authors: 7
- Average comments per issue: 1.15
- Average comments per pull request: 0.43
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 1
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 2
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- walkley (2)
- FanniSun (1)
- t-ibayashi-safie (1)
- vikranthkeerthipati (1)
- parth-chudasama (1)
- jaggerwang (1)
- Narsil (1)
- amybachir (1)
- Apokleos (1)
- stephanrb3 (1)
- nneram (1)
- valafon (1)
- cyyeh (1)
- stevensu1977 (1)
- josephlee518 (1)
Pull Request Authors
- Jeffwan (5)
- dependabot[bot] (3)
- walkley (2)
- parisnakitakejser (2)
- Wei-1 (1)
- hemandee (1)
- linjungz (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: unknown
- Total docker downloads: 469,718
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
proxy.golang.org: github.com/awslabs/aws-virtual-gpu-device-plugin
- Homepage: https://github.com/awslabs/aws-virtual-gpu-device-plugin
- Documentation: https://pkg.go.dev/github.com/awslabs/aws-virtual-gpu-device-plugin#section-documentation
- License: Apache-2.0
- Latest release: v0.1.1 (published almost 6 years ago)
Rankings
Dependencies
- github.com/NVIDIA/gpu-monitoring-tools v0.0.0-20191011002627-7a750c7e4f8b
- github.com/fsnotify/fsnotify v1.4.7
- github.com/gogo/protobuf v1.3.0
- github.com/golang/protobuf v1.3.2
- golang.org/x/net v0.0.0-20190812203447-cdfb69ac37fc
- google.golang.org/genproto v0.0.0-20190926190326-7ee9db18f195
- google.golang.org/grpc v1.24.0
- k8s.io/api v0.0.0
- k8s.io/api=>k8s.io/api v0.0.0-20190819141258-3544db3b9e44
- k8s.io/apiextensions-apiserver=>k8s.io/apiextensions-apiserver v0.0.0-20190819143637-0dbe462fe92d
- k8s.io/apimachinery v0.0.0
- k8s.io/apimachinery=>k8s.io/apimachinery v0.0.0-20190817020851-f2f3a405f61d
- k8s.io/apiserver=>k8s.io/apiserver v0.0.0-20190819142446-92cc630367d0
- k8s.io/cli-runtime=>k8s.io/cli-runtime v0.0.0-20190819144027-541433d7ce35
- k8s.io/client-go v0.0.0
- k8s.io/client-go=>k8s.io/client-go v0.0.0-20190819141724-e14f31a72a77
- k8s.io/cloud-provider=>k8s.io/cloud-provider v0.0.0-20190819145148-d91c85d212d5
- k8s.io/cluster-bootstrap=>k8s.io/cluster-bootstrap v0.0.0-20190819145008-029dd04813af
- k8s.io/code-generator=>k8s.io/code-generator v0.0.0-20190612205613-18da4a14b22b
- k8s.io/component-base=>k8s.io/component-base v0.0.0-20190819141909-f0f7c184477d
- k8s.io/cri-api=>k8s.io/cri-api v0.0.0-20190817025403-3ae76f584e79
- k8s.io/csi-translation-lib=>k8s.io/csi-translation-lib v0.0.0-20190819145328-4831a4ced492
- k8s.io/kube-aggregator=>k8s.io/kube-aggregator v0.0.0-20190819142756-13daafd3604f
- k8s.io/kube-controller-manager=>k8s.io/kube-controller-manager v0.0.0-20190819144832-f53437941eef
- k8s.io/kube-proxy=>k8s.io/kube-proxy v0.0.0-20190819144346-2e47de1df0f0
- k8s.io/kube-scheduler=>k8s.io/kube-scheduler v0.0.0-20190819144657-d1a724e0828e
- k8s.io/kubectl=>k8s.io/kubectl v0.0.0-20190602132728-7075c07e78bf
- k8s.io/kubelet=>k8s.io/kubelet v0.0.0-20190819144524-827174bad5e8
- k8s.io/kubernetes v1.16.0
- k8s.io/legacy-cloud-providers=>k8s.io/legacy-cloud-providers v0.0.0-20190819145509-592c9a46fd00
- k8s.io/metrics=>k8s.io/metrics v0.0.0-20190819143841-305e1cef1ab1
- k8s.io/node-api=>k8s.io/node-api v0.0.0-20190819145652-b61681edbd0a
- k8s.io/sample-apiserver=>k8s.io/sample-apiserver v0.0.0-20190819143045-c84c31c165c4
- k8s.io/sample-cli-plugin=>k8s.io/sample-cli-plugin v0.0.0-20190819144209-f9ca4b649af0
- k8s.io/sample-controller=>k8s.io/sample-controller v0.0.0-20190819143301-7c475f5e1313
- k8s.io/utils=>k8s.io/utils v0.0.0-20190221042446-c2654d5206da
- 552 dependencies
- amazonlinux latest build
- golang 1.13 build
- tensorflow/serving 1.15.0-gpu build