https://github.com/beuth-erdelt/prometheus_nvlink_exporter

This script collects information about NVLink and PCI bus traffic of NVIDIA GPUs. Results are published as Prometheus metrics over HTTP.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.9%) to scientific vocabulary

Keywords

gpu nvidia-cuda nvidia-docker nvidia-gpu nvlink prometheus prometheus-exporter python
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Beuth-Erdelt
  • License: mit
  • Language: Python
  • Default Branch: master
  • Size: 10.7 KB
Statistics
  • Stars: 6
  • Watchers: 3
  • Forks: 0
  • Open Issues: 4
  • Releases: 0
Created over 6 years ago · Last pushed over 6 years ago
Metadata Files
Readme License

README.md

prometheus_nvlink_exporter

This script collects information about NVLink and PCI bus traffic of NVIDIA GPUs. Results are published as Prometheus metrics over HTTP.

Usage

We also provide a Dockerfile. It is based on NVIDIA's CUDA container, adds a Python installation, and runs the exporter script.

The basic usage is docker run -d ...

The metrics can be scraped from port 8001.
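
Once the exporter is running, the endpoint can be checked from Python. The following is a minimal standard-library sketch, not part of the repository: fetch_metrics and parse_samples are hypothetical helper names, and the host (localhost) and path (/metrics) are assumptions; only port 8001 comes from this README. The parser handles simple sample lines like those under Publishing Metrics and ignores the # HELP and # TYPE comments; it is not a full implementation of the exposition format (lines with timestamps or escaped quotes in label values are not handled).

```python
import re
import urllib.request

# One sample line: metric name, optional {labels}, then a single value token.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)
# Simplified label matcher: does not handle escaped quotes inside values.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def fetch_metrics(host: str = "localhost", port: int = 8001, path: str = "/metrics") -> str:
    """Download the exposition text (host and path are assumptions)."""
    with urllib.request.urlopen(f"http://{host}:{port}{path}", timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text: str) -> dict:
    """Map (metric name, sorted label pairs) to a float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        m = SAMPLE_RE.match(line)
        if m is None:
            continue  # e.g. a line carrying a timestamp
        labels = tuple(sorted(LABEL_RE.findall(m.group("labels") or "")))
        samples[(m.group("name"), labels)] = float(m.group("value"))
    return samples
```

For example, parse_samples(fetch_metrics()) returns a dict keyed by (metric name, label pairs), which makes it easy to pick out a single GPU or link.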

The Docker image is compatible with Kubernetes environments.

Prerequisites

The Docker image requires Docker, NVLink-capable NVIDIA GPUs, and the basic NVIDIA drivers to be installed on the host.

The script expects the NVLink counters of the GPUs to be configured via the commands nvidia-smi nvlink -sc 0bz and nvidia-smi nvlink -sc 1pz. It uses nvidia-smi and some Python libraries, in particular https://github.com/prometheus/client_python.
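
The counter configuration can be scripted, for example in a container entrypoint. This is a minimal sketch assuming Python 3 and nvidia-smi on the PATH; counter_setup_commands and configure_counters are hypothetical names. As we read nvidia-smi's help text, each argument selects the counter (0 or 1), the unit (b = bytes, p = packets) and a traffic filter (z = all traffic). dry_run defaults to True, so the commands are only returned, not executed.

```python
import subprocess

# Counter settings quoted in this README: counter 0 -> "0bz", counter 1 -> "1pz".
COUNTER_SETTINGS = ("0bz", "1pz")


def counter_setup_commands(settings=COUNTER_SETTINGS) -> list:
    """Build the nvidia-smi invocations that configure the NVLink counters."""
    return [["nvidia-smi", "nvlink", "-sc", s] for s in settings]


def configure_counters(dry_run: bool = True) -> list:
    """Run the counter setup commands, or only return them when dry_run is True."""
    commands = counter_setup_commands()
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if nvidia-smi rejects a setting.
            subprocess.run(cmd, check=True)
    return commands
```

Calling configure_counters(dry_run=False) on the GPU host would execute both commands in order.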

Working examples

The script runs nvidia-smi commands and transforms their output into a format that Prometheus can scrape.
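
The command-running step can be sketched with subprocess. This assumes Python 3.7+ (for capture_output); run_smi is a hypothetical helper, not the repository's code, and the 30-second timeout is an arbitrary choice.

```python
import subprocess


def run_smi(args, binary: str = "nvidia-smi", timeout: float = 30.0) -> str:
    """Run a command-line tool and return its standard output as text.

    Raises CalledProcessError on a non-zero exit code and TimeoutExpired if
    the tool hangs, so a caller can skip one collection cycle instead of
    publishing partial data.
    """
    completed = subprocess.run(
        [binary, *args],
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return completed.stdout
```

For example, run_smi(["nvlink", "-g", "0"]) and run_smi(["dmon", "-s", "t", "-c", "1"]) would return the raw text of the two commands used for collection.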

Collecting NVLink Information

This automatically runs nvidia-smi nvlink -g 0:

GPU 0: Tesla V100-SXM2-16GB (UUID: GPU-8dfc570f-9ee4-bdf1-abcd-192837465abc)
    Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 1: Rx0: 100 KBytes, Tx0: 0 KBytes
    Link 2: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 3: Rx0: 0 KBytes, Tx0: 0 KBytes
GPU 1: Tesla V100-SXM2-16GB (UUID: GPU-29123255-8aab-d30e-abcd-192837465abc)
    Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 2: Rx0: 50 KBytes, Tx0: 0 KBytes
    Link 3: Rx0: 0 KBytes, Tx0: 0 KBytes
GPU 2: Tesla V100-SXM2-16GB (UUID: GPU-7db3a1e6-6150-9c24-abcd-192837465abc)
    Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 2: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 3: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 4: Rx0: 0 KBytes, Tx0: 0 KBytes
GPU 3: Tesla V100-SXM2-16GB (UUID: GPU-22ea33c7-5a76-9747-abcd-192837465abc)
    Link 0: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 1: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 2: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 3: Rx0: 0 KBytes, Tx0: 0 KBytes
    Link 4: Rx0: 0 KBytes, Tx0: 0 KBytes
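
A minimal parsing sketch for this output, assuming the exact "Link N: Rx0: X KBytes, Tx0: Y KBytes" layout shown here (other driver versions may format it differently). parse_nvlink_counters is a hypothetical name; it scans GPU headers and link entries in order of appearance, so it works whether or not the line breaks are preserved. IDs are kept as strings because they end up as label values.

```python
import re

# Either a GPU header ("GPU 0:") or a link entry with its counter-0 values.
# "GPU-8dfc..." inside the UUID does not match, because a space is required.
ENTRY_RE = re.compile(
    r"GPU (\d+):|Link (\d+): Rx0: (\d+) KBytes, Tx0: (\d+) KBytes"
)


def parse_nvlink_counters(text: str) -> dict:
    """Return {(gpu_id, link_id): (rx0_kbytes, tx0_kbytes)}."""
    counters = {}
    gpu = None
    for m in ENTRY_RE.finditer(text):
        if m.group(1) is not None:
            gpu = m.group(1)  # following link entries belong to this GPU
        elif gpu is not None:
            counters[(gpu, m.group(2))] = (int(m.group(3)), int(m.group(4)))
    return counters
```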

Collecting PCI Information

This automatically runs nvidia-smi dmon -s t -c 1:

```
# gpu  rxpci  txpci
# Idx   MB/s   MB/s
    1      0      0
    2      0      0
```
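
The dmon table can be parsed by skipping the header rows and splitting the remaining rows on whitespace. This sketch assumes every data row starts with the numeric GPU index followed by rxpci and txpci in MB/s; parse_dmon is a hypothetical name, and treating "-" as a placeholder for an unavailable value is an assumption.

```python
def _to_float(token: str):
    """Convert a dmon cell to float; '-' (assumed n/a placeholder) becomes None."""
    return None if token == "-" else float(token)


def parse_dmon(text: str) -> dict:
    """Return {gpu_index: (rxpci_mb_s, txpci_mb_s)} from 'nvidia-smi dmon -s t' output."""
    rates = {}
    for line in text.splitlines():
        fields = line.split()
        # Header rows (gpu/Idx) and blank or short lines do not start with a
        # numeric GPU index, so skip them.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        gpu, rx, tx = fields[:3]
        rates[gpu] = (_to_float(rx), _to_float(tx))
    return rates
```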

Publishing Metrics

Output is similar to:

```
# HELP gpu_nvlink_tx_kbytes Transmitted KBytes via NVLink
# TYPE gpu_nvlink_tx_kbytes gauge
gpu_nvlink_tx_kbytes{GPUID="0",LinkID="2"} 27598895329.0
gpu_nvlink_tx_kbytes{GPUID="0",LinkID="1"} 31602715771.0
gpu_nvlink_tx_kbytes{GPUID="4",LinkID="2"} 0.0
gpu_nvlink_tx_kbytes{GPUID="7",LinkID="0"} 0.0
gpu_nvlink_tx_kbytes{GPUID="4",LinkID="3"} 0.0
gpu_nvlink_tx_kbytes{GPUID="5",LinkID="1"} 0.0
gpu_nvlink_tx_kbytes{GPUID="0",LinkID="3"} 31602715771.0
gpu_nvlink_tx_kbytes{GPUID="5",LinkID="0"} 0.0
gpu_nvlink_tx_kbytes{GPUID="7",LinkID="2"} 0.0
gpu_nvlink_tx_kbytes{GPUID="2",LinkID="3"} 1019788145.0
gpu_nvlink_tx_kbytes{GPUID="7",LinkID="1"} 0.0
gpu_nvlink_tx_kbytes{GPUID="3",LinkID="2"} 1017047660.0
gpu_nvlink_tx_kbytes{GPUID="2",LinkID="0"} 1014424036.0
gpu_nvlink_tx_kbytes{GPUID="2",LinkID="1"} 1017028693.0
gpu_nvlink_tx_kbytes{GPUID="1",LinkID="2"} 1017047660.0
gpu_nvlink_tx_kbytes{GPUID="6",LinkID="2"} 49.0
gpu_nvlink_tx_kbytes{GPUID="5",LinkID="3"} 2986639.0
gpu_nvlink_tx_kbytes{GPUID="0",LinkID="0"} 0.0
gpu_nvlink_tx_kbytes{GPUID="3",LinkID="3"} 1017028657.0
gpu_nvlink_tx_kbytes{GPUID="6",LinkID="1"} 0.0
gpu_nvlink_tx_kbytes{GPUID="5",LinkID="2"} 0.0
gpu_nvlink_tx_kbytes{GPUID="6",LinkID="0"} 2555441.0
gpu_nvlink_tx_kbytes{GPUID="3",LinkID="0"} 1014357462.0
gpu_nvlink_tx_kbytes{GPUID="6",LinkID="3"} 0.0
gpu_nvlink_tx_kbytes{GPUID="1",LinkID="3"} 0.0
gpu_nvlink_tx_kbytes{GPUID="3",LinkID="1"} 0.0
gpu_nvlink_tx_kbytes{GPUID="1",LinkID="0"} 1014341346.0
gpu_nvlink_tx_kbytes{GPUID="1",LinkID="1"} 5022027981.0
gpu_nvlink_tx_kbytes{GPUID="4",LinkID="0"} 0.0
gpu_nvlink_tx_kbytes{GPUID="4",LinkID="1"} 0.0
gpu_nvlink_tx_kbytes{GPUID="2",LinkID="2"} 4007720847.0
gpu_nvlink_tx_kbytes{GPUID="7",LinkID="3"} 0.0
# HELP gpu_pci_rx_mb_per_s Received MBytes per second via PCI
# TYPE gpu_pci_rx_mb_per_s gauge
gpu_pci_rx_mb_per_s{GPUID="2"} 0.0
gpu_pci_rx_mb_per_s{GPUID="5"} 0.0
gpu_pci_rx_mb_per_s{GPUID="7"} 0.0
gpu_pci_rx_mb_per_s{GPUID="3"} 0.0
gpu_pci_rx_mb_per_s{GPUID="6"} 0.0
gpu_pci_rx_mb_per_s{GPUID="4"} 0.0
gpu_pci_rx_mb_per_s{GPUID="0"} 0.0
gpu_pci_rx_mb_per_s{GPUID="1"} 0.0
```
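
The repository publishes these metrics with client_python, where a labelled gauge is declared as Gauge(name, documentation, labelnames). To show what the resulting text format looks like independently of that library, here is a standard-library sketch that renders one gauge family. render_gauge is a hypothetical name, and the underscore-separated metric names used with it (gpu_nvlink_tx_kbytes, gpu_pci_rx_mb_per_s) are assumed from the HELP lines.

```python
def _escape_label_value(value) -> str:
    """Escape backslash, double quote and newline as the text format requires."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str, samples: dict) -> str:
    """Render one gauge family in the Prometheus text exposition format.

    samples maps a tuple of (label_name, label_value) pairs to a number.
    """
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_escaped}", f"# TYPE {name} gauge"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{key}="{_escape_label_value(val)}"' for key, val in labels)
        # Omit the braces entirely for a sample without labels.
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {float(value)}")
    return "\n".join(lines) + "\n"
```

Sorting the samples makes the output deterministic; Prometheus itself does not require any particular order of series within a family.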

Owner

  • Name: Berliner Hochschule für Technik (BHT)
  • Login: Beuth-Erdelt
  • Kind: organization
  • Email: patrick.erdelt@bht-berlin.de
  • Location: Germany

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 4
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 days
  • Total issue authors: 1
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • perdelt (4)
Pull Request Authors
  • perdelt (1)