https://github.com/amd/amd_smi_exporter
The AMD SMI Exporter exports AMD EPYC CPU & Datacenter GPU metrics to the Prometheus server.
Statistics
- Stars: 55
- Watchers: 7
- Forks: 10
- Open Issues: 9
- Releases: 0
Metadata Files
README.md
AMD SMI Prometheus Exporter
The AMD SMI Exporter is a standalone app, written in Go, that can be run as a daemon and that exports AMD CPU & GPU metrics to the Prometheus server. The AMD SMI Prometheus Exporter employs:
* the AMDSMI Library for its data acquisition, and
* a GO binding that provides an interface between amdsmi and the GO exporter code.
Note:
This AMD SMI Exporter repository will no longer receive further updates. Moving forward, the AMD Device Metrics Exporter repository will serve as the official replacement. Please transition to AMD Device Metrics Exporter for continued improvements, updates, and support.
Important note about Versioning and Backward Compatibility
The AMD SMI Exporter follows the AMDSMI library in its releases, as it is dependent on the underlying libraries for its data. The Exporter is currently under development, and therefore subject to change in the features it offers and at the interface with the GO binding.
While every effort will be made to ensure stable backward compatibility in software releases with a major version greater than 0, any code/interface may be subject to revision/change while the major version remains 0.
Building the exporter
The standalone GO Exporter may be built from the src directory as follows:
Downloading the source
The source code for the GO Exporter is available in the AMD SMI Exporter repository: https://github.com/amd/amd_smi_exporter
Directory structure of the source
Once the exporter source has been cloned to a local Linux machine, the directory structure of the source is as below:
* src/ Contains exporter source for package main
* src/collect/ Contains the implementation of the Scan function of the collector.
* grafana/ Contains the JSON files for Grafana dashboard.
Change the directory to amd_smi_exporter/src
$ cd amd_smi_exporter/src
Execute "make clean" to clean pre-existing binaries and GO module files
amd_smi_exporter/src$ make clean
Execute "make" to perform a "go get" of dependent modules such as
- github.com/prometheus/client_golang
- github.com/prometheus/client_golang/prometheus
- github.com/prometheus/client_golang/prometheus/promhttp
- github.com/ROCm/amdsmi
amd_smi_exporter/src$ make
The aforementioned steps will create the "amd_smi_exporter" GO binary file. To install the binary in /usr/local/bin and the service file in the /etc/systemd/system directory, one may execute:
```$ sudo make install```
Building the container for the GO Exporter
Once the GO Exporter is built, one may proceed to create a containerized micro service of the go executable by executing the following commands:
Prerequisite: docker version 20.10.12 or later must be installed on the build server for the container build to succeed.
Execute "make container_clean" to clean pre-existing images and configuration of the container image.
amd_smi_exporter/src$ make container_clean
Build the fresh container image with the following command:
amd_smi_exporter/src$ make container
This command builds the container image, which is then listed when the user issues the sudo docker images command.
A tarball of the container image, "k8/amd_smi_exporter_container.tar", is also saved in the "k8" directory. It may be used to deploy the container manually on the respective nodes of the Kubernetes cluster using the "k8/daemonset.yaml" file.
Grafana Dashboard:
JSON files for the Grafana dashboard are available under grafana/ of this repo:
* AMDSmiExporterCPUGrafanaDashboard.json
* AMDSmiExporterGPUGrafanaDashboard.json
Dependencies
Please ensure the following are in place:
1. amdsmi library with goamdsmi_shim bindings installed under "/opt/rocm"
2. GO v1.20
3. Docker (tested on v20.10.12 or later)
GO Installation:
On AMD ROCm Docker images, installing Go through apt install on Linux is supported only up to version 1.18. A newer version can be installed manually from https://go.dev/dl/. Below is an example of installing Go 1.20.12.
$ wget -L "https://golang.org/dl/go1.20.12.linux-amd64.tar.gz"
$ tar -xf "go1.20.12.linux-amd64.tar.gz"
$ cd go/
$ ls -l
$ cd ..
$ sudo chown -R root:root ./go
$ sudo mv -v go /usr/local
$ export GOPATH=$HOME/go
$ export PATH=$PATH:/usr/local/go/bin:$GOPATH/bin
Add the amdsmi library path to the LD_LIBRARY_PATH environment variable and export it.
$ export LD_LIBRARY_PATH=<path_to_amdsmi_library>
Running the GO Exporter
NOTE: Only one instance of the GO Exporter may be run on the server, either as a standalone service, or as a containerized micro service (started with "docker run" or as a daemonSet of a kubernetes deployment).
Prerequisite: To ensure that the AMD custom parameters defined in the amd-smi-custom-rules.yml file are found in PromQL queries, add the following rule_files and scrape_configs to the /etc/prometheus/prometheus.yml file:
rule_files:
  - "amd-smi-custom-rules.yml"
scrape_configs:
  - job_name: "prometheus"
  - job_name: "amd-smi-exporter"
    static_configs:
      - targets: ["localhost:2021"]
Custom rules
The Prometheus query language allows users to customize their queries based on their requirements. The customizations may be added to the /usr/local/bin/prometheus/amd-smi-custom-rules.yml file. Here are a few sample queries that may be built over the aforementioned objects:
amd_core_energy{thread="101"}/1000000
Displays the core energy reported for thread 101, divided by 1,000,000 (that is, shifted by six decimal places).
amd_socket_power/100 > 650.00
Rule to check whether the scaled socket power consumption has gone over 650.00
amd_prochot_status != 0
Alert to check whether the PROC_HOT status has been triggered
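The arithmetic behind the three sample queries can be sketched in Go. This is only an illustration of what each expression computes: the function names are hypothetical, the input values are made up, and Prometheus itself evaluates these expressions; the exporter does not contain this code.

```go
package main

import "fmt"

// scaledCoreEnergy mirrors the expression amd_core_energy/1000000.
func scaledCoreEnergy(raw float64) float64 {
	return raw / 1000000
}

// socketPowerExceeded mirrors the expression amd_socket_power/100 > 650.00.
func socketPowerExceeded(rawSocketPower float64) bool {
	return rawSocketPower/100 > 650.00
}

// procHotTriggered mirrors the expression amd_prochot_status != 0.
func procHotTriggered(status float64) bool {
	return status != 0
}

func main() {
	// Illustrative raw values only; real values come from the exporter.
	fmt.Println("scaled core energy:", scaledCoreEnergy(2500000))
	fmt.Println("socket power rule fires:", socketPowerExceeded(70000))
	fmt.Println("PROC_HOT alert fires:", procHotTriggered(0))
}
```

Note that the comparison is strict, so a scaled socket power of exactly 650.00 does not fire the rule.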
Executing the Go Exporter
The GO exporter may be run manually in the following ways
1. Executing the "amd_smi_exporter" GO binary:
```amd_smi_exporter/src$ ./amd_smi_exporter```
2. As a systemd daemon:
```$ sudo systemctl daemon-reload```
```$ sudo service prometheus restart```
```$ sudo service amd-smi-exporter start```
3. As a containerized micro service that may be started manually or as a kubernetes daemonSet:
This assumes that the user has a running Docker daemon and a Kubernetes cluster.
On a server node that is not a part of a kubernetes cluster, one may execute the following command:
```$ sudo docker run -d --name amd-exporter --device=/dev/cpu --device=/dev/kfd \
--device=/dev/dri --privileged -p 2021:2021 amd_smi_exporter_container:0.1```
Alternatively, the Docker image tarball of the container may be copied to each Kubernetes cluster node and loaded on the worker node. The daemonSet may then be applied from the master node as follows:
On the worker node, copy the amd_smi_exporter_container.tar image file and execute:
```$ sudo docker load -i amd_smi_exporter_container.tar```
On the master node, copy the daemonset.yaml file and execute:
```$ kubectl apply -f daemonset.yaml```
This will deploy a single running instance of the AMD SMI Exporter container micro service on each of the worker nodes of the Kubernetes cluster. The daemonset.yaml file may be edited to apply taints for nodes where the exporter is not expected to run in the cluster.
Supported hardware
AMD EPYC™ line of server CPU Families:
- AMD CPU Family 19h Models 0h-Fh (Milan), 10h-1Fh (Genoa), A0h-AFh.
- AMD CPU Family 1Ah Models 0h-Fh (Turin), 10h-1Fh.
- AMD APU Family 19h Models 90h-9Fh.
- AMD GPUs MI200 and MI300.
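The CPU family and model ranges listed above can be expressed as a small lookup table. The Go sketch below is only an illustration of how such a check might look: the type and function names are hypothetical, and nothing here is taken from the exporter or the AMDSMI library. The ranges themselves are copied from the list above.

```go
package main

import "fmt"

// modelRange describes one supported span of CPU models within a family.
type modelRange struct {
	family uint8
	lo, hi uint8
	name   string
}

// supportedCPUs transcribes the supported-hardware list above.
var supportedCPUs = []modelRange{
	{0x19, 0x00, 0x0F, "Milan"},
	{0x19, 0x10, 0x1F, "Genoa"},
	{0x19, 0xA0, 0xAF, "Family 19h Models A0h-AFh"},
	{0x19, 0x90, 0x9F, "APU Family 19h Models 90h-9Fh"},
	{0x1A, 0x00, 0x0F, "Turin"},
	{0x1A, 0x10, 0x1F, "Family 1Ah Models 10h-1Fh"},
}

// lookupCPU reports whether the family/model pair falls in a supported range.
func lookupCPU(family, model uint8) (string, bool) {
	for _, r := range supportedCPUs {
		if family == r.family && model >= r.lo && model <= r.hi {
			return r.name, true
		}
	}
	return "", false
}

func main() {
	name, ok := lookupCPU(0x19, 0x11)
	fmt.Println(name, ok)
}
```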
Examples
CPU core metrics
1. amd_core_energy
Description: Displays the per-core energy consumption of the processor so far. This object may be queried at the core level or the thread level. The values reported by the threads in a hyperthreaded core will be the same. This object query will report the energy counter values for all threads. To query a single thread (say, thread number 101), the user may use the following query:
amd_core_energy{thread="101"}
Type: Counter
Property: Read-only
2. amd_boost_limit
Description: Displays the per-core boost limit that the core is operating at.
Type: Gauge
Property: Read-only
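To illustrate what the selector amd_core_energy{thread="101"} does, the Go sketch below applies the same label filter to text in the Prometheus exposition format. It is a simplified, hypothetical parser written for this example: it assumes no timestamps and no commas, braces, or escaped quotes inside label values, and the sample values are invented.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Sample is one parsed line of exposition-format text.
type Sample struct {
	Name   string
	Labels map[string]string
	Value  float64
}

// parseSample parses a line such as: amd_core_energy{thread="101"} 2500
func parseSample(line string) (Sample, error) {
	line = strings.TrimSpace(line)
	sp := strings.LastIndex(line, " ")
	if sp < 0 {
		return Sample{}, fmt.Errorf("no value in %q", line)
	}
	v, err := strconv.ParseFloat(line[sp+1:], 64)
	if err != nil {
		return Sample{}, err
	}
	head := strings.TrimSpace(line[:sp])
	s := Sample{Labels: map[string]string{}, Value: v}
	open := strings.Index(head, "{")
	if open < 0 {
		s.Name = head // metric without labels
		return s, nil
	}
	if !strings.HasSuffix(head, "}") {
		return Sample{}, fmt.Errorf("unterminated label set in %q", line)
	}
	s.Name = head[:open]
	for _, pair := range strings.Split(head[open+1:len(head)-1], ",") {
		if strings.TrimSpace(pair) == "" {
			continue
		}
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return Sample{}, fmt.Errorf("bad label %q", pair)
		}
		s.Labels[strings.TrimSpace(kv[0])] = strings.Trim(strings.TrimSpace(kv[1]), `"`)
	}
	return s, nil
}

// selectByLabel keeps samples of the given metric whose label equals want.
func selectByLabel(text, name, key, want string) []Sample {
	var out []Sample
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		s, err := parseSample(line)
		if err != nil || s.Name != name {
			continue
		}
		if s.Labels[key] == want {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Invented sample text; real output comes from the exporter.
	text := "# TYPE amd_core_energy counter\n" +
		"amd_core_energy{thread=\"100\"} 1500\n" +
		"amd_core_energy{thread=\"101\"} 2500\n" +
		"amd_num_sockets 2\n"
	for _, s := range selectByLabel(text, "amd_core_energy", "thread", "101") {
		fmt.Printf("%s thread=%s value=%g\n", s.Name, s.Labels["thread"], s.Value)
	}
}
```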
CPU Socket metrics
3. amd_socket_energy
Description: Displays the per-socket cumulative energy consumed by all cores so far. This value excludes the energy consumed by the AID (Active Interposer Die). To query a single socket (say, socket 2), the user may use the following query:
amd_socket_energy{socket="2"}
Type: Counter
Property: Read-only
4. amd_socket_power
Description: Displays the per-socket power consumed. This is a real-time gauge value that is queried at the time interval set by the scrape interval.
Type: Gauge
Property: Read-only
5. amd_power_limit
Description: Displays the power limit at which the processor is operating.
Type: Gauge
Property: Read-only
6. amd_prochot_status
Description: Displays a binary value of "0" or "1", where "1" implies that the PROC_HOT status of the processor has been triggered.
Type: Gauge
Property: Read-only
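Because the energy metrics are cumulative counters while amd_socket_power is an instantaneous gauge, an average power over an interval can be derived from two counter readings as (E2 - E1) / Δt. The Go sketch below shows that arithmetic under stated assumptions: the function name is hypothetical, and the unit of the counter is passed in as a parameter because this README does not state it (the value 1e6 used in main assumes microjoules, which is only a guess suggested by the /1000000 sample query).

```go
package main

import (
	"errors"
	"fmt"
)

// errCounterReset signals that the counter went backwards between samples.
var errCounterReset = errors.New("counter decreased between samples (reset or wrap)")

// averagePowerWatts converts two readings of an energy counter taken
// `seconds` apart into an average power in watts.
// unitsPerJoule is the number of counter units in one joule (assumed, not documented).
func averagePowerWatts(prev, curr, unitsPerJoule, seconds float64) (float64, error) {
	if seconds <= 0 || unitsPerJoule <= 0 {
		return 0, errors.New("seconds and unitsPerJoule must be positive")
	}
	if curr < prev {
		return 0, errCounterReset
	}
	joules := (curr - prev) / unitsPerJoule
	return joules / seconds, nil
}

func main() {
	// Invented readings 15 seconds apart, assuming microjoule units.
	w, err := averagePowerWatts(1_000_000, 31_000_000, 1e6, 15)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("average power: %.2f W\n", w)
}
```

In a Prometheus deployment the same calculation is normally left to the rate() function, which also handles counter resets.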
System
7. amd_num_sockets
Description: Displays the number of processor sockets in the system.
Type: Gauge
Property: Read-only
8. amd_num_threads
Description: Displays the total number of threads (logical CPUs) in the system.
Type: Gauge
Property: Read-only
9. amd_num_threads_per_core
Description: Displays the number of threads (logical CPUs) per core.
Type: Gauge
Property: Read-only
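The three system metrics can be combined to derive the number of physical cores. The Go sketch below is a minimal illustration with a hypothetical Topology type and invented values. It assumes that threads are evenly distributed across cores and cores across sockets.

```go
package main

import (
	"errors"
	"fmt"
)

// Topology holds values as they might be read from amd_num_sockets,
// amd_num_threads and amd_num_threads_per_core.
type Topology struct {
	Sockets        int
	Threads        int
	ThreadsPerCore int
}

// Cores returns the total number of physical cores.
func (t Topology) Cores() (int, error) {
	if t.ThreadsPerCore <= 0 || t.Threads%t.ThreadsPerCore != 0 {
		return 0, errors.New("threads not divisible by threads per core")
	}
	return t.Threads / t.ThreadsPerCore, nil
}

// CoresPerSocket returns the number of physical cores in each socket.
func (t Topology) CoresPerSocket() (int, error) {
	cores, err := t.Cores()
	if err != nil {
		return 0, err
	}
	if t.Sockets <= 0 || cores%t.Sockets != 0 {
		return 0, errors.New("cores not divisible by sockets")
	}
	return cores / t.Sockets, nil
}

func main() {
	// Invented example values.
	t := Topology{Sockets: 2, Threads: 256, ThreadsPerCore: 2}
	cores, _ := t.Cores()
	perSocket, _ := t.CoresPerSocket()
	fmt.Println("cores:", cores, "cores per socket:", perSocket)
}
```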
GPU Metrics
10. amd_num_gpus
Description: Displays the number of GPUs
Type: Gauge
Property: Read-only
11. amd_gpu_dev_id
Description: Displays the device ID of the GPU
Type: Gauge
Property: Read-only
12. amd_gpu_power_cap
Description: Displays the GPU power cap
Type: Gauge
Property: Read-only
13. amd_gpu_power_avg
Description: Displays the average power consumed by the GPU
Type: Counter
Property: Read-only
14. amd_gpu_current_temperature
Description: Displays the current temperature of the GPU
Type: Gauge
Property: Read-only
15. amd_gpu_SCLK
Description: Displays the GPU SCLK frequency
Type: Gauge
Property: Read-only
16. amd_gpu_MCLK
Description: Displays the GPU MCLK frequency
Type: Gauge
Property: Read-only
17. amd_gpu_Usage
Description: Displays the GPU use percent
Type: Gauge
Property: Read-only
18. amd_gpu_memory_busy percent
Description: Displays the GPU memory busy percent
Type: Gauge
Property: Read-only
FAQs:
If the prometheus service fails to start properly, run the command
journalctl -u prometheus -f --no-pager
If the issue is related to "Web listener busy" or "Port is already in use", please change the port from 9090 to 9091 in the following files
- /etc/systemd/system/prometheus.service file
  - under line "--web.listen-address=0.0.0.0:9090"
- /etc/prometheus/prometheus.yml file
  - under line "targets: ["localhost:9090"]"
and restart the systemd service using the command "service prometheus restart".
Owner
- Name: AMD
- Login: amd
- Kind: organization
- Email: dl.DevSecOps-Github-Admin@amd.com
- Website: http://www.amd.com
- Repositories: 56
- Profile: https://github.com/amd
GitHub Events
Total
- Issues event: 3
- Watch event: 16
- Issue comment event: 24
- Push event: 4
- Pull request event: 4
- Fork event: 3
Last Year
- Issues event: 3
- Watch event: 16
- Issue comment event: 24
- Push event: 4
- Pull request event: 4
- Fork event: 3
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| muthusamy | m****m@a****m | 16 |
| krishc | K****C@a****m | 10 |
| Muralidhara M K | m****k@a****m | 4 |
| Naveen Krishna Chatradhi | n****d@a****m | 3 |
| Jorge Parada | j****a@a****m | 3 |
| Vicky Tsang | v****g@a****m | 2 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 11
- Total pull requests: 9
- Average time to close issues: 4 months
- Average time to close pull requests: 16 days
- Total issue authors: 9
- Total pull request authors: 5
- Average comments per issue: 2.73
- Average comments per pull request: 0.22
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 3
- Average time to close issues: about 10 hours
- Average time to close pull requests: 3 days
- Issue authors: 6
- Pull request authors: 3
- Average comments per issue: 3.25
- Average comments per pull request: 0.33
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- yx-lamini (3)
- jaeyung2 (1)
- liangxm-0323 (1)
- themoneyevo (1)
- hvp4 (1)
- Rohith-Scalers (1)
- krishh85 (1)
- GowriShankarEAAS (1)
- lddlww (1)
Pull Request Authors
- japarada (3)
- muralimk-amd (3)
- vickytsang (3)
- MuthusamyRamalingam (2)
- muthusAMD (2)
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
proxy.golang.org: github.com/amd/amd_smi_exporter
- Homepage: https://github.com/amd/amd_smi_exporter
- Documentation: https://pkg.go.dev/github.com/amd/amd_smi_exporter#section-documentation
- Latest release: v2.0.0+incompatible (published over 1 year ago)
Dependencies
- ubuntu 20.04 build