opencl-benchmark

A small OpenCL benchmark program to measure peak GPU/CPU performance.

https://github.com/projectphysx/opencl-benchmark

Keywords

bandwidth benchmark benchmarking flops gpgpu gpu gpu-computing high-performance-computing hpc opencl tool tools

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: ProjectPhysX
License: other
Language: C++
Default Branch: master
Homepage:
Size: 248 KB

Statistics

Stars: 230
Watchers: 7
Forks: 28
Open Issues: 10
Releases: 11

Created almost 3 years ago · Last pushed 8 months ago

Metadata Files

Readme License Citation

OpenCL-Benchmark

A small OpenCL benchmark program to measure peak GPU/CPU performance.

Works with any GPU on Windows, Linux, macOS and Android.

Measurements

- compute performance (FP64 (scalar), FP32 (scalar), FP16 (half2), INT64 (scalar), INT32 (scalar), INT16 (short2), INT8 (dp4a))
  - the ratio of measured compute performance to the reported theoretical FP32 performance is shown in (round brackets), rounded to the closest possible fraction/multiplier
  - for example, when the reported theoretical FP32 performance is 19.492 TFLOPs/s and the benchmark measures 9.512 TFLOPs/s for FP64, the ratio (measured FP64)/(theoretical FP32) = 9.512/19.492 = 1/2.05 is rounded to the nearest possible value, 1/2, and reported as such
  - for any GPU/CPU architecture, these ratios can only be 1/64, 1/32, 1/24, 1/16, 1/12, 1/8, 1/4, 1/3, 1/2, 2/3, 1x, 2x, 4x, 8x, 16x, 32x or 64x, and nothing in between
- memory bandwidth (coalesced/misaligned read/write)
- PCIe bandwidth (send/receive/bidirectional)
  - the PCIe generation is estimated from the measured PCIe bandwidth, assuming an x16 link width

How to use?

Windows

- Download and install Visual Studio Community. In the Visual Studio Installer, add:
  - Desktop development with C++
  - MSVC v142
  - Windows 10 SDK
- Open OpenCL-Benchmark.sln in Visual Studio Community.
- Compile and run by clicking the ► Local Windows Debugger button.
- To run outside of Visual Studio Community, open a Windows command prompt in the OpenCL-Benchmark folder (type cmd into the File Explorer address bar and press Enter), then run bin\OpenCL-Benchmark.exe.

Linux / macOS / Android

- Download, compile and run:
    git clone https://github.com/ProjectPhysX/OpenCL-Benchmark.git
    cd OpenCL-Benchmark
    chmod +x make.sh
    ./make.sh
- Run bin/OpenCL-Benchmark

Run only for a specified list of devices

Call bin\OpenCL-Benchmark.exe 0 2 5 (Windows) or bin/OpenCL-Benchmark 0 2 5 (Linux/macOS), where the numbers are the IDs of the devices to benchmark.

Examples

|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA H100 80GB HBM3                                      |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 565.57.01 (Linux)                                          |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 132 at 1980 MHz (16896 cores, 66.908 TFLOPs/s)             |
| Memory, Cache  | 81105 MB VRAM, 4224 KB global / 48 KB local                |
| Buffer Limits  | 20276 MB global, 64 KB constant                            |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                        31.184 TFLOPs/s (1/2 ) |
| FP32  compute                                        62.908 TFLOPs/s ( 1x ) |
| FP16  compute                                       123.749 TFLOPs/s ( 2x ) |
| INT64 compute                                          3.227 TIOPs/s (1/24) |
| INT32 compute                                         32.946 TIOPs/s (1/2 ) |
| INT16 compute                                         30.901 TIOPs/s (1/2 ) |
| INT8  compute                                        103.204 TIOPs/s ( 2x ) |
| Memory Bandwidth ( coalesced read )                            3025.53 GB/s |
| Memory Bandwidth ( coalesced write)                            3055.98 GB/s |
| Memory Bandwidth (misaligned read )                            2102.44 GB/s |
| Memory Bandwidth (misaligned write)                             314.25 GB/s |
| PCIe   Bandwidth (send            )                              10.53 GB/s |
| PCIe   Bandwidth (   receive      )                              11.47 GB/s |
| PCIe   Bandwidth (   bidirectional)                (Gen4 x16)    10.91 GB/s |
|-----------------------------------------------------------------------------|

|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | AMD Instinct MI300X                                        |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3635.0 (HSA1.1,LC) (Linux)                                 |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 304 at 2100 MHz (19456 cores, 81.715 TFLOPs/s)             |
| Memory, Cache  | 196592 MB VRAM, 32 KB global / 64 KB local                 |
| Buffer Limits  | 196592 MB global, 201310208 KB constant                    |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                        54.944 TFLOPs/s (2/3 ) |
| FP32  compute                                       130.000 TFLOPs/s ( 2x ) |
| FP16  compute                                       141.320 TFLOPs/s ( 2x ) |
| INT64 compute                                          3.666 TIOPs/s (1/24) |
| INT32 compute                                         47.736 TIOPs/s (2/3 ) |
| INT16 compute                                         69.022 TIOPs/s ( 1x ) |
| INT8  compute                                        106.178 TIOPs/s ( 1x ) |
| Memory Bandwidth ( coalesced read )                            3756.64 GB/s |
| Memory Bandwidth ( coalesced write)                            4686.31 GB/s |
| Memory Bandwidth (misaligned read )                            3881.24 GB/s |
| Memory Bandwidth (misaligned write)                            2491.25 GB/s |
| PCIe   Bandwidth (send            )                              54.57 GB/s |
| PCIe   Bandwidth (   receive      )                              55.79 GB/s |
| PCIe   Bandwidth (   bidirectional)                (Gen4 x16)    55.21 GB/s |
|-----------------------------------------------------------------------------|

|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) Arc(TM) B580 Graphics                             |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 32.0.101.6559 (Windows)                                    |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 160 at 2850 MHz (2560 cores, 14.592 TFLOPs/s)              |
| Memory, Cache  | 12187 MB VRAM, 18432 KB global / 128 KB local              |
| Buffer Limits  | 11944 MB global, 12230900 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                         0.896 TFLOPs/s (1/16) |
| FP32  compute                                        14.249 TFLOPs/s ( 1x ) |
| FP16  compute                                        26.547 TFLOPs/s ( 2x ) |
| INT64 compute                                          0.636 TIOPs/s (1/24) |
| INT32 compute                                          4.556 TIOPs/s (1/3 ) |
| INT16 compute                                         37.082 TIOPs/s ( 2x ) |
| INT8  compute                                         48.668 TIOPs/s ( 4x ) |
| Memory Bandwidth ( coalesced read )                             574.09 GB/s |
| Memory Bandwidth ( coalesced write)                             468.07 GB/s |
| Memory Bandwidth (misaligned read )                             796.23 GB/s |
| Memory Bandwidth (misaligned write)                             383.15 GB/s |
| PCIe   Bandwidth (send            )                               4.99 GB/s |
| PCIe   Bandwidth (   receive      )                               4.87 GB/s |
| PCIe   Bandwidth (   bidirectional)                (Gen3 x16)     5.11 GB/s |
|-----------------------------------------------------------------------------|

|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | AMD EPYC 9554 64-Core Processor                            |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 2024.18.10.0.08_160000 (Linux)                             |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 128 at 0 MHz (64 cores, 0.000 TFLOPs/s)                    |
| Memory, Cache  | 386363 MB RAM, 1024 KB global / 256 KB local               |
| Buffer Limits  | 386363 MB global, 128 KB constant                          |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                         3.739 TFLOPs/s (1/64) |
| FP32  compute                                         3.842 TFLOPs/s (1/64) |
| FP16  compute                                         0.863 TFLOPs/s (1/64) |
| INT64 compute                                          1.506 TIOPs/s (1/64) |
| INT32 compute                                          4.240 TIOPs/s (1/64) |
| INT16 compute                                          8.592 TIOPs/s (1/64) |
| INT8  compute                                          2.774 TIOPs/s (1/64) |
| Memory Bandwidth ( coalesced read )                             391.09 GB/s |
| Memory Bandwidth ( coalesced write)                             167.26 GB/s |
| Memory Bandwidth (misaligned read )                             248.65 GB/s |
| Memory Bandwidth (misaligned write)                             156.18 GB/s |
|-----------------------------------------------------------------------------|

|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | Intel(R) UHD Graphics 630                                  |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 31.0.101.2130 (Windows)                                    |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 24 at 1200 MHz (192 cores, 0.461 TFLOPs/s)                 |
| Memory, Cache  | 6500 MB RAM, 768 KB global / 64 KB local                   |
| Buffer Limits  | 3250 MB global, 3328048 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                         0.112 TFLOPs/s (1/4 ) |
| FP32  compute                                         0.437 TFLOPs/s ( 1x ) |
| FP16  compute                                         0.801 TFLOPs/s ( 2x ) |
| INT64 compute                                          0.016 TIOPs/s (1/32) |
| INT32 compute                                          0.149 TIOPs/s (1/3 ) |
| INT16 compute                                          0.863 TIOPs/s ( 2x ) |
| INT8  compute                                          0.213 TIOPs/s (1/2 ) |
| Memory Bandwidth ( coalesced read )                              20.98 GB/s |
| Memory Bandwidth ( coalesced write)                              25.18 GB/s |
| Memory Bandwidth (misaligned read )                              35.16 GB/s |
| Memory Bandwidth (misaligned write)                              16.18 GB/s |
|-----------------------------------------------------------------------------|

|----------------.------------------------------------------------------------|
| Device ID      | 2                                                          |
| Device Name    | Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz                   |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 2024.17.3.0.08_160000 (Windows)                            |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 12 at 3700 MHz (6 cores, 0.710 TFLOPs/s)                   |
| Memory, Cache  | 16250 MB RAM, 256 KB global / 32 KB local                  |
| Buffer Limits  | 16250 MB global, 128 KB constant                           |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                         0.151 TFLOPs/s (1/4 ) |
| FP32  compute                                         0.158 TFLOPs/s (1/4 ) |
| FP16  compute                                                 not supported |
| INT64 compute                                          0.042 TIOPs/s (1/16) |
| INT32 compute                                          0.063 TIOPs/s (1/12) |
| INT16 compute                                          0.224 TIOPs/s (1/3 ) |
| INT8  compute                                          0.059 TIOPs/s (1/12) |
| Memory Bandwidth ( coalesced read )                              16.92 GB/s |
| Memory Bandwidth ( coalesced write)                               8.08 GB/s |
| Memory Bandwidth (misaligned read )                              40.02 GB/s |
| Memory Bandwidth (misaligned write)                              13.69 GB/s |
|-----------------------------------------------------------------------------|

Owner

Name: Dr. Moritz Lehmann
Login: ProjectPhysX
Kind: user
Location: Bayreuth, Germany
Company: University of Bayreuth

Twitter: ProjectPhysX
Repositories: 3
Profile: https://github.com/ProjectPhysX

Summa cum laude Physics PhD at age 25 | Graduate @ Elite Net Bavaria | Khronos OpenCL Advisor | FluidX3D GPU developer | DLR_Graduate_Program

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Lehmann"
  given-names: "Moritz"
  orcid: "https://orcid.org/0000-0002-4652-8383"
title: "OpenCL-Benchmark"
date-released: 2023-04-30
url: "https://github.com/ProjectPhysX/OpenCL-Benchmark"

GitHub Events

Total

Create event: 5
Release event: 3
Issues event: 10
Watch event: 68
Delete event: 1
Issue comment event: 24
Push event: 20
Pull request event: 10
Fork event: 8

Last Year

Create event: 5
Release event: 3
Issues event: 10
Watch event: 68
Delete event: 1
Issue comment event: 24
Push event: 20
Pull request event: 10
Fork event: 8

Committers

Last synced: 9 months ago

All Time

Total Commits: 55
Total Committers: 1
Avg Commits per committer: 55.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 32
Committers: 1
Avg Commits per committer: 32.0
Development Distribution Score (DDS): 0.0
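Assuming the commonly used definition of the Development Distribution Score (one minus the share of commits made by the top committer), a project with a single committer always scores 0, which matches both values above:

```latex
\mathrm{DDS} = 1 - \frac{c_{\mathrm{top}}}{c_{\mathrm{total}}},
\qquad \text{all time: } 1 - \frac{55}{55} = 0.0,
\qquad \text{past year: } 1 - \frac{32}{32} = 0.0
```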

Top Committers

Name	Email	Commits
Dr. Moritz Lehmann	d**n@g**m	55

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 26
Total pull requests: 8
Average time to close issues: 3 days
Average time to close pull requests: 20 days
Total issue authors: 19
Total pull request authors: 3
Average comments per issue: 1.5
Average comments per pull request: 1.5
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 10
Pull requests: 8
Average time to close issues: about 8 hours
Average time to close pull requests: 20 days
Issue authors: 8
Pull request authors: 3
Average comments per issue: 1.3
Average comments per pull request: 1.5
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

sumseq (4)
oscarbg (3)
brokeDude2901 (2)
jungpark-mlir (2)
StuartIanNaylor (1)
jempabroni (1)
xiaoran007 (1)
axet (1)
kouchy (1)
cadenkriese (1)
stolk (1)
DimkaTsv (1)
lorn10 (1)
Snektron (1)
gfkdliucheng (1)

Science Score: 44.0%