hello-cuda
A basic "Hello world" or "Hello CUDA" example to perform a number of operations on NVIDIA GPUs using CUDA.
Science Score: 67.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.2%) to scientific vocabulary
Last synced: 6 months ago
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Created over 2 years ago · Last pushed 11 months ago
Metadata Files
- Readme
- License
- Citation
README.md
A basic "Hello world" or "Hello CUDA" example to perform a number of operations on NVIDIA GPUs using CUDA.
Note: You can simply copy `main.sh` to your system and run it. For the code, refer to `main.cu`.
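The transcript below shows each GPU thread printing its block and thread index (`GPU[Bb.Tt]`) for 4 blocks of 8 threads. A minimal kernel producing that pattern might look like the following sketch; the names here are illustrative and not necessarily those used in `main.cu`:

```cuda
#include <cstdio>

// Each thread prints its block and thread index, mirroring the
// "GPU[Bb.Tt]: Hello CUDA" lines in the transcript below.
__global__ void helloKernel() {
  printf("GPU[B%d.T%d]: Hello CUDA\n", blockIdx.x, threadIdx.x);
}

int main() {
  helloKernel<<<4, 8>>>();      // 4 blocks x 8 threads = 32 greetings
  cudaDeviceSynchronize();      // wait for device-side printf to flush
  printf("CPU: Hello world!\n");
  return 0;
}
```

Note that blocks may execute in any order (B1, B3, B2, B0 in the run below), while threads within a warp print in index order.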
```bash
$ bash main.sh
Cloning into 'hello-cuda'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 33 (delta 2), reused 6 (delta 1), pack-reused 21
Receiving objects: 100% (33/33), 24.58 KiB | 719.00 KiB/s, done.
Resolving deltas: 100% (9/9), done.
HELLO WORLD:
GPU[B1.T0]: Hello CUDA
GPU[B1.T1]: Hello CUDA
GPU[B1.T2]: Hello CUDA
GPU[B1.T3]: Hello CUDA
GPU[B1.T4]: Hello CUDA
GPU[B1.T5]: Hello CUDA
GPU[B1.T6]: Hello CUDA
GPU[B1.T7]: Hello CUDA
GPU[B3.T0]: Hello CUDA
GPU[B3.T1]: Hello CUDA
GPU[B3.T2]: Hello CUDA
GPU[B3.T3]: Hello CUDA
GPU[B3.T4]: Hello CUDA
GPU[B3.T5]: Hello CUDA
GPU[B3.T6]: Hello CUDA
GPU[B3.T7]: Hello CUDA
GPU[B2.T0]: Hello CUDA
GPU[B2.T1]: Hello CUDA
GPU[B2.T2]: Hello CUDA
GPU[B2.T3]: Hello CUDA
GPU[B2.T4]: Hello CUDA
GPU[B2.T5]: Hello CUDA
GPU[B2.T6]: Hello CUDA
GPU[B2.T7]: Hello CUDA
GPU[B0.T0]: Hello CUDA
GPU[B0.T1]: Hello CUDA
GPU[B0.T2]: Hello CUDA
GPU[B0.T3]: Hello CUDA
GPU[B0.T4]: Hello CUDA
GPU[B0.T5]: Hello CUDA
GPU[B0.T6]: Hello CUDA
GPU[B0.T7]: Hello CUDA
CPU: Hello world!
DEVICE PROPERTIES:
COMPUTE DEVICE 0:
Name: Tesla V100-PCIE-16GB
Compute capability: 7.0
Multiprocessors: 80
Clock rate: 1380 MHz
Global memory: 16151 MB
Constant memory: 64 KB
Shared memory per block: 48 KB
Registers per block: 65536
Threads per block: 1024 (max)
Threads per warp: 32
Block dimension: 1024x1024x64 (max)
Grid dimension: 2147483647x65535x65535 (max)
Device copy overlap: yes
Kernel execution timeout: no
CHOOSE DEVICE:
Current CUDA device: 0
CUDA device with at least compute capability 1.3: 0
Cards that have compute capability 1.3 or higher
support double-precision floating-point math.
MALLOC PERFORMANCE:
Host malloc (1 GB): 0.00 ms
CUDA malloc (1 GB): 1.35 ms
Host free (1 GB): 0.00 ms
CUDA free (1 GB): 1.51 ms
MEMCPY PERFORMANCE:
Host to host (1 GB): 412.59 ms
Host to device (1 GB): 225.32 ms
Device to host (1 GB): 246.87 ms
Device to device (1 GB): 0.04 ms
ADDITION:
a = 1, b = 2
a + b = 3 (GPU)
VECTOR ADDITION:
x = vector of size 1 GB
y = vector of size 1 GB
Vector addition on host (a = x + y): 438.02 ms
Vector addition on device <<<32768, 32>>> (a = x + y): 4.33 ms
Vector addition on device <<<16384, 64>>> (a = x + y): 3.98 ms
Vector addition on device <<<8192, 128>>> (a = x + y): 4.01 ms
Vector addition on device <<<4096, 256>>> (a = x + y): 3.97 ms
Vector addition on device <<<2048, 512>>> (a = x + y): 4.00 ms
Vector addition on device <<<1024, 1024>>> (a = x + y): 3.97 ms
DOT PRODUCT:
x = vector of size 1 GB
y = vector of size 1 GB
Dot product on host (a = x . y): 207.39 ms [2.154769e+05]
Dot product on device (a = x . y): 2.69 ms 2.154769e+05
Dot product on device (a = x . y): 2.50 ms 2.154769e+05
Dot product on device (a = x . y): 2.50 ms 2.154769e+05
HISTOGRAM:
buf = vector of size 1 GB
Finding histogram of buf on host: 747.00 ms
Finding histogram of buf on device (basic approach): 401.06 ms
Finding histogram of buf on device (shared approach): 6.85 ms
MATRIX MULTIPLICATION:
x = matrix of size 16 MB
y = matrix of size 16 MB
Matrix multiplication on host (a = x * y): 33307.13 ms [3.287916e+00]
Matrix multiplication on device (a = x * y): 18.93 ms (basic approach) [3.287916e+00]
Matrix multiplication on device (a = x * y): 12.20 ms (tiled approach) [3.287916e+00]
```
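The dot-product timings above are typical of a block-level shared-memory reduction followed by a single atomic add per block (see the reduction and `atomicAdd` references below). A sketch of that pattern, with illustrative names and assuming a power-of-two block size of at most 256, is:

```cuda
#define THREADS 256

// Each thread accumulates partial products over a grid-stride loop,
// the block reduces them in shared memory, and thread 0 adds the
// block's sum to the global result with a single atomicAdd.
__global__ void dotKernel(const float* x, const float* y,
                          float* result, int n) {
  __shared__ float cache[THREADS];
  float sum = 0.0f;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x)
    sum += x[i] * y[i];
  cache[threadIdx.x] = sum;
  __syncthreads();
  // Tree reduction within the block.
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
    __syncthreads();
  }
  if (threadIdx.x == 0) atomicAdd(result, cache[0]);
}
```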
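The roughly 60x gap between the "basic" and "shared" histogram approaches (401 ms vs. 6.85 ms) is characteristic of moving the per-element atomics from global to shared memory: each block accumulates a private histogram and merges it into the global one once. A sketch of the shared approach, assuming 256 byte-valued bins (names are illustrative):

```cuda
// Per-block histogram in shared memory: one global atomicAdd per bin
// per block, instead of one global atomic per input element.
__global__ void histogramShared(const unsigned char* buf, int n,
                                unsigned int* hist) {
  __shared__ unsigned int local[256];
  for (int b = threadIdx.x; b < 256; b += blockDim.x)
    local[b] = 0;                   // zero the block-private bins
  __syncthreads();
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x)
    atomicAdd(&local[buf[i]], 1u);  // cheap shared-memory atomic
  __syncthreads();
  for (int b = threadIdx.x; b < 256; b += blockDim.x)
    atomicAdd(&hist[b], local[b]);  // one global atomic per bin
}
```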
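The "tiled approach" to matrix multiplication in the last two lines above usually refers to staging sub-matrices in shared memory so each global element is read far fewer times. A sketch under the common simplifying assumptions (square N x N matrices, N divisible by the tile width; names are illustrative):

```cuda
#define TILE 32

// Tiled matrix multiply: each block stages TILE x TILE sub-matrices
// of x and y in shared memory, so every global element is loaded only
// N/TILE times instead of N times.
__global__ void matmulTiled(const float* x, const float* y,
                            float* a, int N) {
  __shared__ float xs[TILE][TILE], ys[TILE][TILE];
  int row = blockIdx.y * TILE + threadIdx.y;
  int col = blockIdx.x * TILE + threadIdx.x;
  float sum = 0.0f;
  for (int t = 0; t < N / TILE; ++t) {
    // Cooperatively load one tile of x and one tile of y.
    xs[threadIdx.y][threadIdx.x] = x[row * N + t * TILE + threadIdx.x];
    ys[threadIdx.y][threadIdx.x] = y[(t * TILE + threadIdx.y) * N + col];
    __syncthreads();
    for (int k = 0; k < TILE; ++k)
      sum += xs[threadIdx.y][k] * ys[k][threadIdx.x];
    __syncthreads();  // finish using the tile before overwriting it
  }
  a[row * N + col] = sum;
}
```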
References
- CUB Documentation
- moderngpu/moderngpu: Patterns and behaviors for GPU computing
- Faster Parallel Reductions on Kepler
- CUDA atomicAdd for doubles definition error
- CUDA C++ Programming Guide
- CUDA Toolkit Documentation
Owner
- Name: puzzlef
- Login: puzzlef
- Kind: organization
- Website: https://puzzlef.github.io/
- Repositories: 10
- Profile: https://github.com/puzzlef
- Description: A summary of experiments.
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sahu
    given-names: Subhajit
    orcid: https://orcid.org/0000-0001-5140-6578
title: "puzzlef/hello-cuda: A basic \"Hello world\" or \"Hello CUDA\" example to perform a number of operations on NVIDIA GPUs using CUDA"
version: 1.0.0
doi: 10.5281/zenodo.10030459
date-released: 2023-10-22
```
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
