hello-cuda
A basic "Hello world" or "Hello CUDA" example to perform a number of operations on NVIDIA GPUs using CUDA.
Science Score: 67.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (5.2%) to scientific vocabulary
Last synced: 6 months ago
Repository
Basic Info
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Created over 2 years ago · Last pushed 11 months ago
Metadata Files
- Readme
- License
- Citation
README.md
A basic "Hello world" or "Hello CUDA" example to perform a number of operations on NVIDIA GPUs using CUDA.
Note: You can simply copy `main.sh` to your system and run it. For the code, refer to `main.cu`.
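The transcript below shows each GPU thread printing its block and thread index (`GPU[Bb.Tt]`) for 4 blocks of 8 threads. A minimal kernel producing that pattern might look like the following sketch; the names here are illustrative and not necessarily those used in `main.cu`:

```cuda
#include <cstdio>

// Each thread prints its block and thread index, mirroring the
// "GPU[Bb.Tt]: Hello CUDA" lines in the transcript below.
__global__ void helloKernel() {
  printf("GPU[B%d.T%d]: Hello CUDA\n", blockIdx.x, threadIdx.x);
}

int main() {
  helloKernel<<<4, 8>>>();      // 4 blocks x 8 threads = 32 greetings
  cudaDeviceSynchronize();      // wait for device-side printf to flush
  printf("CPU: Hello world!\n");
  return 0;
}
```

Note that blocks may execute in any order (B1, B3, B2, B0 in the run below), while threads within a warp print in index order.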
```bash
$ bash main.sh
Cloning into 'hello-cuda'...
remote: Enumerating objects: 33, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 33 (delta 2), reused 6 (delta 1), pack-reused 21
Receiving objects: 100% (33/33), 24.58 KiB | 719.00 KiB/s, done.
Resolving deltas: 100% (9/9), done.
HELLO WORLD:
GPU[B1.T0]: Hello CUDA
GPU[B1.T1]: Hello CUDA
GPU[B1.T2]: Hello CUDA
GPU[B1.T3]: Hello CUDA
GPU[B1.T4]: Hello CUDA
GPU[B1.T5]: Hello CUDA
GPU[B1.T6]: Hello CUDA
GPU[B1.T7]: Hello CUDA
GPU[B3.T0]: Hello CUDA
GPU[B3.T1]: Hello CUDA
GPU[B3.T2]: Hello CUDA
GPU[B3.T3]: Hello CUDA
GPU[B3.T4]: Hello CUDA
GPU[B3.T5]: Hello CUDA
GPU[B3.T6]: Hello CUDA
GPU[B3.T7]: Hello CUDA
GPU[B2.T0]: Hello CUDA
GPU[B2.T1]: Hello CUDA
GPU[B2.T2]: Hello CUDA
GPU[B2.T3]: Hello CUDA
GPU[B2.T4]: Hello CUDA
GPU[B2.T5]: Hello CUDA
GPU[B2.T6]: Hello CUDA
GPU[B2.T7]: Hello CUDA
GPU[B0.T0]: Hello CUDA
GPU[B0.T1]: Hello CUDA
GPU[B0.T2]: Hello CUDA
GPU[B0.T3]: Hello CUDA
GPU[B0.T4]: Hello CUDA
GPU[B0.T5]: Hello CUDA
GPU[B0.T6]: Hello CUDA
GPU[B0.T7]: Hello CUDA
CPU: Hello world!
DEVICE PROPERTIES:
COMPUTE DEVICE 0:
Name: Tesla V100-PCIE-16GB
Compute capability: 7.0
Multiprocessors: 80
Clock rate: 1380 MHz
Global memory: 16151 MB
Constant memory: 64 KB
Shared memory per block: 48 KB
Registers per block: 65536
Threads per block: 1024 (max)
Threads per warp: 32
Block dimension: 1024x1024x64 (max)
Grid dimension: 2147483647x65535x65535 (max)
Device copy overlap: yes
Kernel execution timeout: no
CHOOSE DEVICE:
Current CUDA device: 0
CUDA device with at least compute capability 1.3: 0
Cards that have compute capability 1.3 or higher
support double-precision floating-point math.
MALLOC PERFORMANCE:
Host malloc (1 GB): 0.00 ms
CUDA malloc (1 GB): 1.35 ms
Host free (1 GB): 0.00 ms
CUDA free (1 GB): 1.51 ms
MEMCPY PERFORMANCE:
Host to host (1 GB): 412.59 ms
Host to device (1 GB): 225.32 ms
Device to host (1 GB): 246.87 ms
Device to device (1 GB): 0.04 ms
ADDITION:
a = 1, b = 2
a + b = 3 (GPU)
VECTOR ADDITION:
x = vector of size 1 GB
y = vector of size 1 GB
Vector addition on host (a = x + y): 438.02 ms
Vector addition on device <<<32768, 32>>> (a = x + y): 4.33 ms
Vector addition on device <<<16384, 64>>> (a = x + y): 3.98 ms
Vector addition on device <<<8192, 128>>> (a = x + y): 4.01 ms
Vector addition on device <<<4096, 256>>> (a = x + y): 3.97 ms
Vector addition on device <<<2048, 512>>> (a = x + y): 4.00 ms
Vector addition on device <<<1024, 1024>>> (a = x + y): 3.97 ms
DOT PRODUCT:
x = vector of size 1 GB
y = vector of size 1 GB
Dot product on host (a = x . y): 207.39 ms [2.154769e+05]
Dot product on device (a = x . y): 2.69 ms 2.154769e+05
Dot product on device (a = x . y): 2.50 ms 2.154769e+05
Dot product on device (a = x . y): 2.50 ms 2.154769e+05
HISTOGRAM:
buf = vector of size 1 GB
Finding histogram of buf on host: 747.00 ms
Finding histogram of buf on device (basic approach): 401.06 ms
Finding histogram of buf on device (shared approach): 6.85 ms
MATRIX MULTIPLICATION:
x = matrix of size 16 MB
y = matrix of size 16 MB
Matrix multiplication on host (a = x * y): 33307.13 ms [3.287916e+00]
Matrix multiplication on device (a = x * y): 18.93 ms (basic approach) [3.287916e+00]
Matrix multiplication on device (a = x * y): 12.20 ms (tiled approach) [3.287916e+00]
```
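The dot-product timings above are typical of a block-level shared-memory reduction followed by a single atomic add per block (see the reduction and `atomicAdd` references below). A sketch of that pattern, with illustrative names and assuming a power-of-two block size of at most 256, is:

```cuda
#define THREADS 256

// Each thread accumulates partial products over a grid-stride loop,
// the block reduces them in shared memory, and thread 0 adds the
// block's sum to the global result with a single atomicAdd.
__global__ void dotKernel(const float* x, const float* y,
                          float* result, int n) {
  __shared__ float cache[THREADS];
  float sum = 0.0f;
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x)
    sum += x[i] * y[i];
  cache[threadIdx.x] = sum;
  __syncthreads();
  // Tree reduction within the block.
  for (int s = blockDim.x / 2; s > 0; s >>= 1) {
    if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
    __syncthreads();
  }
  if (threadIdx.x == 0) atomicAdd(result, cache[0]);
}
```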
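The roughly 60x gap between the "basic" and "shared" histogram approaches (401 ms vs. 6.85 ms) is characteristic of moving the per-element atomics from global to shared memory: each block accumulates a private histogram and merges it into the global one once. A sketch of the shared approach, assuming 256 byte-valued bins (names are illustrative):

```cuda
// Per-block histogram in shared memory: one global atomicAdd per bin
// per block, instead of one global atomic per input element.
__global__ void histogramShared(const unsigned char* buf, int n,
                                unsigned int* hist) {
  __shared__ unsigned int local[256];
  for (int b = threadIdx.x; b < 256; b += blockDim.x)
    local[b] = 0;                   // zero the block-private bins
  __syncthreads();
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
       i += gridDim.x * blockDim.x)
    atomicAdd(&local[buf[i]], 1u);  // cheap shared-memory atomic
  __syncthreads();
  for (int b = threadIdx.x; b < 256; b += blockDim.x)
    atomicAdd(&hist[b], local[b]);  // one global atomic per bin
}
```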
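The "tiled approach" to matrix multiplication in the last two lines above usually refers to staging sub-matrices in shared memory so each global element is read far fewer times. A sketch under the common simplifying assumptions (square N x N matrices, N divisible by the tile width; names are illustrative):

```cuda
#define TILE 32

// Tiled matrix multiply: each block stages TILE x TILE sub-matrices
// of x and y in shared memory, so every global element is loaded only
// N/TILE times instead of N times.
__global__ void matmulTiled(const float* x, const float* y,
                            float* a, int N) {
  __shared__ float xs[TILE][TILE], ys[TILE][TILE];
  int row = blockIdx.y * TILE + threadIdx.y;
  int col = blockIdx.x * TILE + threadIdx.x;
  float sum = 0.0f;
  for (int t = 0; t < N / TILE; ++t) {
    // Cooperatively load one tile of x and one tile of y.
    xs[threadIdx.y][threadIdx.x] = x[row * N + t * TILE + threadIdx.x];
    ys[threadIdx.y][threadIdx.x] = y[(t * TILE + threadIdx.y) * N + col];
    __syncthreads();
    for (int k = 0; k < TILE; ++k)
      sum += xs[threadIdx.y][k] * ys[k][threadIdx.x];
    __syncthreads();  // finish using the tile before overwriting it
  }
  a[row * N + col] = sum;
}
```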
References
- CUB Documentation
- moderngpu/moderngpu: Patterns and behaviors for GPU computing
- Faster Parallel Reductions on Kepler
- CUDA atomicAdd for doubles definition error
- CUDA C++ Programming Guide
- CUDA Toolkit Documentation
Owner
- Name: puzzlef
- Login: puzzlef
- Kind: organization
- Website: https://puzzlef.github.io/
- Repositories: 10
- Profile: https://github.com/puzzlef
- Description: A summary of experiments.
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Sahu
    given-names: Subhajit
    orcid: https://orcid.org/0000-0001-5140-6578
title: "puzzlef/hello-cuda: A basic \"Hello world\" or \"Hello CUDA\" example to perform a number of operations on NVIDIA GPUs using CUDA"
version: 1.0.0
doi: 10.5281/zenodo.10030459
date-released: 2023-10-22
```
GitHub Events
Total
- Push event: 1
Last Year
- Push event: 1
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
