Recent Releases of vector-sum-cuda

vector-sum-cuda - Performance of sequential vs CUDA-based vector element sum


This experiment compares the performance of three approaches:

1. Find sum(x) using a single thread (sequential).
2. Find sum(x) accelerated using CUDA (not power-of-2 reduce).
3. Find sum(x) accelerated using CUDA (power-of-2 reduce).
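To make the difference between the two reduce variants concrete, here is a minimal host-side C++ sketch. It is not the repository's kernel code: `sumSeq` mirrors the name in the output listing, but `reducePow2` and `reduceAny` are hypothetical names, the 64-bit accumulator is an assumption made to sidestep overflow, and the reading of "not power-of-2 reduce" as a ceiling-halving loop is one plausible interpretation. Each inner loop stands in for one synchronized step in which thread `i` of a block would add a partner element into `a[i]`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential baseline: a single thread walks the vector once.
// Assumption: accumulate in 64 bits so the sketch cannot overflow.
int64_t sumSeq(const std::vector<int32_t>& x) {
  int64_t a = 0;
  for (int32_t v : x) a += v;
  return a;
}

// Power-of-2 tree reduce. Assumes a.size() is a power of 2 (or zero).
// Every pass halves the active range exactly; the inner loop models
// the threads i < h that would run in parallel on the GPU.
int64_t reducePow2(std::vector<int64_t> a) {
  if (a.empty()) return 0;
  for (std::size_t h = a.size() / 2; h > 0; h /= 2)
    for (std::size_t i = 0; i < h; ++i) a[i] += a[i + h];
  return a[0];
}

// General tree reduce for any size. Each pass keeps ceil(n/2)
// elements; when n is odd, the middle element has no partner and
// is carried over unchanged to the next pass.
int64_t reduceAny(std::vector<int64_t> a) {
  if (a.empty()) return 0;
  std::size_t n = a.size();
  while (n > 1) {
    std::size_t half = (n + 1) / 2;  // survivors after this pass
    for (std::size_t i = 0; i < n / 2; ++i) a[i] += a[half + i];
    n = half;
  }
  return a[0];
}
```

In both variants a pass only reads from the upper part of the active range and only writes to the lower part, which is why the inner loop could be executed by independent threads with a barrier between passes. The power-of-2 version has simpler index arithmetic, while the general version accepts any block size.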

Here x is a 32-bit integer vector. All three approaches were run on a range of vector sizes, with each approach run 5 times per size to get a reliable time measurement. Note that the time taken to copy data to and from the GPU is not measured, and the sequential approach does not use SIMD instructions. While it might seem that the CUDA approach would be a clear winner, the results indicate that this depends on the workload: from 10^5 elements onward, the CUDA approaches perform better than the sequential one. The two CUDA approaches (not power-of-2 and power-of-2 reduce) seem to have similar performance.
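The repeated-measurement idea can be sketched with a small C++ helper. This is an illustration only: `measureMs` is a hypothetical name, and reporting the mean over the runs is an assumption, since the text only says each approach was run 5 times per size.

```cpp
#include <chrono>

// Runs fn `repeat` times and returns the mean wall-clock time per
// run, in milliseconds. Hypothetical helper, not the repository's.
template <class F>
double measureMs(F fn, int repeat = 5) {
  using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
  auto start = clock::now();
  for (int r = 0; r < repeat; ++r) fn();
  auto stop = clock::now();
  std::chrono::duration<double, std::milli> total = stop - start;
  return total.count() / repeat;
}
```

A call such as `measureMs([&] { result = sumSeq(x); })` would time the sequential version, given a `sumSeq` function and a vector `x`. When timing a CUDA kernel this way, the callable would also need to wait for the device to finish (for example with `cudaDeviceSynchronize()`), because kernel launches return to the host before the kernel completes.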

All outputs are saved in a gist and a small part of the output is listed here. Some charts are also included below, generated from sheets. This experiment was done with guidance from Prof. Kishore Kothapalli and Prof. Dip Sankar Banerjee.


```bash
$ nvcc -std=c++17 -Xcompiler -O3 main.cu
$ ./a.out
[00000.002 ms; 1e+03 elems.] [502942114] sumSeq
[00001.128 ms; 1e+03 elems.] [502942114] sumCuda
[00000.018 ms; 1e+03 elems.] [502942114] sumCudaPow2
...
```




- C++
Published by wolfram77 over 3 years ago