Recent Releases of vector-sum

vector-sum - Performance of vector element sum using float vs bfloat16 as the storage type

This experiment compared the performance of:

1. Finding the sum of numbers stored as float.
2. Finding the sum of numbers stored as bfloat16.
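The two storage schemes can be sketched roughly as follows. This is a minimal sketch and not necessarily what main.cxx does: it assumes bfloat16 is stored as the upper 16 bits of an IEEE-754 32-bit float, converted by truncation, and that both sums accumulate in float. The names `bfloat16_bits`, `toBfloat16`, and `fromBfloat16` are hypothetical; `sumFloat` and `sumBfloat16` are borrowed from the labels in the listed output.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32-bit");

// Hypothetical storage type: the upper 16 bits of a 32-bit float.
using bfloat16_bits = std::uint16_t;

// Convert float to bfloat16 by dropping the low 16 mantissa bits (truncation).
// Truncation can turn a NaN whose payload lies only in the low bits into an
// infinity, so this sketch simply rejects NaN inputs.
inline bfloat16_bits toBfloat16(float x) {
  assert(!std::isnan(x));
  std::uint32_t u;
  std::memcpy(&u, &x, sizeof u);
  return static_cast<bfloat16_bits>(u >> 16);
}

// Convert bfloat16 back to float by zero-filling the low 16 bits.
inline float fromBfloat16(bfloat16_bits b) {
  std::uint32_t u = static_cast<std::uint32_t>(b) << 16;
  float x;
  std::memcpy(&x, &u, sizeof x);
  return x;
}

// Sum of values stored as float (4 bytes per element).
inline float sumFloat(const std::vector<float>& x) {
  float a = 0;
  for (float v : x) a += v;
  return a;
}

// Sum of values stored as bfloat16 (2 bytes per element).
// Each element is widened to float before being added.
inline float sumBfloat16(const std::vector<bfloat16_bits>& x) {
  float a = 0;
  for (bfloat16_bits v : x) a += fromBfloat16(v);
  return a;
}
```

Because truncation only discards mantissa bits, the bfloat16 sum of positive inputs can only be less than or equal to the float sum, which is consistent with the slightly smaller `sumBfloat16` result in the listed output.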

Both approaches were run on a range of vector sizes, with each approach run 5 times per size to get a reliable time measurement. I expected the bfloat16 method to be a clear winner because of its lower memory bandwidth requirement, but it turned out to be only slightly faster. This is possibly because memory loads are always 32-bit anyway. The slight advantage of bfloat16 may come from its smaller size, which allows more of the data to stay in cache for longer. Note that neither approach uses SIMD instructions, which are available on all modern hardware.
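Repeating each run and reporting one time per size could be done along these lines. This is only a sketch under stated assumptions: the helper name `measureDurationMs` is hypothetical, and reporting the mean over the repeats is an assumption, since the text does not say how the 5 runs are combined.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: call fn() `repeat` times and return the mean
// wall-clock time per call, in milliseconds.
template <class F>
double measureDurationMs(F fn, int repeat = 5) {
  assert(repeat > 0);
  using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
  auto start = clock::now();
  for (int i = 0; i < repeat; ++i) fn();
  auto stop = clock::now();
  std::chrono::duration<double, std::milli> total = stop - start;
  return total.count() / repeat;
}
```

When timing a sum at `-O3`, the callable should write its result to a variable that is printed afterwards; otherwise the compiler is free to remove the computation being measured.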

All outputs are saved in out and a small part of the output is listed here. Some charts are also included below, generated from sheets. This experiment was done with guidance from Prof. Dip Sankar Banerjee and Prof. Kishore Kothapalli.


```bash
$ g++ -O3 main.cxx
$ ./a.out

[00000.050 ms; 1e+04 elems.] [1.644725] sumFloat
[00000.050 ms; 1e+04 elems.] [1.643810] sumBfloat16
[00000.504 ms; 1e+05 elems.] [1.644725] sumFloat
[00000.505 ms; 1e+05 elems.] [1.643810] sumBfloat16
[00001.780 ms; 1e+06 elems.] [1.644725] sumFloat
[00001.342 ms; 1e+06 elems.] [1.643810] sumBfloat16
[00013.539 ms; 1e+07 elems.] [1.644725] sumFloat
[00013.432 ms; 1e+07 elems.] [1.643810] sumBfloat16
[00135.340 ms; 1e+08 elems.] [1.644725] sumFloat
[00134.231 ms; 1e+08 elems.] [1.643810] sumBfloat16
[01354.430 ms; 1e+09 elems.] [1.644725] sumFloat
[01343.146 ms; 1e+09 elems.] [1.643810] sumBfloat16
```
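To put a number on "only slightly faster", the speedup of bfloat16 over float can be computed from the timings listed above. The snippet below is an illustrative sketch: the `Timing` struct and the function names are hypothetical, and the values are copied from the listed output only.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One row of the listed output: element count and time per approach (ms).
struct Timing { double elems, floatMs, bfloat16Ms; };

// Timings copied from the output listed above.
inline std::vector<Timing> listedTimings() {
  return {
    {1e4,    0.050,    0.050},
    {1e5,    0.504,    0.505},
    {1e6,    1.780,    1.342},
    {1e7,   13.539,   13.432},
    {1e8,  135.340,  134.231},
    {1e9, 1354.430, 1343.146},
  };
}

// Speedup of bfloat16 storage relative to float storage (> 1 means faster).
inline double speedup(const Timing& t) {
  assert(t.bfloat16Ms > 0);
  return t.floatMs / t.bfloat16Ms;
}

// Print one speedup line per vector size.
inline void printSpeedups() {
  for (const Timing& t : listedTimings())
    std::printf("%.0e elems: %.3fx\n", t.elems, speedup(t));
}
```

For these numbers the ratio is about 1.33x at 1e+06 elements and stays within 1% of 1.0x at every other size.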



- C++
Published by wolfram77 over 3 years ago