vir-simd

improve the usage experience of std::experimental::simd (Parallelism TS 2)

https://github.com/mattkretz/vir-simd

Keywords

cpp cpp17-library parallelism-ts simd simd-library

Last synced: 6 months ago

Basic Info

Host: GitHub
Owner: mattkretz
License: lgpl-3.0
Language: C++
Default Branch: master
Homepage: https://mattkretz.github.io/vir-simd/master/
Size: 897 KB

Statistics

Stars: 29
Watchers: 6
Forks: 4
Open Issues: 7
Releases: 8

Created over 3 years ago · Last pushed 6 months ago

Metadata Files

Readme Contributing License Code of conduct Citation

vir::stdx::simd

This project aims to provide a fallback std::experimental::simd (Parallelism TS 2) implementation with additional features. Not every user can rely on GCC 11+ and its standard library to be present on all target systems. Therefore, the header vir/simd.h provides a fallback implementation of the TS specification that only implements the scalar and fixed_size<N> ABI tags. Thus, your code can still compile and run correctly, even if it is missing the performance gains a proper implementation provides.

Installation

This is a header-only library. Installation is a simple copy of the headers to wherever you want them. By default, make install copies the headers into /usr/local/include/vir/.

Examples:

```sh
# installs to $HOME/.local/include/vir
make install prefix=~/.local

# installs to $HOME/src/myproject/3rdparty/vir
make install includedir=~/src/myproject/3rdparty
```

Usage

```c++
#include <vir/simd.h>

namespace stdx = vir::stdx;

using floatv = stdx::native_simd<float>;
// ...
```

The vir/simd.h header will include <experimental/simd> if it is available, so you don't have to add any buildsystem support. It should just work.

Options

VIR_SIMD_TS_DROPIN: Define the macro VIR_SIMD_TS_DROPIN before including <vir/simd.h> to define everything in the namespace specified in the Parallelism TS 2 (namely std::experimental::parallelism_v2).
VIR_DISABLE_STDX_SIMD: Do not include <experimental/simd> even if it is available. This allows compiling your code with the <vir/simd.h> implementation unconditionally. This is useful for testing.
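
As a minimal sketch of how such a macro is typically set, the following translation unit defines VIR_DISABLE_STDX_SIMD before the include. The `__has_include` guard, the `have_vir_simd` flag, and the `lanes()` helper are assumptions added for illustration so that the snippet also compiles on systems without the library; they are not part of vir-simd.

```cpp
#include <cstddef>

// Must be defined before <vir/simd.h> is included to take effect.
#define VIR_DISABLE_STDX_SIMD 1

#if __has_include(<vir/simd.h>)
#  include <vir/simd.h>
namespace stdx = vir::stdx;
// Hypothetical flag: records that the library header was found.
inline constexpr bool have_vir_simd = true;
// Number of elements in the native simd type of the selected implementation.
inline std::size_t lanes() { return stdx::native_simd<float>::size(); }
#else
// Hypothetical fallback so the sketch still builds without the library.
inline constexpr bool have_vir_simd = false;
inline std::size_t lanes() { return 1; }
#endif
```

The same macro can instead be passed on the compiler command line with -D, which avoids ordering problems between includes.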

Additional Features

The TS curiously forgot to add simd_cast and static_simd_cast overloads for simd_mask. With vir::stdx::(static_)simd_cast, casts will also work for simd_mask. This does not require any additional includes.

Simple iota `simd` constants

Requires Concepts (C++20).

```c++
#include <vir/simd_iota.h>

constexpr auto a = vir::iota_v<stdx::simd<float>> * 3; // 0, 3, 6, 9, ...
```

The variable template vir::iota_v<T> can be instantiated with arithmetic types, array types (std::array and C-arrays), and simd types. In all cases, the elements of the variable will be initialized to 0, 1, 2, 3, 4, ..., depending on the number of elements in T. For arithmetic types vir::iota_v<T> is always just 0.
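
To make the element pattern concrete without depending on the library, here is a scalar sketch using std::array. The helper names `iota_array` and `scaled` are hypothetical and only mirror the 0, 1, 2, ... initialization and the multiplication by 3 described above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of vir-simd): fills an array with 0, 1, 2, ...
template <typename T, std::size_t N>
constexpr std::array<T, N> iota_array()
{
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = static_cast<T>(i);
  return result;
}

// Hypothetical helper: multiplies every element by the same factor,
// mirroring the element-wise `iota * 3` expression.
template <typename T, std::size_t N>
constexpr std::array<T, N> scaled(std::array<T, N> values, T factor)
{
  for (std::size_t i = 0; i < N; ++i)
    values[i] *= factor;
  return values;
}
```

For example, `scaled(iota_array<float, 4>(), 3.f)` yields the elements 0, 3, 6, 9.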

Making `simd` conversions more convenient

Requires Concepts (C++20).

The TS is way too strict about conversions, requiring verbose std::experimental::static_simd_cast<T>(x) instead of a concise T(x) or static_cast<T>(x). (std::simd in C++26 will fix this.)

vir::cvt(x) provides a tool to make x implicitly convertible into whatever the expression wants in order to be well-formed. This only works if the required type is unambiguous.

```c++
#include <vir/simd_cvt.h>

using floatv = stdx::native_simd<float>;
using intv = stdx::rebind_simd_t<int, floatv>;

void f(intv x) {
  using vir::cvt;
  // the floatv constructor and intv assignment operator clearly determine the
  // destination type:
  x = cvt(10 * sin(floatv(cvt(x))));

  // without vir::cvt, one would have to write:
  x = stdx::static_simd_cast<intv>(10 * sin(stdx::static_simd_cast<floatv>(x)));

  // probably don't do this too often:
  auto y = cvt(x); // y is a const-ref to x, but so much more convertible
                   // y is of type cvt<intv>
}
```

Note that vir::cvt also works for simd_mask and non-simd types. Thus, cvt becomes an important building block for writing "simd-generic" code (i.e. well-formed for T and simd<T>).
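
The idea of "convertible into whatever the expression wants" can be sketched for plain scalar types with a wrapper that has a templated conversion operator. This is an illustration of the general idiom under that assumption, not the implementation of vir::cvt; `convertible_ref`, `cvt_sketch`, and `truncate_example` are hypothetical names.

```cpp
#include <type_traits>

// Hypothetical wrapper: stores a reference and converts only when asked to.
template <typename T>
class convertible_ref
{
  const T& ref_;

public:
  constexpr explicit convertible_ref(const T& x) : ref_(x) {}

  // The destination type U is deduced from the context consuming the wrapper.
  template <typename U,
            typename = std::enable_if_t<std::is_constructible_v<U, const T&>>>
  constexpr operator U() const
  {
    return static_cast<U>(ref_);
  }
};

// Hypothetical factory function in the spirit of a cvt(x) call.
template <typename T>
constexpr convertible_ref<T> cvt_sketch(const T& x)
{
  return convertible_ref<T>(x);
}

inline int truncate_example(double d)
{
  int i = cvt_sketch(d); // the declaration of i determines the target type
  return i;
}
```

As with the note on `auto y = cvt(x)` above, the wrapper only holds a reference, so it should be consumed within the expression that creates it.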

Permutations (paper)

Requires Concepts (C++20).

```c++
#include <vir/simd_permute.h>

// v = {0, 1, 2, 3} -> {1, 0, 3, 2}
vir::simd_permute(v, vir::simd_permutations::swap_neighbors);

// v = {1, 2, 3, 4} -> {2, 2, 2, 2}
vir::simd_permute(v, [](unsigned) { return 1; });

// v = {1, 2, 3, 4} -> {3, 3, 3, 3}
vir::simd_permute(v, [](unsigned) { return -2; });
```

The following permutations are pre-defined:

vir::simd_permutations::duplicate_even: copy values at even indices to neighboring odd position
vir::simd_permutations::duplicate_odd: copy values at odd indices to neighboring even position
vir::simd_permutations::swap_neighbors<N>: swap N consecutive values with the following N consecutive values
vir::simd_permutations::broadcast<Idx>: copy the value at index Idx to all other values
vir::simd_permutations::broadcast_first: alias for broadcast<0>
vir::simd_permutations::broadcast_last: alias for broadcast<-1>
vir::simd_permutations::reverse: reverse the order of all values
vir::simd_permutations::rotate<Offset>: positive Offset rotates values to the left, negative Offset rotates values to the right (i.e. rotate<Offset> moves values from index (i + Offset) % size to i)
vir::simd_permutations::shift<Offset>: positive Offset shifts values to the left, negative Offset shifts values to the right; shifting in zeros.

A vir::simd_permute(x, idx_perm) overload, where x is of vectorizable type, is also included, facilitating generic code.

A special permutation vir::simd_shift_in<N>(x, ...) shifts by N elements, shifting in elements from additional simd objects passed via the pack. Example:

```c++
// v = {1, 2, 3, 4}, w = {5, 6, 7, 8} -> {2, 3, 4, 5}
vir::simd_shift_in<1>(v, w);
```
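
The index semantics of the permutations above can be modeled on std::array. This is a scalar sketch, assuming that the callable maps each destination index to a source index and that negative results count from the end (as the broadcast<-1> alias suggests); all function names are hypothetical.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model: index_of(i) names the source index for output i.
// Negative source indices count from the end of the array.
template <typename T, std::size_t N, typename F>
constexpr std::array<T, N> permute_sketch(const std::array<T, N>& v, F index_of)
{
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    const int j = index_of(static_cast<int>(i));
    out[i] = v[static_cast<std::size_t>(j < 0 ? static_cast<int>(N) + j : j)];
  }
  return out;
}

// Every output element reads the same source index.
template <typename T, std::size_t N>
constexpr std::array<T, N> broadcast_sketch(const std::array<T, N>& v, int idx)
{
  return permute_sketch(v, [idx](int) { return idx; });
}

// Swaps each even/odd pair; N is assumed to be even.
template <typename T, std::size_t N>
constexpr std::array<T, N> swap_neighbors_sketch(const std::array<T, N>& v)
{
  return permute_sketch(v, [](int i) { return i ^ 1; });
}

// Moves the value at index (i + Offset) % N to index i, for Offset >= 0.
template <int Offset, typename T, std::size_t N>
constexpr std::array<T, N> rotate_sketch(const std::array<T, N>& v)
{
  return permute_sketch(v, [](int i) { return (i + Offset) % static_cast<int>(N); });
}
```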

SIMD execution policy (P0350)

Requires Concepts (C++20).

Adds an execution policy vir::execution::simd. The execution policy can be used with the algorithms implemented in the vir namespace. These algorithms are additionally overloaded in the std namespace.

At this point, the implementation of the execution policy requires contiguous ranges / iterators.

Usable algorithms

std::for_each / vir::for_each
std::count_if / vir::count_if
std::transform / vir::transform
std::transform_reduce / vir::transform_reduce
std::reduce / vir::reduce

Example

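```c++
#include <vir/simd_execution.h>
#include <vector>

// data is taken by reference so that the increment is visible to the caller
void increment_all(std::vector<float>& data) {
  std::for_each(vir::execution::simd, data.begin(), data.end(),
    [](auto& v) {
      v += 1.f;
    });
}

// or

void increment_all(std::vector<float>& data) {
  vir::for_each(vir::execution::simd, data,
    [](auto& v) {
      v += 1.f;
    });
}
```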

Execution policy modifiers

The vir::execution::simd execution policy supports a few settings modifying its behavior:

vir::execution::simd.prefer_size<N>(): Start with chunking the range into parts of N elements, calling the user-supplied function(s) with objects of type resize_simd_t<N, simd<T>>.
vir::execution::simd.unroll_by<M>(): Iterate over the range in chunks of simd::size() * M instead of just simd::size(). The algorithm will execute M loads (or stores) together before/after calling the user-supplied function(s). The user-supplied function may be called with M simd objects instead of one simd object. Note that prologue and epilogue will typically still call the user-supplied function with a single simd object. Algorithms like std::count_if require a return value from the user-supplied function and therefore still call the function with a single simd (to avoid the need for returning an array or tuple of simd_mask). Such algorithms will still make use of unrolling inside their implementation.
vir::execution::simd.assume_matching_size(): Add a precondition to the algorithm, that the given range size is a multiple of the SIMD width (but not the SIMD width multiplied by the above unroll factor). This modifier is only valid without prologue (the following two modifiers). The algorithm consequently does not implement an epilogue and all given callables are called with a single simd type (same width and ABI tag). This can reduce code size significantly.
vir::execution::simd.prefer_aligned(): Unconditionally iterate using smaller chunks, until the main iteration can load (and store) chunks from/to aligned addresses. This can be more efficient if the range is large, avoiding cache-line splits. (e.g. with AVX-512, unaligned iteration leads to cache-line splits on every iteration; with AVX on every second iteration)
vir::execution::simd.auto_prologue() (still testing its viability, may be removed): Determine from run-time information (i.e. add a branch) whether a prologue for alignment of the main chunked iteration might be more efficient.
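
The chunking and epilogue terminology used by these modifiers can be illustrated with a scalar model. This sketch assumes a fixed chunk width and passes a pointer plus length to the callable instead of a simd object; `for_each_chunked` and `increment_all_sketch` are hypothetical names and do not reflect how the library is implemented.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar model of chunked iteration.
// Returns how often the callable was invoked.
template <std::size_t Width, typename F>
std::size_t for_each_chunked(std::vector<float>& data, F&& f)
{
  std::size_t calls = 0;
  std::size_t i = 0;
  // main loop: full chunks of Width elements
  for (; i + Width <= data.size(); i += Width, ++calls)
    f(data.data() + i, Width);
  // epilogue: leftover elements, handled one at a time
  for (; i < data.size(); ++i, ++calls)
    f(data.data() + i, std::size_t(1));
  return calls;
}

inline std::size_t increment_all_sketch(std::vector<float>& data)
{
  return for_each_chunked<4>(data, [](float* chunk, std::size_t n) {
    for (std::size_t k = 0; k < n; ++k)
      chunk[k] += 1.f;
  });
}
```

In this model, a precondition like the one added by assume_matching_size() would make the second loop unnecessary, which is where the code-size reduction comes from.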

Bitwise operators for floating-point `simd`

```c++
#include <vir/simd_float_ops.h>

using namespace vir::simd_float_ops;
```

Then the &, |, and ^ binary operators can be used with objects of type simd<floating-point, A>.
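
What a bitwise operator on floating-point values means can be shown for a single float. The sketch below assumes that the operation acts on the IEEE 754 binary32 bit pattern; `float_and` and `abs_via_mask` are hypothetical scalar helpers, not library functions.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar helper: bitwise AND of the bit patterns of two floats.
inline float float_and(float a, float b)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
  std::uint32_t ua, ub;
  std::memcpy(&ua, &a, sizeof a);
  std::memcpy(&ub, &b, sizeof b);
  ua &= ub;
  float result;
  std::memcpy(&result, &ua, sizeof result);
  return result;
}

// Clearing the sign bit yields the absolute value (IEEE 754 binary32 assumed).
inline float abs_via_mask(float x)
{
  const std::uint32_t all_but_sign = 0x7fffffffu;
  float mask;
  std::memcpy(&mask, &all_but_sign, sizeof mask);
  return float_and(x, mask);
}
```

Sign manipulation of this kind is a common reason to want &, |, and ^ on floating-point simd objects.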

Conversion between `std::bitset` and `simd_mask`

```c++
#include <vir/simd_bitset.h>

vir::stdx::simd_mask<int> k;
std::bitset b = vir::to_bitset(k);
vir::stdx::simd_mask<float> k2 = vir::to_simd_mask<float>(b);
```

There are two overloads of vir::to_simd_mask:

```c++
to_simd_mask<T, A>(bitset<simd_size_v<T, A>>)
```

and

```c++
to_simd_mask<T, N>(bitset<N>)
```
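
The round trip between a mask and a bitset can be modeled with std::array<bool, N> standing in for the mask. The sketch assumes that bit i corresponds to mask element i; `to_bitset_sketch` and `to_mask_sketch` are hypothetical names.

```cpp
#include <array>
#include <bitset>
#include <cstddef>

// Hypothetical stand-in: std::array<bool, N> plays the role of a mask.
template <std::size_t N>
std::bitset<N> to_bitset_sketch(const std::array<bool, N>& mask)
{
  std::bitset<N> bits;
  for (std::size_t i = 0; i < N; ++i)
    bits[i] = mask[i]; // assumption: bit i corresponds to mask element i
  return bits;
}

// Inverse direction: one bool per bit.
template <std::size_t N>
std::array<bool, N> to_mask_sketch(const std::bitset<N>& bits)
{
  std::array<bool, N> mask{};
  for (std::size_t i = 0; i < N; ++i)
    mask[i] = bits[i];
  return mask;
}
```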

vir::simd_resize and vir::simd_size_cast

The header

```c++
#include <vir/simd_resize.h>
```

declares the functions

vir::simd_resize<N>(simd),
vir::simd_resize<N>(simd_mask),
vir::simd_size_cast<V>(simd), and
vir::simd_size_cast<M>(simd_mask).

These functions can resize a given simd or simd_mask object. If the return type requires more elements than the input parameter, the new elements are default-initialized and appended at the end. Both functions do not allow a change of the value_type. However, implicit conversions can happen on parameter passing to simd_size_cast.
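
The resizing rule (keep the leading elements, append new ones at the end) can be sketched on std::array. `resize_sketch` is a hypothetical helper; unlike the default-initialization described above, it value-initializes the appended elements so that the result is deterministic.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of resizing: leading elements are kept,
// additional elements are appended at the end.
template <std::size_t NewSize, typename T, std::size_t OldSize>
constexpr std::array<T, NewSize> resize_sketch(const std::array<T, OldSize>& in)
{
  std::array<T, NewSize> out{}; // appended elements are zero in this sketch
  for (std::size_t i = 0; i < NewSize && i < OldSize; ++i)
    out[i] = in[i];
  return out;
}
```

The element type T is the same on both sides, matching the rule that the value_type cannot change.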

vir::simd_bit_cast

The header

```c++
#include <vir/simd_bit.h>
```

declares the function vir::simd_bit_cast<To>(from). This function serves the same purpose as std::bit_cast but additionally works in cases where a simd type is not trivially copyable.
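
For comparison, the baseline behavior of a bit cast can be sketched in C++17 with std::memcpy. The static assertions spell out the trivially-copyable requirement that, per the description above, the library function relaxes for simd types. `bit_cast_sketch` is a hypothetical name.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical C++17 stand-in for a bit cast between equally sized types.
template <typename To, typename From>
To bit_cast_sketch(const From& from)
{
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable_v<To> &&
                std::is_trivially_copyable_v<From>,
                "this sketch only handles trivially copyable types");
  static_assert(std::is_default_constructible_v<To>,
                "the destination object is created before copying");
  To to;
  std::memcpy(&to, &from, sizeof(To)); // copies the object representation
  return to;
}
```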

Concepts

Requires Concepts (C++20).

The header

```c++
#include <vir/simd_concepts.h>
```

defines the following concepts:

vir::arithmetic<T>: What std::arithmetic<T> should be: satisfied if T is an arithmetic type (as specified by the C++ core language).
vir::vectorizable<T>: Satisfied if T is a valid element type for stdx::simd and stdx::simd_mask.
vir::simd_abi_tag<T>: Satisfied if T is a valid ABI tag for stdx::simd and stdx::simd_mask.
vir::any_simd<V>: Satisfied if V is a specialization of stdx::simd<T, Abi> and the types T and Abi satisfy vir::vectorizable<T> and vir::simd_abi_tag<Abi>.
vir::any_simd_mask<V>: Analogue to vir::any_simd<V> for stdx::simd_mask instead of stdx::simd.
vir::typed_simd<V, T>: Satisfied if vir::any_simd<V> and T is the element type of V.
vir::sized_simd<V, Width>: Satisfied if vir::any_simd<V> and Width is the width of V.
vir::sized_simd_mask<V, Width>: Analogue to vir::sized_simd<V, Width> for stdx::simd_mask instead of stdx::simd.

simdize type transformation

Requires Concepts (C++20).

Warning: consider this interface under construction.

The header

```c++
#include <vir/simdize.h>
```

defines the following types and constants:

vir::simdize<T, N>: N is optional. Type alias for a simd or vir::simd_tuple type determined from the type T.
- If vir::vectorizable<T> is satisfied, then stdx::simd<T, Abi> is produced. Abi is determined from N and will be simd_abi::native<T> if N was omitted.
- If T is a std::tuple or aggregate that can be reflected, then a specialization of vir::simd_tuple is produced. If T is a template specialization (without NTTPs), the metafunction tries vectorization via applying simdize to all template arguments. If this doesn't yield the same data structure layout as member-only vectorization, then the type behaves similar to a std::tuple with additional API to make the type similar to stdx::simd (see below). This specialization will be derived from std::tuple and the tuple elements will either be vir::simd_tuple or stdx::simd types. vir::simdize is applied recursively to the std::tuple/aggregate data members.
- Otherwise, if T cannot be simdized (e.g. void, no data members, std::tuple<>), no transformation is applied and simdize<T> is an alias for T.
- If N was omitted, the resulting width of all simd types in the resulting type will match the largest native_simd width.

Example: vir::simdize<std::tuple<double, short>> produces a tuple with the element types stdx::rebind_simd_t<double, stdx::native_simd<short>> and stdx::native_simd<short>.

vir::simd_tuple<reflectable_struct T, size_t N>: Don't use this class template directly. Let vir::simdize instantiate specializations of this class template. vir::simd_tuple mostly behaves like a std::tuple and adds the following interface on top of std::tuple:
- value_type
- mask_type
- size
- tuple-like constructors
- broadcast and/or conversion constructors
- load constructor
- as_tuple(): Returns the data members as a std::tuple.
- operator[](size_t): Copy of a single T stored in the simd_tuple. This is not a cheap operation because there are no T objects stored in the simd_tuple.
- copy_from(std::contiguous_iterator): (under construction) unoptimized load from a contiguous array of struct (e.g. std::vector<T>).
- copy_to(std::contiguous_iterator): (under construction) unoptimized store to a contiguous array of struct.
vir::simd_tuple<vectorizable_struct_template T, size_t N>: TODO
vir::get<I>(simd_tuple): Access to the I-th data member (a simd).
vir::simdize_size<T>, vir::simdize_size_v<T>
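
The data layout behind this transformation is a struct of arrays. The following hand-written sketch shows that layout for one concrete struct, with plain arrays in place of simd members; `Point` and `PointBatch` are hypothetical types, and the member functions only imitate the operator[], copy_from, and copy_to entries listed above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical user type with two data members.
struct Point
{
  float x;
  int id;
};

// Hypothetical struct-of-arrays counterpart holding N Point objects.
template <std::size_t N>
struct PointBatch
{
  std::array<float, N> x;
  std::array<int, N> id;

  // Gathers one Point; every member array is read, so this is not free.
  Point operator[](std::size_t i) const { return Point{x[i], id[i]}; }

  // Unoptimized load from a contiguous array of structs.
  void copy_from(const Point* first)
  {
    for (std::size_t i = 0; i < N; ++i) {
      x[i] = first[i].x;
      id[i] = first[i].id;
    }
  }

  // Unoptimized store to a contiguous array of structs.
  void copy_to(Point* first) const
  {
    for (std::size_t i = 0; i < N; ++i)
      first[i] = (*this)[i];
  }
};
```

No Point object is stored inside the batch, which is why element access has to assemble one from the member arrays.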

Benchmark support functions

Requires Concepts (C++20) and GNU compatible inline-asm.

The header

```c++
#include <vir/simd_benchmarking.h>
```

defines the following functions:

vir::fake_modify(...): Let the compiler assume that all arguments passed to this function are modified. This inhibits constant propagation, hoisting of code sections, and dead-code elimination.
vir::fake_read(...): Let the compiler assume that all arguments passed to this function are read (in the cheapest manner). This inhibits dead-code elimination leading up to the results passed to this function.
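
A common way to obtain such optimization barriers is an empty GNU-style inline-asm statement with operand constraints. The sketch below is a simplified illustration of that technique, not the library's implementation: it always uses memory constraints, which is not necessarily the cheapest choice, and the `_sketch` names are hypothetical.

```cpp
// Hypothetical helper: the empty asm claims to read and write x in memory,
// so the compiler cannot assume anything about the value afterwards.
template <typename T>
inline void fake_modify_sketch(T& x)
{
  asm volatile("" : "+m"(x));
}

// Hypothetical helper: the empty asm claims to read x, so the computation
// producing x cannot be removed as dead code.
template <typename T>
inline void fake_read_sketch(const T& x)
{
  asm volatile("" : : "m"(x));
}

// Example use: a loop whose result the optimizer should not precompute.
inline int sum_to(int n)
{
  int sum = 0;
  for (int i = 1; i <= n; ++i) {
    sum += i;
    fake_modify_sketch(sum); // value is unchanged, but the compiler cannot know
  }
  fake_read_sketch(sum);
  return sum;
}
```

The asm statements emit no instructions, so the observable result of `sum_to` is unchanged.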

`constexpr_wrapper`: function arguments as constant expressions

The header

```c++
#include <vir/constexpr_wrapper.h>
```

defines the following tools:

vir::constexpr_value (concept): Satisfied by any type with a static ::value member that can be used in a constant expression.
vir::constexpr_wrapper<auto> (class template): A type storing the value of its NTTP (non-type template parameter) and overloading all operators to return another constexpr_wrapper. constexpr_wrapper objects are implicitly convertible to their value type (a constexpr_wrapper automatically unwraps its constant expression).
vir::cw<auto> (variable template): Shorthand for producing constexpr_wrapper objects with the given value.
vir::literals (namespace with _cw UDL): Shorthand for producing constexpr_wrapper objects of the integer literal in front of the _cw suffix. The type will be deduced automatically from the value of the literal to be the smallest signed integral type, or if the value is larger, unsigned long long. If the value is too large for an unsigned long long, the program is ill-formed.

constexpr_wrapper may appear unrelated to simd. However, it is an important tool used in many places in the implementation and on interfaces of vir-simd tools. vir::constexpr_wrapper is very similar to std::integral_constant, which is used in the simd TS interface for generator constructors.

Example

```c++
#include <vir/constexpr_wrapper.h>
#include <array>

auto f(vir::constexpr_value auto N)
{
  std::array<int, N> x = {};
  return x;
}

std::array a = f(vir::cw<4>); // array<int, 4>

using namespace vir::literals;

std::array b = f(10_cw); // array<int, 10>
```

This example cannot work with a signature constexpr auto f(int n) (or consteval) because n will never be considered a constant expression in the body of the function.
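
The same effect can be sketched in C++17 with std::integral_constant, which the text above names as the closest standard counterpart. The names `make_zeroed_array` and `cw_sketch` are hypothetical; the point is only that the value travels in the type of the argument, so it remains a constant expression inside the function.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// The value is part of the argument's type, so N::value can be used as a
// template argument inside the function body.
template <typename N>
constexpr auto make_zeroed_array(N)
{
  return std::array<int, N::value>{};
}

// Hypothetical shorthand in the spirit of vir::cw, built on the standard library.
template <std::size_t Value>
inline constexpr std::integral_constant<std::size_t, Value> cw_sketch{};
```

A call such as `make_zeroed_array(cw_sketch<4>)` then returns a `std::array<int, 4>`.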

Testing for the version of the vir::stdx::simd (vir-simd) library

The header

```c++
#include <vir/simd_version.h>
```

(which is also included from <vir/simd.h>) defines the type and constant

```c++
namespace vir
{
  struct simd_version_t { int major, minor, patchlevel; };

  constexpr simd_version_t simd_version;
}
```

in addition to the macros VIR_SIMD_VERSION, VIR_SIMD_VERSION_MAJOR, VIR_SIMD_VERSION_MINOR, and VIR_SIMD_VERSION_PATCHLEVEL.

simd_version_t implements all comparison operators, allowing e.g.

```c++
static_assert(vir::simd_version >= vir::simd_version_t{0,4,0});
```

Semantics of version numbers

An increment of the major version number implies a breaking change.
An increment of the minor version number implies new features without breaking changes.
An increment of the patchlevel is used for bug fixes.
Odd patchlevel numbers indicate a development (not released) version.
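
These rules can be expressed as a small stand-alone type. The sketch below is a hypothetical mirror of a version triple with lexicographic comparison and an odd-patchlevel check; `version_sketch` and its member names are assumptions and not the library's simd_version_t.

```cpp
#include <tuple>

// Hypothetical mirror of a {major, minor, patchlevel} version triple.
struct version_sketch
{
  int major_v, minor_v, patchlevel;

  // Odd patchlevel numbers mark a development (not released) version.
  constexpr bool is_development() const { return patchlevel % 2 != 0; }
};

// Lexicographic ordering: major first, then minor, then patchlevel.
constexpr bool operator<(const version_sketch& a, const version_sketch& b)
{
  return std::tie(a.major_v, a.minor_v, a.patchlevel)
       < std::tie(b.major_v, b.minor_v, b.patchlevel);
}

constexpr bool operator>=(const version_sketch& a, const version_sketch& b)
{
  return !(a < b);
}

// A change of the major number signals a breaking change.
constexpr bool is_breaking_upgrade(const version_sketch& from, const version_sketch& to)
{
  return to.major_v > from.major_v;
}
```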

Debugging

Compile with -D _GLIBCXX_DEBUG_UB to get runtime checks for undefined behavior in the simd implementation(s). Alternatively, -fsanitize=undefined without the macro definition will also find the problems, but without an additional error message.

Preconditions in the vir::stdx::simd implementation and extensions are controlled via the -D VIR_CHECK_PRECONDITIONS=N macro, which defaults to 3. Compile-time diagnostics are only possible if the compiler's optimizer can detect the precondition failure. If you get a bogus compile-time failure, you need to introduce the necessary assumption into your calling function, which is typically a missing precondition check in your function.

| Option | at compile-time | at run-time |
|:--------------------------|:-------------------:|:---------------:|
| -DVIR_CHECK_PRECONDITIONS=0 | warning | invoke UB/unreachable |
| -DVIR_CHECK_PRECONDITIONS=1 | error | invoke UB/unreachable |
| -DVIR_CHECK_PRECONDITIONS=2 | warning | trap |
| -DVIR_CHECK_PRECONDITIONS=3 | error | trap |
| -DVIR_CHECK_PRECONDITIONS=4 | warning | print error and abort |
| -DVIR_CHECK_PRECONDITIONS=5 | error | print error and abort |
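
The table follows a regular pattern: odd values turn the compile-time diagnostic into an error, and each pair of values shares one run-time reaction. A sketch of that mapping, using hypothetical enum and function names that only restate the table:

```cpp
// Hypothetical names for the two columns of the table.
enum class compile_time_action { warning, error };
enum class run_time_action { unreachable, trap, print_and_abort };

// Odd levels report precondition failures found at compile time as errors.
constexpr compile_time_action at_compile_time(int level)
{
  return level % 2 == 1 ? compile_time_action::error
                        : compile_time_action::warning;
}

// Levels 0-1, 2-3, and 4-5 each share one run-time reaction.
constexpr run_time_action at_run_time(int level)
{
  switch (level / 2) {
    case 0:  return run_time_action::unreachable;
    case 1:  return run_time_action::trap;
    default: return run_time_action::print_and_abort;
  }
}
```

With the default value of 3, this gives an error at compile time and a trap at run time.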

Owner

Name: Matthias Kretz
Login: mattkretz
Kind: user
Location: Darmstadt, Germany
Company: GSI Helmholtzzentrum für Schwerionenforschung

Website: https://mattkretz.github.io/
Repositories: 32
Profile: https://github.com/mattkretz

C++ Committee Numerics Chair, SIMD specialist, CS PhD, Dipl.-Phys, High Energy Physics Software, former KDE core developer, ORCID: 0000-0002-0867-243X

GitHub Events

Total

Create event: 9
Release event: 3
Issues event: 3
Watch event: 6
Delete event: 8
Issue comment event: 8
Push event: 34
Pull request event: 8
Fork event: 2

Last Year

Create event: 9
Release event: 3
Issues event: 3
Watch event: 6
Delete event: 8
Issue comment event: 8
Push event: 34
Pull request event: 8
Fork event: 2

Issues and Pull Requests

Last synced: 8 months ago

All Time

Total issues: 6
Total pull requests: 33
Average time to close issues: 7 months
Average time to close pull requests: 28 days
Total issue authors: 4
Total pull request authors: 2
Average comments per issue: 3.0
Average comments per pull request: 0.24
Merged pull requests: 29
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 15
Average time to close issues: N/A
Average time to close pull requests: 16 days
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.13
Merged pull requests: 13
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

fuhlig1 (3)
ax3l (1)
AlvaroFS (1)
bernhardmgruber (1)
mattkretz (1)

Pull Request Authors

mattkretz (44)
AlvaroFS (1)

Top Labels

Issue Labels

question (1) enhancement (1) bug (1)

Pull Request Labels

enhancement (13) bug (10) documentation (3) optimization (2)

vir-simd

Science Score: 49.0%
