vir-simd
improve the usage experience of std::experimental::simd (Parallelism TS 2)
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: mattkretz
- License: lgpl-3.0
- Language: C++
- Default Branch: master
- Homepage: https://mattkretz.github.io/vir-simd/master/
- Size: 897 KB
Statistics
- Stars: 29
- Watchers: 6
- Forks: 4
- Open Issues: 7
- Releases: 8
Metadata Files
README.md
vir::stdx::simd
This project aims to provide a fallback std::experimental::simd (Parallelism TS 2)
implementation with additional features. Not every user can rely on GCC 11+
and its standard library to be present on all target systems. Therefore, the
header vir/simd.h provides a fallback implementation of the TS specification
that only implements the scalar and fixed_size<N> ABI tags. Thus, your code
can still compile and run correctly, even if it is missing the performance
gains a proper implementation provides.
Table of Contents
- Installation
- Usage
  - Options
- Additional Features
  - Simple iota simd constants
  - Making simd conversions more convenient
  - Permutations
  - SIMD execution policy
    - Usable algorithms
    - Example
    - Execution policy modifiers
  - Bitwise operators for floating-point simd
  - Conversion between std::bitset and simd_mask
  - vir::simd_resize and vir::simd_size_cast
  - vir::simd_bit_cast
  - Concepts
  - simdize type transformation
  - Benchmark support functions
  - constexpr_wrapper: function arguments as constant expressions
    - Example
  - Testing for the version of the vir::stdx::simd (vir-simd) library
    - Semantics of version numbers
- Debugging
Installation
This is a header-only library. Installation is a simple copy of the headers to
wherever you want them. By default, make install copies the headers into
/usr/local/include/vir/.
Examples:
```sh
# installs to $HOME/.local/include/vir
make install prefix=~/.local

# installs to $HOME/src/myproject/3rdparty/vir
make install includedir=~/src/myproject/3rdparty
```
Usage
```c++
#include <vir/simd.h>

namespace stdx = vir::stdx;

using floatv = stdx::native_simd<float>;
```
The vir/simd.h header will include <experimental/simd> if it is available,
so you don't have to add any buildsystem support. It should just work.
Options
- VIR_SIMD_TS_DROPIN: Define the macro VIR_SIMD_TS_DROPIN before including <vir/simd.h> to define everything in the namespace specified in the Parallelism TS 2 (namely std::experimental::parallelism_v2).
- VIR_DISABLE_STDX_SIMD: Do not include <experimental/simd> even if it is available. This allows compiling your code with the <vir/simd.h> implementation unconditionally. This is useful for testing.
Additional Features
The TS curiously forgot to add simd_cast and static_simd_cast overloads for
simd_mask. With vir::stdx::(static_)simd_cast, casts will also work for
simd_mask. This does not require any additional includes.
Simple iota simd constants
Requires Concepts (C++20).
```c++
#include <vir/simd_iota.h>

constexpr auto a = vir::iota_v<stdx::simd<float>> * 3; // 0, 3, 6, 9, ...
```
The variable template vir::iota_v<T> can be instantiated with arithmetic
types, array types (std::array and C-arrays), and simd types. In all cases,
the elements of the variable will be initialized to 0, 1, 2, 3, 4, ...,
depending on the number of elements in T. For arithmetic types
vir::iota_v<T> is always just 0.
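As a rough illustration of the element values described above, the following sketch builds the same 0, 1, 2, ... sequence for a std::array in plain standard C++. The names iota_array and scaled_iota are hypothetical and not part of vir-simd; the sketch only mimics the resulting values, not how the library produces them.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper (not part of vir-simd): returns {0, 1, 2, ..., N-1},
// the element values the text describes for vir::iota_v.
template <typename T, std::size_t N>
constexpr std::array<T, N> iota_array()
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = static_cast<T>(i);
  return r;
}

// Scaling every element by a factor of 3 yields 0, 3, 6, 9, ...
template <typename T, std::size_t N>
constexpr std::array<T, N> scaled_iota(T factor)
{
  std::array<T, N> r = iota_array<T, N>();
  for (std::size_t i = 0; i < N; ++i)
    r[i] *= factor;
  return r;
}

static_assert(scaled_iota<float, 4>(3.f)[3] == 9.f);
```

With a simd type the multiplication is a single element-wise operation; the loop here only stands in for that.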
Making simd conversions more convenient
Requires Concepts (C++20).
The TS is way too strict about conversions, requiring verbose
std::experimental::static_simd_cast<T>(x) instead of a concise T(x) or
static_cast<T>(x). (std::simd in C++26 will fix this.)
vir::cvt(x) provides a tool to make x implicitly convertible into whatever
the expression wants in order to be well-formed. This only works if there is
an unambiguous type that is required.
```c++
#include <vir/simd_cvt.h>

using floatv = stdx::native_simd<float>;
using intv = stdx::rebind_simd_t<int, floatv>;

void f(intv x) {
  using vir::cvt;
  // the floatv constructor and intv assignment operator clearly determine the
  // destination type:
  x = cvt(10 * sin(floatv(cvt(x))));

  // without vir::cvt, one would have to write:
  x = stdx::static_simd_cast<intv>(10 * sin(stdx::static_simd_cast<floatv>(x)));

  // probably don't do this too often:
  auto y = cvt(x); // y is a const-ref to x, but so much more convertible
                   // y is of type cvt<intv>
}
```
Note that vir::cvt also works for simd_mask and non-simd types. Thus,
cvt becomes an important building block for writing "simd-generic" code
(i.e. well-formed for T and simd<T>).
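The idea of "convertible into whatever the expression wants" can be sketched in standard C++ with a wrapper that has a templated conversion operator. This is only an assumption about the general technique; convertible_ref and sketch_cvt are hypothetical names and do not reproduce the vir::cvt implementation.

```cpp
#include <type_traits>

// Hypothetical stand-in for the idea behind vir::cvt (not the library's code):
// hold a reference and convert to whatever type the context asks for.
template <typename From>
struct convertible_ref
{
  const From& ref;

  // The destination type To is deduced from the surrounding expression.
  template <typename To,
            typename = std::enable_if_t<std::is_convertible_v<From, To>>>
  constexpr operator To() const
  { return static_cast<To>(ref); }
};

template <typename From>
constexpr convertible_ref<From> sketch_cvt(const From& x)
{ return {x}; }

// Usage: the declared type on the left determines the conversion.
inline int truncate_to_int(float x)
{
  int i = sketch_cvt(x); // float -> int without a cast at the call site
  return i;
}
```

Because the wrapper only stores a reference, it should be converted within the same full expression, which matches the "probably don't do this too often" remark about keeping the result of cvt in a variable.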
Permutations (paper)
Requires Concepts (C++20).
```c++
#include <vir/simd_permute.h>

// v = {0, 1, 2, 3} -> {1, 0, 3, 2}
vir::simd_permute(v, vir::simd_permutations::swap_neighbors<>);

// v = {1, 2, 3, 4} -> {2, 2, 2, 2}
vir::simd_permute(v, [](unsigned) { return 1; });

// v = {1, 2, 3, 4} -> {3, 3, 3, 3}
vir::simd_permute(v, [](unsigned) { return -2; });
```
The following permutations are pre-defined:
- vir::simd_permutations::duplicate_even: copy values at even indices to neighboring odd position
- vir::simd_permutations::duplicate_odd: copy values at odd indices to neighboring even position
- vir::simd_permutations::swap_neighbors<N>: swap N consecutive values with the following N consecutive values
- vir::simd_permutations::broadcast<Idx>: copy the value at index Idx to all other values
- vir::simd_permutations::broadcast_first: alias for broadcast<0>
- vir::simd_permutations::broadcast_last: alias for broadcast<-1>
- vir::simd_permutations::reverse: reverse the order of all values
- vir::simd_permutations::rotate<Offset>: positive Offset rotates values to the left, negative Offset rotates values to the right (i.e. rotate<Offset> moves values from index (i + Offset) % size to i)
- vir::simd_permutations::shift<Offset>: positive Offset shifts values to the left, negative Offset shifts values to the right; shifting in zeros.
A vir::simd_permute(x, idx_perm) overload, where x is of vectorizable
type, is also included, facilitating generic code.
A special permutation vir::simd_shift_in<N>(x, ...) shifts by N elements
shifting in elements from additional simd objects passed via the pack.
Example:
```c++
// v = {1, 2, 3, 4}, w = {5, 6, 7, 8} -> {2, 3, 4, 5}
vir::simd_shift_in<1>(v, w);
```
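The index semantics used in the examples above can be modeled on std::array. This is a scalar sketch under the assumption that a negative index counts from the end (as the -2 example and broadcast<-1> suggest); permute_model and shift_in_model are hypothetical names, not vir-simd functions.

```cpp
#include <array>
#include <cstddef>

// Hypothetical scalar model of an index permutation (not vir-simd code):
// result[i] = v[idx(i)], where a negative index counts from the end.
template <typename T, std::size_t N, typename IdxFn>
constexpr std::array<T, N> permute_model(const std::array<T, N>& v, IdxFn idx)
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    {
      const int j = idx(static_cast<int>(i));
      const int wrapped = j < 0 ? j + static_cast<int>(N) : j;
      r[i] = v[static_cast<std::size_t>(wrapped)];
    }
  return r;
}

// Model of shifting left by Shift elements, filling the vacated positions
// with the leading elements of a second array.
template <std::size_t Shift, typename T, std::size_t N>
constexpr std::array<T, N> shift_in_model(const std::array<T, N>& v,
                                          const std::array<T, N>& w)
{
  static_assert(Shift <= N, "sketch handles a single extra array only");
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = (i + Shift < N) ? v[i + Shift] : w[i + Shift - N];
  return r;
}
```

For example, permute_model(v, [](int i) { return i ^ 1; }) swaps neighboring elements, and shift_in_model<1>(v, w) reproduces the {2, 3, 4, 5} result shown above.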
SIMD execution policy (P0350)
Requires Concepts (C++20).
Adds an execution policy vir::execution::simd. The execution policy can be
used with the algorithms implemented in the vir namespace. These algorithms
are additionally overloaded in the std namespace.
At this point, the implementation of the execution policy requires contiguous ranges / iterators.
Usable algorithms
- std::for_each / vir::for_each
- std::count_if / vir::count_if
- std::transform / vir::transform
- std::transform_reduce / vir::transform_reduce
- std::reduce / vir::reduce
Example
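```c++
#include <vir/simd_execution.h>

void increment_all(std::vector<float> data) {
  std::for_each(vir::execution::simd, data.begin(), data.end(), [](auto& v) { v += 1.f; });
}
// or
void increment_all(std::vector<float> data) {
  vir::for_each(vir::execution::simd, data, [](auto& v) { v += 1.f; });
}
```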
Execution policy modifiers
The vir::execution::simd execution policy supports a few settings modifying
its behavior:
- vir::execution::simd.prefer_size<N>(): Start with chunking the range into parts of N elements, calling the user-supplied function(s) with objects of type resize_simd_t<N, simd<T>>.
- vir::execution::simd.unroll_by<M>(): Iterate over the range in chunks of simd::size() * M instead of just simd::size(). The algorithm will execute M loads (or stores) together before/after calling the user-supplied function(s). The user-supplied function may be called with M simd objects instead of one simd object. Note that prologue and epilogue will typically still call the user-supplied function with a single simd object. Algorithms like std::count_if require a return value from the user-supplied function and therefore still call the function with a single simd (to avoid the need for returning an array or tuple of simd_mask). Such algorithms will still make use of unrolling inside their implementation.
- vir::execution::simd.assume_matching_size(): Add a precondition to the algorithm, that the given range size is a multiple of the SIMD width (but not the SIMD width multiplied by the above unroll factor). This modifier is only valid without prologue (the following two modifiers). The algorithm consequently does not implement an epilogue and all given callables are called with a single simd type (same width and ABI tag). This can reduce code size significantly.
- vir::execution::simd.prefer_aligned(): Unconditionally iterate using smaller chunks, until the main iteration can load (and store) chunks from/to aligned addresses. This can be more efficient if the range is large, avoiding cache-line splits. (e.g. with AVX-512, unaligned iteration leads to cache-line splits on every iteration; with AVX on every second iteration)
- vir::execution::simd.auto_prologue() (still testing its viability, may be removed): Determine from run-time information (i.e. add a branch) whether a prologue for alignment of the main chunked iteration might be more efficient.
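The terms "main iteration" and "epilogue" used above can be pictured with a scalar model. The sketch below is plain C++ under assumed names (for_each_chunked, model_width, increment_all_model); it does not use vir-simd and only shows how a range splits into full chunks plus a remainder.

```cpp
#include <cstddef>
#include <vector>

// Assumed chunk width for illustration; it stands in for simd::size().
constexpr std::size_t model_width = 4;

// Hypothetical scalar model of chunked iteration (not vir-simd code).
// on_chunk sees a full chunk of Width elements, on_tail a single element.
template <std::size_t Width, typename ChunkFn, typename TailFn>
void for_each_chunked(std::vector<float>& data, ChunkFn on_chunk, TailFn on_tail)
{
  std::size_t i = 0;
  // Main iteration: full chunks of Width elements.
  for (; i + Width <= data.size(); i += Width)
    on_chunk(data.data() + i);
  // Epilogue: leftover elements that do not fill a whole chunk.
  for (; i < data.size(); ++i)
    on_tail(data[i]);
}

// Usage: add 1 to every element; returns the number of epilogue calls.
inline std::size_t increment_all_model(std::vector<float>& data)
{
  std::size_t tail_calls = 0;
  for_each_chunked<model_width>(
    data,
    [](float* chunk) {
      for (std::size_t k = 0; k < model_width; ++k)
        chunk[k] += 1.f;
    },
    [&](float& x) { x += 1.f; ++tail_calls; });
  return tail_calls;
}
```

In this model, a range whose size is a multiple of the chunk width never reaches the second loop, which is the situation assume_matching_size() lets the algorithm rely on.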
Bitwise operators for floating-point simd
```c++
#include <vir/simd_float_ops.h>

using namespace vir::simd_float_ops;
```
Then the &, |, and ^ binary operators can be used with objects of type
simd<floating-point, A>.
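What such operators compute can be shown on a single float. The following scalar analogue assumes 32-bit IEEE 754 floats; bitwise_float, flip_sign, and negative_abs are hypothetical helpers and not part of vir-simd.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar analogue (not vir-simd code): apply a bitwise operation
// to the object representations of two floats.
template <typename Op>
inline float bitwise_float(float a, float b, Op op)
{
  static_assert(sizeof(float) == sizeof(std::uint32_t),
                "sketch assumes a 32-bit float");
  std::uint32_t ia, ib;
  std::memcpy(&ia, &a, sizeof ia);
  std::memcpy(&ib, &b, sizeof ib);
  const std::uint32_t ir = op(ia, ib);
  float r;
  std::memcpy(&r, &ir, sizeof r);
  return r;
}

// XOR with -0.0f (only the sign bit set, assuming IEEE 754) flips the sign.
inline float flip_sign(float x)
{
  return bitwise_float(x, -0.0f,
                       [](std::uint32_t p, std::uint32_t q) { return p ^ q; });
}

// OR with -0.0f forces the sign bit on, i.e. computes -|x|.
inline float negative_abs(float x)
{
  return bitwise_float(x, -0.0f,
                       [](std::uint32_t p, std::uint32_t q) { return p | q; });
}
```

Sign manipulation of this kind is a typical reason to want bitwise operators on floating-point simd objects.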
Conversion between std::bitset and simd_mask
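```c++
#include <vir/simd_bitset.h>

vir::stdx::simd_mask<int> k;
std::bitset b = vir::to_bitset(k);
vir::stdx::simd_mask<float> k2 = vir::to_simd_mask<float>(b);
```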
There are two overloads of vir::to_simd_mask:
```c++
to_simd_mask<T, A>(bitset<simd_size_v<T, A>>)
```
and
```c++
to_simd_mask<T, N>(bitset<N>)
```
vir::simd_resize and vir::simd_size_cast
The header
```c++
#include <vir/simd_resize.h>
```
declares the functions
- vir::simd_resize<N>(simd),
- vir::simd_resize<N>(simd_mask),
- vir::simd_size_cast<V>(simd), and
- vir::simd_size_cast<M>(simd_mask).
These functions can resize a given simd or simd_mask object. If the return
type requires more elements than the input parameter, the new elements are
default-initialized and appended at the end. Neither function allows a
change of the value_type. However, implicit conversions can happen on
parameter passing to simd_size_cast.
vir::simd_bit_cast
The header
```c++
#include <vir/simd_bit.h>
```
declares the function vir::simd_bit_cast. This function serves the
same purpose as std::bit_cast but additionally works in cases where a simd
type is not trivially copyable.
Concepts
Requires Concepts (C++20).
The header
```c++
#include <vir/simd_concepts.h>
```
defines the following concepts:
- vir::arithmetic<T>: What std::arithmetic<T> should be: satisfied if T is an arithmetic type (as specified by the C++ core language).
- vir::vectorizable<T>: Satisfied if T is a valid element type for stdx::simd and stdx::simd_mask.
- vir::simd_abi_tag<T>: Satisfied if T is a valid ABI tag for stdx::simd and stdx::simd_mask.
- vir::any_simd<V>: Satisfied if V is a specialization of stdx::simd<T, Abi> and the types T and Abi satisfy vir::vectorizable<T> and vir::simd_abi_tag<Abi>.
- vir::any_simd_mask<V>: Analogue to vir::any_simd<V> for stdx::simd_mask instead of stdx::simd.
- vir::typed_simd<V, T>: Satisfied if vir::any_simd<V> and T is the element type of V.
- vir::sized_simd<V, Width>: Satisfied if vir::any_simd<V> and Width is the width of V.
- vir::sized_simd_mask<V, Width>: Analogue to vir::sized_simd<V, Width> for stdx::simd_mask instead of stdx::simd.
simdize type transformation
Requires Concepts (C++20).
:warning: consider this interface under :construction:
The header
```c++
#include <vir/simdize.h>
```
defines the following types and constants:
- vir::simdize<T, N>: N is optional. Type alias for a simd or vir::simd_tuple type determined from the type T.
  - If vir::vectorizable<T> is satisfied, then stdx::simd<T, Abi> is produced. Abi is determined from N and will be simd_abi::native<T> if N was omitted.
  - If T is a std::tuple or aggregate that can be reflected, then a specialization of vir::simd_tuple is produced. If T is a template specialization (without NTTPs), the metafunction tries vectorization via applying simdize to all template arguments. If this doesn't yield the same data structure layout as member-only vectorization, then the type behaves similar to a std::tuple with additional API to make the type similar to stdx::simd (see below). This specialization will be derived from std::tuple and the tuple elements will either be vir::simd_tuple or stdx::simd types. vir::simdize is applied recursively to the std::tuple/aggregate data members.
  - Otherwise, if T cannot be simdized (e.g. void, no data members, std::tuple<>), no transformation is applied and simdize<T> is an alias for T.
  - If N was omitted, the resulting width of all simd types in the resulting type will match the largest native_simd width.
Example: vir::simdize<std::tuple<double, short>> produces a tuple with the
element types stdx::rebind_simd_t<double, stdx::native_simd<short>> and
stdx::native_simd<short>.
- vir::simd_tuple<reflectable_struct T, size_t N>: Don't use this class template directly. Let vir::simdize instantiate specializations of this class template. vir::simd_tuple mostly behaves like a std::tuple and adds the following interface on top of std::tuple:
  - value_type
  - mask_type
  - size
  - tuple-like constructors
  - broadcast and/or conversion constructors
  - load constructor
  - as_tuple(): Returns the data members as a std::tuple.
  - operator[](size_t): Copy of a single T stored in the simd_tuple. This is not a cheap operation because there are no T objects stored in the simd_tuple.
  - copy_from(std::contiguous_iterator): :construction: unoptimized load from a contiguous array of struct (e.g. std::vector<T>).
  - copy_to(std::contiguous_iterator): :construction: unoptimized store to a contiguous array of struct.
- vir::simd_tuple<vectorizable_struct_template T, size_t N>: TODO
- vir::get<I>(simd_tuple): Access to the I-th data member (a simd).
- vir::simdize_size<T>, vir::simdize_size_v<T>
Benchmark support functions
Requires Concepts (C++20) and GNU compatible inline-asm.
The header
```c++
#include <vir/simd_benchmarking.h>
```
defines the following functions:
- vir::fake_modify(...): Let the compiler assume that all arguments passed to this function are modified. This inhibits constant propagation, hoisting of code sections, and dead-code elimination.
- vir::fake_read(...): Let the compiler assume that all arguments passed to this function are read (in the cheapest manner). This inhibits dead-code elimination leading up to the results passed to this function.
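A common way to obtain this kind of effect with GNU-compatible inline asm is an empty asm statement with suitable operand constraints. The sketch below shows that general technique for GCC and Clang; modify_barrier, read_barrier, and square_opaque are hypothetical names, and nothing here is taken from the vir-simd implementation.

```cpp
// Hypothetical sketches (not the vir-simd implementation), GCC/Clang only.

// The empty asm statement claims to read and write x in memory, so the
// optimizer can no longer assume it knows the value of x afterwards.
template <typename T>
inline void modify_barrier(T& x)
{ asm volatile("" : "+m"(x)); }

// The empty asm statement claims to read x, so the computation producing x
// cannot be removed as dead code.
template <typename T>
inline void read_barrier(const T& x)
{ asm volatile("" : : "m"(x)); }

// Usage: keep a computation alive inside a benchmark loop.
inline int square_opaque(int x)
{
  modify_barrier(x);   // x is now unknown to the optimizer
  const int r = x * x; // has to be computed at run time
  read_barrier(r);     // the result counts as used
  return r;
}
```

The memory constraint forces the value through memory, which is simple and works for any type; a register constraint would be cheaper for small types.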
constexpr_wrapper: function arguments as constant expressions
The header
```c++
#include <vir/constexpr_wrapper.h>
```
defines the following tools:
- vir::constexpr_value (concept): Satisfied by any type with a static ::value member that can be used in a constant expression.
- vir::constexpr_wrapper<auto> (class template): A type storing the value of its NTTP (non-type template parameter) and overloading all operators to return another constexpr_wrapper. constexpr_wrapper objects are implicitly convertible to their value type (a constexpr_wrapper automatically unwraps its constant expression).
- vir::cw<auto> (variable template): Shorthand for producing constexpr_wrapper objects with the given value.
- vir::literals (namespace with _cw UDL): Shorthand for producing constexpr_wrapper objects of the integer literal in front of the _cw suffix. The type will be deduced automatically from the value of the literal to be the smallest signed integral type, or if the value is larger, unsigned long long. If the value is too large for an unsigned long long, the program is ill-formed.
constexpr_wrapper may appear unrelated to simd. However, it is an important
tool used in many places in the implementation and on interfaces of vir-simd
tools. vir::constexpr_wrapper is very similar to std::integral_constant,
which is used in the simd TS interface for generator constructors.
Example
```c++
#include <vir/constexpr_wrapper.h>

auto f(vir::constexpr_value auto N)
{
  std::array<int, N> x = {};
  return x;
}

std::array a = f(vir::cw<4>); // array<int, 4>

using namespace vir::literals;
std::array b = f(10_cw); // array<int, 10>
```
This example cannot work with a signature constexpr auto f(int n) (or
consteval) because n will never be considered a constant expression in the
body of the function.
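The same principle can be demonstrated with std::integral_constant from the standard library, which the text above names as a close relative. In this sketch make_zero_array is a hypothetical function; it shows that a value carried in the parameter's type stays usable as a constant expression.

```cpp
#include <array>
#include <cstddef>
#include <type_traits>

// The value travels in the parameter's type, so it remains usable as a
// constant expression (here: an array bound) in the function's signature.
template <typename T, T Value>
constexpr std::array<int, static_cast<std::size_t>(Value)>
make_zero_array(std::integral_constant<T, Value>)
{ return {}; }

// By contrast, `auto g(int n) { return std::array<int, n>{}; }` is
// ill-formed: n is a function parameter, not a constant expression.

// Usage: the array size is part of the return type.
constexpr auto four = make_zero_array(std::integral_constant<int, 4>{});
static_assert(four.size() == 4);
```

Compared with this sketch, the cw<4> and 10_cw spellings shown earlier are much shorter at the call site.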
Testing for the version of the vir::stdx::simd (vir-simd) library
The header
```c++
#include <vir/simd_version.h>
```
(which is also included from <vir/simd.h>) defines the type and constant
```c++
namespace vir
{
  struct simd_version_t { int major, minor, patchlevel; };

  constexpr simd_version_t simd_version;
}
```
in addition to the macros VIR_SIMD_VERSION, VIR_SIMD_VERSION_MAJOR,
VIR_SIMD_VERSION_MINOR, and VIR_SIMD_VERSION_PATCHLEVEL.
simd_version_t implements all comparison operators, allowing e.g.
```c++
static_assert(vir::simd_version >= vir::simd_version_t{0,4,0});
```
Semantics of version numbers
An increment of the major version number implies a breaking change.
An increment of the minor version number implies new features without breaking changes.
An increment of the patchlevel is used for bug fixes.
Odd patchlevel numbers indicate a development (not released) version.
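These rules can be written down as small predicates. The sketch below uses a hypothetical version_model type with its own comparison operators; it is not the vir::simd_version_t definition, and the member names differ on purpose.

```cpp
#include <tuple>

// Hypothetical model of a three-part version (not the vir-simd definition).
struct version_model
{
  int major_num, minor_num, patchlevel;
};

constexpr bool operator==(const version_model& a, const version_model& b)
{
  return std::tie(a.major_num, a.minor_num, a.patchlevel)
         == std::tie(b.major_num, b.minor_num, b.patchlevel);
}

// Lexicographic order: major first, then minor, then patchlevel.
constexpr bool operator<(const version_model& a, const version_model& b)
{
  return std::tie(a.major_num, a.minor_num, a.patchlevel)
         < std::tie(b.major_num, b.minor_num, b.patchlevel);
}

constexpr bool operator>=(const version_model& a, const version_model& b)
{ return !(a < b); }

// An odd patchlevel marks a development (not released) version.
constexpr bool is_development(const version_model& v)
{ return v.patchlevel % 2 != 0; }

// Same major version and at least the needed minor/patchlevel: no breaking
// change is expected between `need` and `have`.
constexpr bool is_compatible(const version_model& have, const version_model& need)
{ return have.major_num == need.major_num && have >= need; }

static_assert(is_compatible(version_model{0, 4, 2}, version_model{0, 4, 0}));
static_assert(!is_compatible(version_model{1, 0, 0}, version_model{0, 4, 0}));
```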
Debugging
Compile with -D _GLIBCXX_DEBUG_UB to get runtime checks for undefined
behavior in the simd implementation(s). Otherwise, -fsanitize=undefined
without the macro definition will also find the problems, but without
an additional error message.
Preconditions in the vir::stdx::simd implementation and extensions are
controlled via the -D VIR_CHECK_PRECONDITIONS=N macro, which defaults to 3.
Compile-time diagnostics are only possible if the compiler's optimizer can
detect the precondition failure. If you get a bogus compile-time failure, you
need to introduce the necessary assumption into your calling function, which is
typically a missing precondition check in your function.
| Option | at compile-time | at run-time |
|:--------------------------|:-------------------:|:---------------:|
| -DVIR_CHECK_PRECONDITIONS=0 | warning | invoke UB/unreachable |
| -DVIR_CHECK_PRECONDITIONS=1 | error | invoke UB/unreachable |
| -DVIR_CHECK_PRECONDITIONS=2 | warning | trap |
| -DVIR_CHECK_PRECONDITIONS=3 | error | trap |
| -DVIR_CHECK_PRECONDITIONS=4 | warning | print error and abort |
| -DVIR_CHECK_PRECONDITIONS=5 | error | print error and abort |
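Read row by row, the table follows a regular pattern: odd values give a compile-time error, and each pair of values shares one run-time behavior. The following sketch encodes that reading; the enum and function names are hypothetical, and the decoding is an interpretation of the table rather than code from vir-simd.

```cpp
// Hypothetical decoding of the table above (not vir-simd code).
enum class compile_time_action { warning, error };
enum class run_time_action { unreachable, trap, print_and_abort };

// Odd levels turn a detected violation into a compile-time error.
constexpr compile_time_action at_compile_time(int level)
{
  return level % 2 == 1 ? compile_time_action::error
                        : compile_time_action::warning;
}

// Each pair of levels (0-1, 2-3, 4-5) shares one run-time behavior.
constexpr run_time_action at_run_time(int level)
{
  return level / 2 == 0 ? run_time_action::unreachable
       : level / 2 == 1 ? run_time_action::trap
                        : run_time_action::print_and_abort;
}

// The documented default of 3: compile-time error, run-time trap.
static_assert(at_compile_time(3) == compile_time_action::error);
static_assert(at_run_time(3) == run_time_action::trap);
```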
Owner
- Name: Matthias Kretz
- Login: mattkretz
- Kind: user
- Location: Darmstadt, Germany
- Company: GSI Helmholtzzentrum für Schwerionenforschung
- Website: https://mattkretz.github.io/
- Repositories: 32
- Profile: https://github.com/mattkretz
C++ Committee Numerics Chair, SIMD specialist, CS PhD, Dipl.-Phys, High Energy Physics Software, former KDE core developer, ORCID: 0000-0002-0867-243X
GitHub Events
Total
- Create event: 9
- Release event: 3
- Issues event: 3
- Watch event: 6
- Delete event: 8
- Issue comment event: 8
- Push event: 34
- Pull request event: 8
- Fork event: 2
Last Year
- Create event: 9
- Release event: 3
- Issues event: 3
- Watch event: 6
- Delete event: 8
- Issue comment event: 8
- Push event: 34
- Pull request event: 8
- Fork event: 2
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 6
- Total pull requests: 33
- Average time to close issues: 7 months
- Average time to close pull requests: 28 days
- Total issue authors: 4
- Total pull request authors: 2
- Average comments per issue: 3.0
- Average comments per pull request: 0.24
- Merged pull requests: 29
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 15
- Average time to close issues: N/A
- Average time to close pull requests: 16 days
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.13
- Merged pull requests: 13
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- fuhlig1 (3)
- ax3l (1)
- AlvaroFS (1)
- bernhardmgruber (1)
- mattkretz (1)
Pull Request Authors
- mattkretz (44)
- AlvaroFS (1)