kernel_float

CUDA/HIP header-only library for low-precision (16 bit, 8 bit) and vectorized GPU kernel development

https://github.com/kerneltuner/kernel_float

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.2%) to scientific vocabulary

Keywords

bfloat16 cpp cuda floating-point gpu half-precision header-only-library hip kernel-tuner low-precision mixed-precision performance reduced-precision vectorization
Last synced: 6 months ago

Repository

CUDA/HIP header-only library for low-precision (16 bit, 8 bit) and vectorized GPU kernel development

Basic Info
Statistics
  • Stars: 11
  • Watchers: 4
  • Forks: 2
  • Open Issues: 0
  • Releases: 3
Topics
bfloat16 cpp cuda floating-point gpu half-precision header-only-library hip kernel-tuner low-precision mixed-precision performance reduced-precision vectorization
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme License Zenodo

README.md

Kernel Float

Kernel Float logo


Kernel Float is a header-only library for CUDA/HIP that simplifies working with vector types and reduced precision floating-point arithmetic in GPU code.

Summary

CUDA/HIP natively offers several reduced precision floating-point types (__half, __nv_bfloat16, __nv_fp8_e4m3, __nv_fp8_e5m2) and vector types (e.g., __half2, __nv_fp8x4_e4m3, float3). However, working with these types is cumbersome: mathematical operations require intrinsics (e.g., __hadd2 performs addition for __half2), type conversion is awkward (e.g., __nv_cvt_halfraw2_to_fp8x2 converts float16 to float8), and some functionality is missing (e.g., one cannot convert a __half to __nv_bfloat16).
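
For instance, adding two pairs of half-precision values in plain CUDA requires packing them into a `__half2` and calling the dedicated intrinsic rather than simply writing `a + b`. The snippet below is a minimal sketch for illustration (not taken from the README), using only standard CUDA intrinsics:

```cpp
#include <cuda_fp16.h>

// Plain CUDA: element-wise addition of two half-precision pairs requires
// packing the values into __half2 and calling the __hadd2 intrinsic.
__device__ __half2 add_pairs(__half a0, __half a1, __half b0, __half b1) {
    __half2 a = __halves2half2(a0, a1);  // pack two halves into one __half2
    __half2 b = __halves2half2(b0, b1);
    return __hadd2(a, b);                // vectorized half-precision addition
}
```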

Kernel Float resolves this by offering a single data type, kernel_float::vec<T, N>, that stores N elements of type T. Internally, the data is stored as a fixed-size array of elements. Operators (such as +, *, &&) are overloaded so that the best available intrinsic for the element types is selected automatically. Many mathematical functions (such as log, exp, sin) and common operations (such as sum, range, for_each) are also provided.
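
As a rough illustration of what this enables, the sketch below writes per-thread vector arithmetic and a reduction directly on kf::vec values. It is not taken from the README: it assumes vec<T, N> can be constructed from N scalar values and that the element-wise exp and the reduction sum follow the naming listed above; consult the API reference for exact signatures.

```cpp
#include "kernel_float.h"
namespace kf = kernel_float;

// Sketch only: the constructor-from-scalars and the exact return types of
// kf::exp / kf::sum are assumptions; check the API reference for details.
__device__ float demo() {
    kf::vec<half, 4> x = {0.5f, 1.0f, 1.5f, 2.0f};  // four half-precision lanes
    kf::vec<half, 4> y = x * x + x;                  // overloaded * and + pick half2 intrinsics where available
    return kf::sum(kf::exp(y));                      // element-wise exp followed by a horizontal sum
}
```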

Using Kernel Float, developers avoid the complexity of reduced precision floating-point types in CUDA and can focus on their applications.

Features

In a nutshell, Kernel Float offers the following features:

  • Single type vec<T, N> that unifies all vector types.
  • Operator overloading to simplify programming.
  • Support for half (16 bit) floating-point arithmetic, with a fallback to single precision for unsupported operations.
  • Support for quarter (8 bit) floating-point types.
  • Easy integration as a single header file.
  • Written for C++17.
  • Compatible with NVCC (NVIDIA Compiler) and NVRTC (NVIDIA Runtime Compilation).
  • Compatible with HIPCC (AMD HIP Compiler).

Example

Check out the examples directory for some examples.

Below is a simple example of a CUDA kernel that multiplies the input array by a constant and accumulates the results into the output array. Each thread processes two elements. Notice how easy it would be to change the precision (for example, double to half) or the vector size (for example, 4 instead of 2 items per thread).

```cpp
#include "kernel_float.h"

namespace kf = kernel_float;

__global__ void kernel(kf::vec_ptr<half, 2> input, int constant, kf::vec_ptr<float, 2> output) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    output[i] += input[i] * constant;
}
```

Here is how the same kernel would look in plain CUDA without Kernel Float.

```cpp
__global__ void kernel(const half* input, int constant, float* output) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    __half in0 = input[2 * i + 0];
    __half in1 = input[2 * i + 1];
    __half2 a = __halves2half2(in0, in1);
    __half b = __int2half_rn(constant);
    __half2 c = __half2half2(b);
    __half2 d = __hmul2(a, c);
    __half e = __low2half(d);
    __half f = __high2half(d);
    float out0 = __half2float(e);
    float out1 = __half2float(f);
    output[2 * i + 0] += out0;
    output[2 * i + 1] += out1;
}
```

Even though the second kernel looks a lot more complex, the PTX code generated by these two kernels is nearly identical.

Installation

This is a header-only library. Copy the file single_include/kernel_float.h to your project and include it:

```cpp

include "kernel_float.h"

```

Use the provided Makefile to generate this single-include header file if it is outdated:

```
make
```

Documentation

See the documentation for the API reference of all functionality.

License

Licensed under Apache 2.0. See LICENSE.

Related Work

Owner

  • Name: Kernel Tuner
  • Login: KernelTuner
  • Kind: organization
  • Location: Netherlands

Kernel Tuner is a software ecosystem for the creation of highly-optimized GPU applications through auto-tuning.

GitHub Events

Total
  • Create event: 2
  • Release event: 1
  • Issues event: 1
  • Watch event: 3
  • Push event: 39
  • Pull request event: 4
  • Fork event: 1
Last Year
  • Create event: 2
  • Release event: 1
  • Issues event: 1
  • Watch event: 3
  • Push event: 39
  • Pull request event: 4
  • Fork event: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 4
  • Total pull requests: 7
  • Average time to close issues: 3 months
  • Average time to close pull requests: 3 days
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.14
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 6
  • Average time to close issues: 3 months
  • Average time to close pull requests: 4 days
  • Issue authors: 2
  • Pull request authors: 3
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.17
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • stijnh (3)
  • FritzMichael (1)
  • benvanwerkhoven (1)
Pull Request Authors
  • stijnh (6)
  • benvanwerkhoven (4)
  • FritzMichael (1)
Top Labels
Issue Labels
Pull Request Labels