kernel_launcher
Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner
Science Score: 44.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file (found)
- ✓ codemeta.json file (found)
- ✓ .zenodo.json file (found)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.8%) to scientific vocabulary
Keywords
Repository
Using C++ magic to launch/capture CUDA kernels and tune them with Kernel Tuner
Basic Info
- Host: GitHub
- Owner: KernelTuner
- License: apache-2.0
- Language: C++
- Default Branch: master
- Homepage: https://KernelTuner.github.io/kernel_launcher/
- Size: 4.82 MB
Statistics
- Stars: 20
- Watchers: 1
- Forks: 2
- Open Issues: 2
- Releases: 2
Topics
Metadata Files
README.md
Kernel Launcher

Kernel Launcher is a C++ library that enables dynamic compilation of CUDA kernels at run time (using NVRTC) and launching them in an easy, type-safe way using C++ magic. On top of that, Kernel Launcher supports capturing kernel launches, to enable tuning by Kernel Tuner, and importing the tuning results, known as wisdom files, back into the application. The result: highly efficient GPU applications with maximum portability.
Installation
The recommended way to install Kernel Launcher is with CMake. See the installation guide.
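For orientation, a consumer project's CMakeLists.txt might look like the following. This is a sketch assuming the standard FetchContent workflow and a `kernel_launcher` link target; the installation guide is the authoritative reference:

```cmake
cmake_minimum_required(VERSION 3.17)
project(my_app LANGUAGES CXX CUDA)

# Fetch Kernel Launcher directly from GitHub at configure time.
include(FetchContent)
FetchContent_Declare(
    kernel_launcher
    GIT_REPOSITORY https://github.com/KernelTuner/kernel_launcher.git
    GIT_TAG master
)
FetchContent_MakeAvailable(kernel_launcher)

add_executable(my_app main.cpp)
target_link_libraries(my_app PRIVATE kernel_launcher)
```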
Example
There are many ways of using Kernel Launcher. See the documentation for examples or check out the examples/ directory.
Pragma-based API
Below is an example of the pragma-based API, which allows existing CUDA kernels to be annotated with Kernel-Launcher-specific directives.
kernel.cu
```cpp
#pragma kernel tune(threads_per_block=32, 64, 128, 256, 512, 1024)
#pragma kernel block_size(threads_per_block)
#pragma kernel problem_size(n)
#pragma kernel buffers(A[n], B[n], C[n])
template <typename T>
__global__ void vector_add(int n, T *C, const T *A, const T *B) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        C[i] = A[i] + B[i];
    }
}
```
main.cpp
```cpp
#include "kernel_launcher.h"

int main() {
    // Initialize CUDA memory. This is outside the scope of kernel_launcher.
    unsigned int n = 1000000;
    float *dev_A, *dev_B, *dev_C;
    /* cudaMalloc, cudaMemcpy, ... */

    // Namespace alias.
    namespace kl = kernel_launcher;

    // Launch the kernel! Again, the grid size and block size do not need to
    // be specified; they are calculated from the kernel specifications and
    // run-time arguments.
    kl::launch(
        kl::PragmaKernel("vector_add", "kernel.cu", {"float"}),
        n, dev_C, dev_A, dev_B
    );
}
```
Builder-based API
Below is an example of the KernelBuilder-based API.
This offers more flexibility than the pragma-based API, but is also more verbose:
kernel.cu
```cpp
template <typename T>
__global__ void vector_add(int n, T *C, const T *A, const T *B) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        C[i] = A[i] + B[i];
    }
}
```
main.cpp
```cpp
#include "kernel_launcher.h"

int main() {
    // Namespace alias.
    namespace kl = kernel_launcher;

    // Define the variables that can be tuned for this kernel.
    auto space = kl::ConfigSpace();
    auto threads_per_block = space.tune("block_size", {32, 64, 128, 256, 512, 1024});

    // Create a kernel builder and set kernel properties such as block size,
    // grid divisor, template arguments, etc.
    auto builder = kl::KernelBuilder("vector_add", "kernel.cu", space);
    builder
        .template_args(kl::type_of<float>())
        .problem_size(kl::arg0)
        .block_size(threads_per_block);

    // Define the kernel.
    auto vector_add_kernel = kl::WisdomKernel(builder);

    // Initialize CUDA memory. This is outside the scope of kernel_launcher.
    unsigned int n = 1000000;
    float *dev_A, *dev_B, *dev_C;
    /* cudaMalloc, cudaMemcpy, ... */

    // Launch the kernel! Note that the kernel is compiled on the first call.
    // The grid size and block size do not need to be specified; they are
    // derived from the kernel specifications and run-time arguments.
    vector_add_kernel(n, dev_C, dev_A, dev_B);
}
```
License
Licensed under Apache 2.0. See LICENSE.
Citation
If you use Kernel Launcher in your work, please cite the following publication:
S. Heldens, B. van Werkhoven (2023), "Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications", The Eighteenth International Workshop on Automatic Performance Tuning (iWAPT2023) co-located with IPDPS 2023
As BibTeX:
```bibtex
@article{heldens2023kernellauncher,
  title={Kernel Launcher: C++ Library for Optimal-Performance Portable CUDA Applications},
  author={Heldens, Stijn and van Werkhoven, Ben},
  journal={The Eighteenth International Workshop on Automatic Performance Tuning (iWAPT2023) co-located with IPDPS 2023},
  year={2023}
}
```
Related Work
Owner
- Name: Kernel Tuner
- Login: KernelTuner
- Kind: organization
- Location: Netherlands
- Website: https://kerneltuner.github.io/
- Repositories: 6
- Profile: https://github.com/KernelTuner
Kernel Tuner is a software ecosystem for the creation of highly-optimized GPU applications through auto-tuning.
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: Kernel Launcher
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Stijn
family-names: Heldens
email: s.heldens@esciencecenter.nl
affiliation: Netherlands eScience Center
- given-names: Ben
family-names: van Werkhoven
email: b.vanwerkhoven@esciencecenter.nl
affiliation: Netherlands eScience Center
identifiers:
- type: url
value: 'https://github.com/KernelTuner/kernel_launcher'
description: GitHub repository
repository-code: 'https://github.com/KernelTuner/kernel_launcher'
url: 'https://kerneltuner.github.io/kernel_launcher/'
abstract: >-
Kernel Launcher is a C++ library that makes it easy to
dynamically compile CUDA kernels at run time (using NVRTC)
and call them in an easy type-safe way using C++ magic.
Additionally, Kernel Launcher supports exporting kernel
specifications, to enable tuning by Kernel Tuner, and
importing the tuning results, known as wisdom files, back
into the application.
keywords:
- CUDA
- auto-tuning
- C++
- library
license: Apache-2.0
GitHub Events
Total
- Watch event: 4
Last Year
- Watch event: 4
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 3
- Total pull requests: 3
- Average time to close issues: 30 days
- Average time to close pull requests: 1 day
- Total issue authors: 1
- Total pull request authors: 3
- Average comments per issue: 0.33
- Average comments per pull request: 0.33
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- stijnh (3)
Pull Request Authors
- bartvstratum (1)
- stijnh (1)
- benvanwerkhoven (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- ad-m/github-push-action master composite
- sphinx-notes/pages v2 composite