Recent Releases of kernel-tuner

kernel-tuner - Version 1.3.0

This release is another major step forward, particularly for hyperparameter tuning of the optimization strategies in Kernel Tuner. In addition, many of the optimization strategies are now constraint-aware: they initialize with only valid configurations, use the search space object to query only valid neighbors, and, when needed, repair invalid configurations to valid neighboring ones.
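The repair step mentioned above can be pictured with a small, self-contained sketch. It does not use Kernel Tuner's search space object: the parameter names, the constraint, and the `repair` helper are hypothetical, and "neighboring" is assumed here to mean the smallest distance between value indices.

```python
from itertools import product

# Hypothetical tunable parameters and constraint, for illustration only.
tune_params = {"block_size_x": [32, 64, 128, 256], "tile_size": [1, 2, 4, 8]}


def is_valid(config):
    """Assumed constraint: block size times tile size may not exceed 512."""
    block_size_x, tile_size = config
    return block_size_x * tile_size <= 512


# Enumerate only the valid part of the search space.
valid_configs = [c for c in product(*tune_params.values()) if is_valid(c)]


def to_indices(config):
    """Translate parameter values into their positions in the value lists."""
    return [values.index(v) for values, v in zip(tune_params.values(), config)]


def index_distance(a, b):
    """Manhattan distance between two configurations in index space."""
    return sum(abs(i - j) for i, j in zip(to_indices(a), to_indices(b)))


def repair(config):
    """Return config unchanged if valid, else a closest valid configuration."""
    if is_valid(config):
        return config
    return min(valid_configs, key=lambda c: index_distance(c, config))


repaired = repair((256, 8))  # 256 * 8 violates the constraint
```

A strategy built this way never hands an invalid configuration to the benchmark step, so no evaluations are spent on configurations that cannot run.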

The Differential Evolution strategy previously relied on scipy.optimize.differential_evolution. It has now been replaced with a brand-new implementation that is better suited to discrete search spaces, including those with strings as parameter values, and that is also constraint-aware.

Finally, Kernel Tuner now also allows users to pass their own optimization algorithms as search strategies for auto-tuning. For this purpose, kernel_tuner.strategies.wrapper implements an OptAlgWrapper class that can wrap an existing optimizer.
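As a rough picture of what a user-supplied optimization algorithm looks like, here is a minimal sketch. It deliberately does not use the OptAlgWrapper class, whose exact interface is not described in these notes. The `RandomSearch` class, its call signature, and the `fake_cost` function are all hypothetical stand-ins.

```python
import random


class RandomSearch:
    """Hypothetical user-defined optimizer: evaluates randomly sampled configurations."""

    def __init__(self, budget=20, seed=0):
        self.budget = budget
        self.rng = random.Random(seed)

    def __call__(self, cost_func, configs):
        """Return the best (config, cost) pair found within the budget."""
        best_config, best_cost = None, float("inf")
        sample_size = min(self.budget, len(configs))
        for config in self.rng.sample(configs, sample_size):
            cost = cost_func(config)
            if cost < best_cost:
                best_config, best_cost = config, cost
        return best_config, best_cost


def fake_cost(config):
    """Stand-in for a benchmark; in practice this would be a measured kernel time."""
    block_size, tile = config
    return abs(block_size - 128) + abs(tile - 4)


configs = [(bx, t) for bx in (32, 64, 128, 256) for t in (1, 2, 4, 8)]
best_config, best_cost = RandomSearch(budget=16)(fake_cost, configs)
```

The point of the design is the separation: the optimizer only sees a cost function and a set of candidate configurations, so an existing optimizer can be plugged in without knowing how kernels are compiled or benchmarked.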

What's Changed

  • Hyperparametertuning custom strategies by @nikivanstein in https://github.com/KernelTuner/kernel_tuner/pull/325
  • Hyperparameter tuning for custom strategies by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/329
  • add support for user-defined optimization algorithms by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/287
  • Hyperparameter tuning by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/289
  • Constrained optimization by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/298
  • Tunable constrained optimization algorithms by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/324
  • Replace differential evolution strategy by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/322

New Contributors

  • @nikivanstein made their first contribution in https://github.com/KernelTuner/kernel_tuner/pull/325

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.2...1.3.0

- Python
Published by benvanwerkhoven 9 months ago

kernel-tuner - Version 1.2

This release includes many fixes and upgrades across different areas, in particular search space construction and OpenMP support. Bugs related to maximizing instead of minimizing the objective were fixed; these affected all strategies, Firefly in particular. Smaller improvements cover user-friendliness, documentation, Python 3.13 compatibility, the HIP backend, and support for string-valued tunable parameters for mixed-precision tuning.
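To see why maximization needs care in strategies that minimize internally, consider the following sketch. It is an illustration under assumptions, not the actual CostFunc or Firefly code: the `to_cost` helper and the throughput numbers are hypothetical.

```python
import math


def to_cost(value, higher_is_better, valid=True):
    """Map an objective value to a cost that a minimizer can use."""
    if not valid:
        return math.inf  # invalid configurations always rank last
    return -value if higher_is_better else value


# Hypothetical throughput measurements (GFLOP/s), where higher is better.
throughput = {"a": 250.0, "b": 0.0, "c": 410.0}
costs = {name: to_cost(v, higher_is_better=True) for name, v in throughput.items()}
costs["d"] = to_cost(None, higher_is_better=True, valid=False)

best = min(costs, key=costs.get)
ranking = sorted(costs, key=costs.get)
```

Negation keeps the ordering of all values intact and handles a measurement of zero, whereas inverting with `1 / value` would raise a division error for configuration "b" and distort the distances between values.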

What's Changed

  • OpenMP by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/273
  • Bump tornado from 6.4.2 to 6.5.1 in /doc by @dependabot[bot] in https://github.com/KernelTuner/kernel_tuner/pull/309
  • Resolve regex calls warnings by @emmanuel-ferdman in https://github.com/KernelTuner/kernel_tuner/pull/308
  • More user-friendly error messages for HIP backend by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/303
  • Add support for 16-bit floats in HIP backend by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/301
  • Display the invalid identifier name on error by @emmanuel-ferdman in https://github.com/KernelTuner/kernel_tuner/pull/310
  • Bump requests from 2.32.3 to 2.32.4 in /doc by @dependabot[bot] in https://github.com/KernelTuner/kernel_tuner/pull/311
  • Change Firefly algorithm to use negation instead of division by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/317
  • Bump urllib3 from 2.3.0 to 2.5.0 in /doc by @dependabot[bot] in https://github.com/KernelTuner/kernel_tuner/pull/316
  • Change CostFunc to return +inf when objective_higher_is_better by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/315
  • Extended searchspace construction and input format support by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/278
  • Fix documentation by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/319
  • fix issue #318 by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/320
  • Fix error 'grid divisor cannot be integer' (issue #264) by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/306
  • re-add support for user-specified starting point by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/297
  • add default optimization direction for 'fitness' and 'cost' by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/323
  • Improve warning on kernel source not found by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/321
  • use searchspace to check config validity in costfunc by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/327

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.1.3...1.2

- Python
Published by benvanwerkhoven 11 months ago

kernel-tuner - Version 1.1.3

This release contains a number of small bugfixes and enables support on Nvidia Blackwell GPUs.

What's Changed

  • Resolve deprecation warnings of regex library by @emmanuel-ferdman in https://github.com/KernelTuner/kernel_tuner/pull/296
  • Support three-digit compute capability by @csbnw in https://github.com/KernelTuner/kernel_tuner/pull/299
  • Add support for half and bfloat16 scalars in pyCUDA backend by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/300
  • Fix issue #245 by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/302

New Contributors

  • @emmanuel-ferdman made their first contribution in https://github.com/KernelTuner/kernel_tuner/pull/296

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.1.2...1.1.3

- Python
Published by benvanwerkhoven about 1 year ago

kernel-tuner - Version 1.1.2

This release would not have been necessary if I had not forgotten to increment the version number on the previous release that I made 20 minutes ago. Alas, we all make mistakes sometimes.

- Python
Published by benvanwerkhoven about 1 year ago

kernel-tuner - Version 1.1.1

The sole purpose of this release is to support Numpy 2.0 and newer. The main motivation for this is to get the examples and tutorial notebooks working again on Google Colab.

What's Changed

  • Numpy2 support by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/295

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.1.0...1.1.1

- Python
Published by benvanwerkhoven about 1 year ago

kernel-tuner - Version 1.1.0

This release integrates many smaller changes that have been made over the past year.

The most significant new features are:

  • The NCUObserver to include performance metrics from the Nvidia Profiler during tuning
  • TegraObserver to read/set clock frequencies, power and temperature on Nvidia Jetson GPUs

In addition, a lot of work has been put into several backends, including OpenACC, the compiler backend, the HIP backend and so on.

Thanks to everyone who contributed to Kernel Tuner in the past year!

What's Changed

  • Add Tegra Observer to control clocks on Jetson devices by @loostrum in https://github.com/KernelTuner/kernel_tuner/pull/243
  • Catch RuntimeError when importing from pyhip by @loostrum in https://github.com/KernelTuner/kernel_tuner/pull/252
  • Bump pillow from 10.2.0 to 10.3.0 by @dependabot in https://github.com/KernelTuner/kernel_tuner/pull/249
  • Read instant power in pwrusage by @csbnw in https://github.com/KernelTuner/kernel_tuner/pull/247
  • Bump idna from 3.6 to 3.7 by @dependabot in https://github.com/KernelTuner/kernel_tuner/pull/250
  • Register observer & correct clock setting by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/242
  • Compiler backend uses g++ instead of gcc by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/254
  • Improved OpenACC support by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/248
  • Small improvements to searchspaces and simulation mode by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/251
  • Simplify contributing info by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/255
  • Support Python 3.12 and drop Python 3.8 by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/256
  • Support Python 3.12 and drop Python 3.8 (2) by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/260
  • Add NCUObserver by @csbnw in https://github.com/KernelTuner/kernel_tuner/pull/253
  • Update PMTObserver for latest PMT changes by @csbnw in https://github.com/KernelTuner/kernel_tuner/pull/261
  • OpenACC bug fixing by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/262
  • ESiWACE3 hackathon by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/267
  • fix reading of graphics and memory clocks by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/271
  • Directives: summer refactoring by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/269
  • Tegra observer by @MartijnFr in https://github.com/KernelTuner/kernel_tuner/pull/270
  • Tegra observer with continuous observer by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/275
  • base implementation for pmt continuous observer by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/276
  • Add support for float16 to HIP backend by @loostrum in https://github.com/KernelTuner/kernel_tuner/pull/280
  • Fix: out-of-date PMTContinuousObserver readings by @wvbbreu in https://github.com/KernelTuner/kernel_tuner/pull/283
  • Hip local memory error handling by @MiloLurati in https://github.com/KernelTuner/kernel_tuner/pull/284
  • Replacing PyHIP with new official python wrapper of ROCm HIP by @MiloLurati in https://github.com/KernelTuner/kernel_tuner/pull/285
  • update observer to latest python bindings by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/279
  • add support for any case spelling of block size name defaults by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/277
  • update documentation by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/293
  • Updated pyproject to use hip-python from testpypi by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/294

New Contributors

  • @MartijnFr made their first contribution in https://github.com/KernelTuner/kernel_tuner/pull/270
  • @wvbbreu made their first contribution in https://github.com/KernelTuner/kernel_tuner/pull/283

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.0...1.1.0

- Python
Published by benvanwerkhoven about 1 year ago

kernel-tuner - Version 1.0

Finally, the Version 1.0 release is here! The software has been stable and ready for production use for quite some time now, and after being in beta for about half a year, we are confident that the current version of the software deserves to mark the first major release of Kernel Tuner.

Version 1.0 integrates a lot of new functionality, including blazing fast search space construction, support for tuning HIP kernels on AMD GPUs, new functionality for mixed precision and accuracy tuning, experimental support for tuning OpenACC programs, a conda package installer for Kernel Tuner, and many more changes and additions.

I would like to thank everyone involved in the development of Kernel Tuner over the past years! Special thanks to the Kernel Tuner developers team for their continued support of the project!

From the Changelog

  • HIP backend to support tuning HIP kernels on AMD GPUs
  • Experimental features for mixed-precision and accuracy tuning
  • Experimental features for OpenACC tuning
  • Major speedup due to new parser and using revamped python-constraint for searchspace building
  • Implemented ability to use PySMT and ATF for searchspace building
  • Added Poetry for dependency and build management
  • Switched from setup.py and setup.cfg to pyproject.toml for centralized metadata, added relevant tests
  • Updated GitHub Action workflows to use Poetry
  • Updated dependencies, most notably NumPy is no longer version-locked as scikit-opt is no longer a dependency
  • Documentation now uses pyproject.toml metadata, minor fixes and changes to be compatible with updated dependencies
  • Set up Nox for testing on all supported Python versions in isolated environments
  • Added linting information, VS Code settings and recommendations
  • Discontinued use of OrderedDict, as all dictionaries in the Python versions used are already ordered
  • Dropped Python 3.7 support

Merged Pull Requests

  • HIP Backend by @MiloLurati in https://github.com/KernelTuner/kernel_tuner/pull/199
  • Accuracy tuning by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/189
  • Fix issue where HIP backend fails due to invalid arguments type by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/216
  • Searchspace improvements and project meta modernization by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/214
  • Minor bugfix by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/219
  • OpenACC support by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/197
  • Fixed broken tests as per issue #217 by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/220
  • Fix snaptonearest on non-numeric parameters by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/221
  • expand documentation on backends by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/213
  • Add support for passing cupy arrays to "C" lang by @bouweandela in https://github.com/KernelTuner/kernel_tuner/pull/226
  • improve code quality of cache file related functions by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/240
  • New readme by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/231

New Contributors

  • @MiloLurati made their first contribution in https://github.com/KernelTuner/kernel_tuner/pull/199
  • @dependabot made their first contribution in https://github.com/KernelTuner/kernel_tuner/pull/222
  • @bouweandela made their first contribution in https://github.com/KernelTuner/kernel_tuner/pull/226

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/0.4.5...1.0

- Python
Published by benvanwerkhoven about 2 years ago

kernel-tuner - Version 1.0.0b6

This is a beta release for early access to the new features. Not intended for production use.

The release contains:

  • Inclusion of tests in the source package, as requested in #225
  • Updated dependencies

- Python
Published by fjwillemsen over 2 years ago

kernel-tuner - Version 1.0.0b5

This is a beta release for early access to the new features. Not intended for production use.

The release contains:

  • Expanded documentation on backends by @benvanwerkhoven in https://github.com/KernelTuner/kernel_tuner/pull/213
  • A fix for an issue that could cause incorrect conversion to Constraint
  • Extended tests to detect this
  • Bump urllib3 from 2.0.6 to 2.0.7 by @dependabot in https://github.com/KernelTuner/kernel_tuner/pull/222
  • Updated dependencies

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.0.0b4...1.0.0b5

- Python
Published by fjwillemsen over 2 years ago

kernel-tuner - Version 1.0.0b4

This is a beta release for early access to the new features. Not intended for production use.

This release contains several improvements:

  • nvidia-ml-py added to tutorial extra dependencies.
  • Additional checks for coherent Poetry configuration and warning in case of outdated development environment.
  • Updated dependencies.

- Python
Published by fjwillemsen over 2 years ago

kernel-tuner - Version 1.0.0b3

This is a beta release for early access to the new features. Not intended for production use.

This version contains several bugfixes:

  • Fix snaptonearest on non-numeric parameters by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/221
  • Fixed an issue where some restrictions would not be recognized by the old `check_restrictions` function.
  • Fixed an issue where `bayes_opt` would not handle pruned parameters correctly.

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.0.0b2...1.0.0b3

- Python
Published by fjwillemsen over 2 years ago

kernel-tuner - Version 1.0.0b2

This is a beta release for early access to the new features. Not intended for production use.

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/1.0.0b1...1.0.0b2

- Python
Published by fjwillemsen over 2 years ago

kernel-tuner - Version 1.0.0 beta 1

This is a beta release for early access to the new features. Not intended for production use.

What's Changed

  • HIP Backend by @MiloLurati in https://github.com/KernelTuner/kernel_tuner/pull/199
  • Accuracy tuning by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/189
  • Fix issue where HIP backend fails due to invalid arguments type by @stijnh in https://github.com/KernelTuner/kernel_tuner/pull/216
  • Searchspace improvements and project meta modernization by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/214
  • Minor bugfix by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/219
  • OpenACC support by @isazi in https://github.com/KernelTuner/kernel_tuner/pull/197
  • Fixed broken tests as per issue #217 by @fjwillemsen in https://github.com/KernelTuner/kernel_tuner/pull/220

New Contributors

  • @MiloLurati made their first contribution in https://github.com/KernelTuner/kernel_tuner/pull/199

Full Changelog: https://github.com/KernelTuner/kernel_tuner/compare/0.4.5...1.0.0b1

- Python
Published by fjwillemsen over 2 years ago

kernel-tuner - Version 0.4.5

Version 0.4.5 adds support for using PMT in combination with Kernel Tuner, enabling power and energy measurements on a wide range of devices. In addition, we have worked extensively on the internals of Kernel Tuner and the interfaces of the separate components that together make up Kernel Tuner. The release also includes a few bugfixes and fixes for small errors in examples and documentation.

[0.4.5] - 2023-06-01

Added

  • PMTObserver to measure power and energy on various platforms

Changed

  • Improved functionality for storing output and metadata files
  • Updated PowerSensorObserver to support PowerSensor3
  • Refactored internal interfaces of runners and backends
  • Bugfix in interface to set objective and optimization direction

- Python
Published by benvanwerkhoven about 3 years ago

kernel-tuner - Version 0.4.4

Version 0.4.4

Version 0.4.4 adds extended support for energy efficiency tuning, in particular the new capability to fit a performance model to the target GPU's power-frequency curve. How to use these features is demonstrated in: https://github.com/KernelTuner/kernel_tuner/blob/master/examples/cuda/going_green_performance_model.py

And described in the paper:

Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning
R. Schoonhoven, B. Veenboer, B. van Werkhoven, K. J. Batenburg
International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS) at Supercomputing (SC22) 2022
https://arxiv.org/abs/2211.07260

Other than that, we've implemented a new output and metadata JSON format that adheres to the 'T4' auto-tuning schema created by the auto-tuning community at the Lorentz Center workshop in March 2022.

From the changelog:

[0.4.4] - 2023-03-09

Added

  • Support for using time_limit in simulation mode
  • Helper functions for energy tuning
  • Example to show ridge frequency and power-frequency model
  • Functions to store tuning output and metadata

Changed

  • Changed what timings are stored in cache files
  • No longer inserting partial loop unrolling factor of 0 in CUDA

- Python
Published by benvanwerkhoven about 3 years ago

kernel-tuner - Version 0.4.3

The version 0.4.3 release consists of a large number of changes to the internals of Kernel Tuner, including a new backend based on Nvidia's official Python bindings for CUDA and improved functionality for tuning energy efficiency, such as measuring core voltages. The measurement of power and the interface with NVML have also improved a lot.

Some of the changes are also in the "externals" of Kernel Tuner, in the sense that we have migrated from https://github.com/benvanwerkhoven/ to https://github.com/KernelTuner. The goal of this move is to bring the collection of repositories belonging to the larger Kernel Tuner project under one organization.

From the Changelog:

[0.4.3] - 2022-10-19

Added

  • A new backend that uses Nvidia cuda-python
  • Support for locked clocks in NVMLObserver
  • Support for measuring core voltages using NVML
  • Support for custom preprocessor definitions
  • Support for boolean scalar arguments in PyCUDA backend

Changed

  • Migrated from github.com/benvanwerkhoven to github.com/KernelTuner
  • Significant update to the documentation pages
  • Unified benchmarking loops across backends
  • Backends are no longer context managers
  • Replaced the method for measuring power consumption using NVML
  • Improved NVML measurements of temperature and clock frequencies
  • bugfix in parse_restrictions when using and/or in expressions
  • bugfix in GreedyILS when using neighbor method "adjacent"
  • bugfix in Bayesian Optimization for small problems

- Python
Published by benvanwerkhoven over 3 years ago

kernel-tuner - Version 0.4.2

Version 0.4.2 includes a lot of work on the search space representation, application of restrictions, and optimization strategies. Besides the addition of several new optimization strategies, most optimization strategies should see improved performance, both in the number of evaluated kernel configurations and in execution time.

Added

  • new optimization strategies: dual annealing, greedy ILS, ordered greedy MLS, greedy MLS
  • support for constant memory in cupy backend
  • constraint solver to cut down time spent in creating search spaces
  • support for custom tuning objectives
  • support for max_fevals and time_limit in strategy_options of all strategies

Removed

  • alternative Bayesian Optimization strategies that could not be used directly
  • C++ wrapper module that was too specific and hardly used

Changed

  • string-based restrictions are compiled into functions for improved performance
  • genetic algorithm, MLS, ILS, random, and simulated annealing use new search space object
  • diff evo, firefly, PSO are initialized using population of all valid configurations
  • all strategies except bruteforce strictly adhere to max_fevals and time_limit
  • simulated annealing adapts annealing schedule to max_fevals if supplied
  • minimize, basinhopping, and dual annealing start from a random valid config
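Compiling string-based restrictions into functions, as listed above, can be sketched as follows. This is one plausible approach using Python's built-in `compile` and `eval`, not necessarily how Kernel Tuner implements it. The `compile_restrictions` helper and the parameter names are hypothetical.

```python
def compile_restrictions(restrictions):
    """Compile a list of restriction strings into one function of a parameter dict."""
    source = " and ".join(f"({r})" for r in restrictions)
    code = compile(source, "<restrictions>", "eval")  # parsed once, reused for every config

    def check(params):
        # Parameter names are resolved from the params dict; builtins are disabled.
        return bool(eval(code, {"__builtins__": {}}, dict(params)))

    return check


check = compile_restrictions([
    "block_size_x * block_size_y <= 1024",
    "tile_size_x <= block_size_x",
])

ok = check({"block_size_x": 32, "block_size_y": 16, "tile_size_x": 4})
too_big = check({"block_size_x": 64, "block_size_y": 32, "tile_size_x": 4})
```

The gain comes from parsing each string once instead of once per configuration. Evaluating strings this way is only appropriate for restrictions written by the user themselves, never for untrusted input.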

- Python
Published by benvanwerkhoven about 4 years ago

kernel-tuner - Version 0.4.1

This version adds a brand new Bayesian Optimization strategy, as well as some smaller features and fixes.

[0.4.1] - 2021-09-10

Added

  • support for PyTorch Tensors as input data type for kernels
  • support for smem_args in run_kernel
  • support for (lambda) function and string for dynamic shared memory size
  • a new Bayesian Optimization strategy

Changed

  • optionally store the kernel_string with store_results
  • improved reporting of skipped configurations

- Python
Published by benvanwerkhoven over 4 years ago

kernel-tuner - Version 0.4.0

This version adds a great deal of new functionality and gives the user extra flexibility and control over what is being benchmarked and when. From the CHANGELOG:

Added

  • support for (lambda) function instead of list of strings for restrictions
  • support for (lambda) function instead of list for specifying grid divisors
  • support for (lambda) function instead of tuple for specifying problem_size
  • function to store the top tuning results
  • function to create header file with device targets from stored results
  • support for using tuning results in PythonKernel
  • option to control measurements using observers
  • support for NVML tunable parameters
  • option to simulate auto-tuning searches from existing cache files
  • Cupy backend to support C++ templated CUDA kernels
  • support for templated CUDA kernels using PyCUDA backend
  • documentation on tunable parameter vocabulary

- Python
Published by benvanwerkhoven about 5 years ago

kernel-tuner - Version 0.3.2

Version 0.3.2

This version adds several new and recent features. Most important is the new feature to specify user-defined metrics for Kernel Tuner to compute along with the benchmarking results. User-defined metrics are composable, so you can define metrics that build upon other metrics. The documentation pages have also been updated to include this new feature and other recent changes.
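The idea of composable metrics can be sketched without a GPU. The snippet below emulates the mechanism in plain Python and is not Kernel Tuner's implementation. The `apply_metrics` helper, the benchmark result, and the operation count are assumptions made for illustration.

```python
# Hypothetical benchmark result for one configuration; "time" is assumed to be in milliseconds.
result = {"block_size_x": 128, "time": 2.5}
total_flops = 2e9  # assumed number of floating-point operations performed by the kernel

# Each metric is a function of the result dict. Later metrics may use earlier ones.
metrics = {
    "time_s": lambda p: p["time"] / 1e3,
    "GFLOP/s": lambda p: total_flops / 1e9 / p["time_s"],
}


def apply_metrics(result, metrics):
    """Evaluate the metrics in insertion order, adding each value to the result."""
    out = dict(result)
    for name, func in metrics.items():
        out[name] = func(out)
    return out


tuned = apply_metrics(result, metrics)
```

Because each computed value is added to the result before the next metric runs, "GFLOP/s" can build on "time_s". The order in which metrics are defined therefore matters.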

An important change that might influence benchmark results reported by Kernel Tuner is the fact that the runner will now do a warm up of the device using the first kernel in the parameter space. This is to remove any startup or cold start delays that were significantly slowing down the first benchmarked kernel on many devices.

From the changelog:

[0.3.2] - 2020-11-04

Added

  • support loop unrolling using params that start with loop_unroll_factor
  • always insert "#define kernel_tuner 1" to allow preprocessor ifdef kernel_tuner
  • support for user-defined metrics
  • support for choosing the optimization starting point x0 for most strategies

Changed

  • more compact output is printed to the terminal
  • sequential runner runs first kernel in the parameter space to warm up device
  • updated tutorials to demonstrate use of user-defined metrics

- Python
Published by benvanwerkhoven over 5 years ago

kernel-tuner - Version 0.3.1

A small release for 2 small new features and a bugfix for older GPUs.

[0.3.1] - 2020-06-11

Added

  • kernelbuilder functionality for including kernels in Python applications
  • smem_args option for dynamically allocated shared memory in CUDA kernels

Changed

  • bugfix for NVML Error on Nvidia devices without internal current sensor

- Python
Published by benvanwerkhoven almost 6 years ago

kernel-tuner - Version 0.3.0

Version 0.3.0

This is the release of version 0.3.0 of Kernel Tuner. We have done a lot of work on the internals of Kernel Tuner. This release fixes several issues, adds and extends new features, and simplifies the user interface.

[0.3.0] - 2019-12-20

Changed

  • fix for output checking, custom verify functions are called just once
  • benchmarking now returns multiple results not only time
  • more sophisticated implementation of genetic algorithm strategy
  • how the "method" option is passed, now use strategy_options

Added

  • Bayesian Optimization strategy, use strategy="bayes_opt"
  • support for kernels that use texture memory in CUDA
  • support for measuring energy consumption of CUDA kernels
  • option to set strategy_options to pass strategy specific options
  • option to cache and restart from tuned kernel configurations cachefile

Removed

  • Python 2 support, it may still work but we no longer test for Python 2
  • Noodles parallel runner

- Python
Published by benvanwerkhoven over 6 years ago

kernel-tuner - Version 0.2.0

Version 0.2.0

Version 0.2.0 adds a large number of search optimization algorithms and basic support for testing and tuning Fortran kernels.

Changed

  • no longer replacing kernel names with instance strings during tuning
  • bugfix in tempfile creation that led to too many open files error

Added

  • A minimal Fortran example and basic Fortran support
  • Particle Swarm Optimization strategy, use strategy="pso"
  • Simulated Annealing strategy, use strategy="simulated_annealing"
  • Firefly Algorithm strategy, use strategy="firefly_algorithm"
  • Genetic Algorithm strategy, use strategy="genetic_algorithm"

- Python
Published by benvanwerkhoven over 7 years ago

kernel-tuner - Version 0.1.9

[0.1.9] - 2018-04-18

Changed

  • bugfix for C backend for byte array arguments
  • argument type mismatches throw warning instead of exception

Added

  • wrapper functionality to wrap C++ functions
  • citation file and zenodo doi generation for releases

- Python
Published by benvanwerkhoven about 8 years ago

kernel-tuner - Version 0.1.8

Version 0.1.8 brings many improvements, mostly focused on user friendliness. The installation process of optional dependencies is simplified as you can now use extras with pip. For example, pip install kernel_tuner[cuda] can be used to install both Kernel Tuner and the optional dependency PyCUDA. In addition, Version 0.1.8 introduces many more checks on the user input that you pass to tune_kernel and run_kernel. For example, the kernel source code is parsed to see if the signature matches the argument list. The additional checks on input should make it easier to use and debug programs using Kernel Tuner. For a more detailed overview of the changes, see below:

[0.1.8] - 2017-11-23

Changed

  • bugfix for when using iterations smaller than 3
  • the install procedure now uses extras, e.g. [cuda,opencl]
  • option quiet makes tune_kernel completely quiet
  • extensive updates to documentation

Added

  • type checking for kernel arguments and answers lists
  • checks for reserved keywords in tunable parameters
  • checks for whether thread block dimensions are specified
  • printing units for measured time with CUDA and OpenCL
  • option to print all measured execution times

- Python
Published by benvanwerkhoven over 8 years ago

kernel-tuner - Version 0.1.7

[0.1.7] - 2017-10-11

Changed

  • bugfix install when scipy not present
  • bugfix for GPU cleanup when using Noodles runner
  • reworked the way strings are handled internally

Added

  • option to set compiler name, when using C backend

- Python
Published by benvanwerkhoven over 8 years ago

kernel-tuner - Version 0.1.6

Version 0.1.6

Version 0.1.6 brings a few bugfixes but mostly extends the existing functionality of the tuner. Three new search strategies have been added and are now ready to use: minimize, basinhopping, and diff_evo. For more info on what these strategies do and what solvers and methods they support please see the documentation pages.

From the CHANGELOG:

[0.1.6] - 2017-08-17

Changed

  • actively freeing GPU memory after tuning
  • bugfix for 3D grids when using OpenCL

Added

  • support for dynamic parallelism when using PyCUDA
  • option to use differential evolution optimization
  • global optimization strategies basinhopping, minimize

- Python
Published by benvanwerkhoven almost 9 years ago

kernel-tuner - Version 0.1.5

Version 0.1.5

Version 0.1.5 brings more flexibility: you can now pass code-generating functions, your own functions for verifying kernel output correctness, and your own names for the thread block dimensions.

Internally, quite a lot has changed in this version. The runners have been separated into strategies and runners, and the way that options are passed around within the Kernel Tuner has changed dramatically.

From the CHANGELOG:

[0.1.5] - 2017-07-21

Changed

  • option to pass a fraction to the sample runner
  • fixed a bug in memset for OpenCL backend

Added

  • parallel tuning on single node using Noodles runner
  • option to pass new defaults for block dimensions
  • option to pass a Python function as code generator
  • option to pass custom function for output verification

- Python
Published by benvanwerkhoven almost 9 years ago

kernel-tuner - Version 0.1.4

With this release, tune_kernel also returns a dictionary containing information about the environment in which the kernel was benchmarked. This is very useful for understanding how and under what circumstances certain measurement results were obtained.

In addition, there were some very minor changes in the way C functions are compiled and called.

- Python
Published by benvanwerkhoven almost 9 years ago

kernel-tuner - Version 0.1.3

Bugfixes for handling scalar arguments and documentation update.

- Python
Published by benvanwerkhoven about 9 years ago

kernel-tuner - Version 0.1.2

Better defaults for grid divisor lists, full support for 3D grids, and a simpler way to specify the problem size of 1D grids.

[0.1.2] - 2017-03-29

Changed

  • allow non-tuple problem_size for 1D grids
  • changed default for grid_div_y from None to block_size_y
  • converted the tutorial to a Jupyter Notebook
  • CUDA backend prints device in use, similar to OpenCL backend
  • migrating from nosetests to pytest
  • rewrote many of the examples to save results to json files

Added

  • full support for 3D grids, including option for grid_div_z
  • separable convolution example
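How grid divisor lists relate the problem size to the grid dimensions can be shown with a short sketch. It assumes the grid size in each dimension is the problem size divided by the product of the listed parameter values, rounded up. The `grid_size` helper is hypothetical and is not Kernel Tuner's own code.

```python
import math


def grid_size(problem_size, params,
              grid_div_x=("block_size_x",),
              grid_div_y=("block_size_y",),
              grid_div_z=None):
    """Compute a 3D grid from the problem size and per-dimension divisor lists."""
    if isinstance(problem_size, int):  # non-tuple problem size for 1D grids
        problem_size = (problem_size,)
    # Pad the problem size to three dimensions.
    problem_size = tuple(problem_size) + (1,) * (3 - len(problem_size))
    grid = []
    for size, divisors in zip(problem_size, (grid_div_x, grid_div_y, grid_div_z)):
        # Parameters that are not being tuned are assumed to default to 1.
        div = math.prod(params.get(name, 1) for name in (divisors or ()))
        grid.append(math.ceil(size / div))
    return tuple(grid)


grid_1d = grid_size(1000, {"block_size_x": 128})
grid_2d = grid_size(
    (4096, 4096),
    {"block_size_x": 32, "block_size_y": 8, "tile_size_y": 4},
    grid_div_y=("block_size_y", "tile_size_y"),
)
```

Rounding up means a 1D problem of 1000 elements with 128 threads per block launches 8 blocks, so kernels still need a bounds check for the surplus threads in the last block.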

- Python
Published by benvanwerkhoven about 9 years ago

kernel-tuner - Version 0.1.1

[0.1.1] - 2017-02-10

Changed

  • changed the output format to list of dictionaries

Added

  • option to set compiler options

- Python
Published by benvanwerkhoven over 9 years ago

kernel-tuner - Version 0.1.0

Version 0.1.0

The Kernel Tuner should by now be ready for production use. Over the last few months we have used it in several projects, which has revealed some of the things that were fixed in this version. This release also marks the end of a period in which the internal structure of the Kernel Tuner has changed several times. We expect the current code structure to stay around for a while. With this version we also release the public roadmap for the project, to show which changes and additional features we have planned for the near and not so near future. We also feel that the software is now ready to be added to public software repositories, which we will do shortly.

- Python
Published by benvanwerkhoven over 9 years ago

kernel-tuner - first beta release

This is the first beta release of the Kernel Tuner.

This release basically marks the first version of the kernel tuner, which is currently in beta testing to see what functionality is missing and what needs to be fixed before the code can be considered production ready.

A brief description of the Kernel Tuner's functionality in this version:

  • Basic kernel tuning functionality for CUDA, OpenCL, and C functions
  • Many examples and rather extensive documentation
  • Search space restriction, using the 'restrictions' option
  • Kernel output verification, using the 'answer' option
  • Example showing how to tune both host code (number of streams) and GPU code
  • Run a single kernel with a specific parameter set and get the output

- Python
Published by benvanwerkhoven almost 10 years ago