Recent Releases of gridtools
gridtools - GridTools version 2.3.9
changes since 2.3.7
Dependencies
Removed the dependency on boost by packaging relevant headers (#1816, #1817, #1821)
Tests
Update gtest to v1.16.0 (#1823)
- C++
Published by havogt about 1 year ago
gridtools - GridTools version 2.3.8
changes since 2.3.7
Bug fixes
- addition to fix copy assignment of tuple containing refs for 1-tuples #1812
- detection of CUDA/HIP mode with CrayClang compiler #1813
- C++
Published by havogt over 1 year ago
gridtools - GridTools version 2.3.7
changes since 2.3.6
Bug fixes
- fix copy assignment of tuple containing refs #1811
- C++
Published by havogt over 1 year ago
gridtools - GridTools version 2.3.6
changes since v2.3.5
Minor feature
- fn: introduce fn::index as alias to positional #1806
Performance improvements
- Loop Blocking for fn GPU Backend #1787
- Use ‘const’ in Neighbor Table Value Types #1796
- Remove ldg_ptr and Replace Functionality by const_ptr_deref, possible improvement for sid::composite #1810
- Add Const to SID Neighbor Table Element Type #1808
Bug fixes
- Respect ReadOnly Property in Nanobind Adapter #1809
- C++
Published by havogt over 1 year ago
gridtools - GridTools version 2.3.5
changes since v2.3.4
GridTools v2.3.5 requires CMake 3.21.0 or later to properly support HIP.
Performance improvements
- Introduce ldg_ptr to Enable __ldg in Data Stores and simple_ptr_holder #1802
- Use ‘const’ in Neighbor Table Value Types #1796
Bug fixes
- Fixes for HIP detection for recent ROCm and CMake #1804
- Fix GT_ASSUME for NVCC and Enable GT_ASSUME on Recent GCC Versions #1789
- Fix include #1793
- Improve tests #1794, #1799, #1798
- C++
Published by havogt over 1 year ago
gridtools - GridTools version 2.3.4
changes since v2.3.2 (v2.3.3 removed because it introduced a breaking change in the nanobind adapter)
Performance improvements
- Introduce GT_PROMISE for __builtin_assume by @havogt, @iomaganaris in https://github.com/GridTools/gridtools/pull/1785, https://github.com/GridTools/gridtools/pull/1788
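As a rough illustration of what a promise/assume macro of this kind can look like, the sketch below hides the compiler-specific optimizer hint behind one macro. `SKETCH_ASSUME` and `sum_multiple_of_four` are hypothetical names invented for this example; the actual GridTools macro may be defined differently.

```cpp
#include <cstddef>

// Hypothetical macro for illustration only; not the GridTools definition.
#if defined(__clang__)
#define SKETCH_ASSUME(cond) __builtin_assume(cond)
#elif defined(__GNUC__)
// GCC has no __builtin_assume; an unreachable branch gives a similar hint.
#define SKETCH_ASSUME(cond)               \
    do {                                  \
        if (!(cond))                      \
            __builtin_unreachable();      \
    } while (false)
#elif defined(_MSC_VER)
#define SKETCH_ASSUME(cond) __assume(cond)
#else
#define SKETCH_ASSUME(cond) ((void)0)
#endif

// Sums n values. The caller promises that n is a multiple of 4, which may
// allow the optimizer to skip remainder handling when vectorizing the loop.
inline double sum_multiple_of_four(double const *data, std::size_t n) {
    SKETCH_ASSUME(n % 4 == 0);
    double result = 0;
    for (std::size_t i = 0; i < n; ++i)
        result += data[i];
    return result;
}
```

Such a hint is a promise, not a check: calling the function with a size that violates the condition is undefined behavior.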
Bug fixes
- Bug: storage/gpu.h functions within __CUDA_ARCH__ by @havogt in https://github.com/GridTools/gridtools/pull/1778
- Update nanobind to v2 by @havogt in https://github.com/GridTools/gridtools/pull/1777, https://github.com/GridTools/gridtools/pull/1790
- Update minimum required boost to 1.73 by @havogt in https://github.com/GridTools/gridtools/pull/1772
Tests
- Add Missing fn_unstructured_nabla_fused_tuple_of_fields to Regression & Performance Tests by @fthaler in https://github.com/GridTools/gridtools/pull/1783
- Improved Data Layout for Neighbor Tables by @fthaler in https://github.com/GridTools/gridtools/pull/1782
CI / Deployment
- test NVHPC 23.9 by @havogt in https://github.com/GridTools/gridtools/pull/1769
- build: Update deployment action with trusted publisher by @havogt in https://github.com/GridTools/gridtools/pull/1770
- build: Add download wheel step in deployment action by @havogt in https://github.com/GridTools/gridtools/pull/1771
- CI: test CUDA 12.4 by @havogt in https://github.com/GridTools/gridtools/pull/1773
- GitHub actions: Update compiler versions in configure test by @havogt in https://github.com/GridTools/gridtools/pull/1760
- Move to CUDA 11.2 for daint nvcc gcc by @havogt in https://github.com/GridTools/gridtools/pull/1786
- C++
Published by havogt almost 2 years ago
gridtools - GridTools version 2.3.3
Performance improvements
- Introduce GT_PROMISE for __builtin_assume by @havogt in https://github.com/GridTools/gridtools/pull/1785
Bug fixes
- Bug: storage/gpu.h functions within __CUDA_ARCH__ by @havogt in https://github.com/GridTools/gridtools/pull/1778
- Update nanobind to v2 by @havogt in https://github.com/GridTools/gridtools/pull/1777
- Update minimum required boost to 1.73 by @havogt in https://github.com/GridTools/gridtools/pull/1772
Tests
- Add Missing fn_unstructured_nabla_fused_tuple_of_fields to Regression & Performance Tests by @fthaler in https://github.com/GridTools/gridtools/pull/1783
- Improved Data Layout for Neighbor Tables by @fthaler in https://github.com/GridTools/gridtools/pull/1782
CI / Deployment
- test NVHPC 23.9 by @havogt in https://github.com/GridTools/gridtools/pull/1769
- build: Update deployment action with trusted publisher by @havogt in https://github.com/GridTools/gridtools/pull/1770
- build: Add download wheel step in deployment action by @havogt in https://github.com/GridTools/gridtools/pull/1771
- CI: test CUDA 12.4 by @havogt in https://github.com/GridTools/gridtools/pull/1773
- GitHub actions: Update compiler versions in configure test by @havogt in https://github.com/GridTools/gridtools/pull/1760
- Move to CUDA 11.2 for daint nvcc gcc by @havogt in https://github.com/GridTools/gridtools/pull/1786
- C++
Published by havogt almost 2 years ago
gridtools - GridTools version 2.3.2
Bug fixes
- Apply workaround to CUDA 12.3, see #1766 (#1768)
- C++
Published by havogt over 2 years ago
gridtools - GridTools version 2.3.1
Python bindings
- Python SID adapter: Add support for HIP/ROCm buffers (#1759)
- Python bindings: Nanobind SID adapter (#1762)
Bug fixes
- Partial workarounds for CUDA 12.1 and 12.2, see #1766 (#1764)
- Fix GCC 13: add missing include (#1761)
- C++
Published by havogt over 2 years ago
gridtools - GridTools version 2.3.0
Support for NVHPC (#1747)
GridTools now supports NVHPC starting from release 23.3!
Parallel fn::backend::naive (#1746)
Naive (just parallel for, no blocking and other optimizations) OpenMP parallelization of the naive backend.
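To make "just parallel for" concrete, here is a minimal sketch of such a loop in plain C++ with OpenMP. The names `naive_parallel_for` and `scale_by_two` are hypothetical and this is not the code of the GridTools backend; it only shows the general shape of an unblocked OpenMP loop.

```cpp
#include <cstddef>
#include <vector>

// Applies `op` to every index in [0, n) with a plain OpenMP parallel-for:
// no blocking, tiling or other optimizations. If OpenMP is not enabled at
// compile time, the pragma is ignored and the loop runs serially.
template <class Op>
void naive_parallel_for(std::ptrdiff_t n, Op op) {
#pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i)
        op(i);
}

// Example use: out[i] = 2 * in[i]. Each index is written by exactly one
// iteration, so the iterations are independent of each other.
inline std::vector<double> scale_by_two(std::vector<double> const &in) {
    std::vector<double> out(in.size());
    naive_parallel_for(static_cast<std::ptrdiff_t>(in.size()), [&](std::ptrdiff_t i) {
        auto idx = static_cast<std::size_t>(i);
        out[idx] = 2 * in[idx];
    });
    return out;
}
```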
SID util to transform a dimension to a tuple_like element type (#1750)
Translates a SID with dimension D and element type T to a SID with D removed and a tuple-like element type of T with tuple size N, using sid::dimension_to_tuple_like.
Bug fixes and smaller features
- fn: allow execution of stencils with 0d domain (#1728)
- Make pybind11::buffer sid copyable (#1755)
Build fixes
- Support for Clang 16 (#1751)
and other changes already included in v2.2.3
fn: SID neighbor table wrapper (#1730)
Adds a simple class that wraps a SID and implements the neighbour table concept. (Picked for convenience into 2.2.2.)
Support for Python packaging (#1720)
Starting with this release we will publish GridTools C++ on pypi.org to make it easier to consume GridTools C++ from GT4Py.
Bug fixes
- Fix CUDA 12.0 compilation (#1741)
- Improvements to Python packaging (#1742, #1743, #1744)
- Fix get_keys of empty hymap (#1728)
- fn: CUDA early exit on empty grid - an empty domain skips execution instead of erroring (#1729)
- fn: prefer qualified names over ADL for fn builtins (they are not customization points for the user) (#1731, #1732)
- Enable workarounds for CUDA 11.8 (#1734)
- Enable workarounds for Clang 15 (#1735)
- Update pybind11 version to fix wrong C++ standard (#1723)
- Fix perfect forwarding in sid::composite::make_values (#1722)
- Workaround for NVCC bug in gcl (present in 11.6, 11.7 and most likely in 11.8) (#1726)
Performance fixes
- Alternative skip value check in fn, which improves CUDA performance (#1721)
Build fixes
- Fix perftests CMake target when no tests are added (#1724)
Cleanup
- Replace boost::variant by std::variant (#1718)
CI
- Add gcc-11, gcc-12, CUDA 12.0 (#1738, #1739, #1740)
Contributions
This release contains contributions from @DropD, @egparedes, @fthaler, @havogt, @petiaccja
- C++
Published by havogt about 3 years ago
gridtools - GridTools version 2.0.1
Bug fixes
- Fix: storage_gpu for HIPCC-AMDGPU (#1540)
- Performance fix for C++17 (#1618)
- Enable several CUDA workarounds for recent compilers (#1681 and others)
- Some declarations to definitions
- Workaround gtest incompatibilities in recent compilers
CI
- Compile (and run) all tests on GitHub actions (Jenkins doesn't run on v2.0.x anymore)
- C++
Published by havogt about 3 years ago
gridtools - GridTools version 2.2.3
Bug fixes
- Fix CUDA 12.0 compilation (#1741)
- Improvements to Python packaging (#1742, #1743, #1744)
CI
- Add gcc-11, gcc-12, CUDA 12.0 (#1738, #1739, #1740)
- C++
Published by havogt about 3 years ago
gridtools - GridTools version 2.2.2
fn: SID neighbor table wrapper (#1730)
Adds a simple class that wraps a SID and implements the neighbour table concept. (Picked for convenience into 2.2.2.)
Support for Python packaging (#1720)
Starting with this release we will publish GridTools C++ on pypi.org to make it easier to consume GridTools C++ from GT4Py.
Bug fixes
- Fix get_keys of empty hymap (#1728)
- fn: CUDA early exit on empty grid - an empty domain skips execution instead of erroring (#1729)
- fn: prefer qualified names over ADL for fn builtins (they are not customization points for the user) (#1731, #1732)
- Enable workarounds for CUDA 11.8 (#1734)
- Enable workarounds for Clang 15 (#1735)
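The ADL item above can be illustrated with a small self-contained sketch. The namespaces `lib` and `user` and the function `deref` are hypothetical stand-ins, not GridTools names; the point is only that an unqualified call inside a template can be redirected by argument-dependent lookup, while a qualified call cannot.

```cpp
namespace lib {
    // Stand-in for a library builtin that is not meant to be customized.
    template <class T>
    int deref(T const &) { return 1; }

    // Unqualified call: ADL also considers functions in the namespaces
    // associated with T, so a user function can win overload resolution.
    template <class T>
    int call_unqualified(T const &x) { return deref(x); }

    // Qualified call: always resolves to lib::deref.
    template <class T>
    int call_qualified(T const &x) { return lib::deref(x); }
}

namespace user {
    struct iterator {};
    // Unrelated user function that happens to share the builtin's name.
    inline int deref(iterator const &) { return 2; }
}
```

With these definitions, `lib::call_unqualified(user::iterator{})` picks `user::deref`, whereas `lib::call_qualified(user::iterator{})` keeps using the library function.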
Build fixes
- Fix perftests CMake target when no tests are added (#1724)
- C++
Published by havogt over 3 years ago
gridtools - GridTools version 2.2.1
Bug fixes
- Update pybind11 version to fix wrong C++ standard (#1723)
- Fix perfect forwarding in sid::composite::make_values (#1722)
- Workaround for NVCC bug in gcl (present in 11.6, 11.7 and most likely in 11.8) (#1726)
Performance fixes
- Alternative skip value check in fn, which improves CUDA performance (#1721)
Cleanup
- Replace boost::variant by std::variant (#1718)
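For readers migrating their own code in the same direction, a minimal `std::variant` sketch follows. The alias `value_t` and the function `describe` are hypothetical and unrelated to the GridTools sources.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// A value that holds either an int or a std::string.
using value_t = std::variant<int, std::string>;

// Dispatches on the currently held alternative with std::visit.
inline std::string describe(value_t const &v) {
    return std::visit(
        [](auto const &x) -> std::string {
            if constexpr (std::is_same_v<std::decay_t<decltype(x)>, int>)
                return "int:" + std::to_string(x);
            else
                return "string:" + x;
        },
        v);
}
```

In typical migrations, `boost::get` becomes `std::get` and `which()` becomes `index()`.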
- C++
Published by havogt almost 4 years ago
gridtools - GridTools version 2.2.0
C++ standard upgraded to C++17
Starting with this version of GridTools, we require the C++17 standard (#1680) and improved the code base using C++17 features (#1693, #1716, #1697):
- Get rid of tuple_util::make
- GT_CONSTEXPR and GT_CONSTEXPR_TARGET go away
- wstd stuff goes away
- is_trivially_copy_constructible check is consistently used instead of is_trivially_copyable where the data is passed host/device boundary, because it is exactly what is needed.
- make_[smth] pattern is replaced by template argument deduction in several places, the old pattern is deprecated
- composite is rewritten using c++17
- overload is rewritten using c++17
- std::[smth]_v<...> are used instead of std::[smth]<...>::value
- static_assert(<cond>) used instead of static_assert(<cond>, "")
- CTAD for simple_ptr_holder (#1701, #1708)
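The trait-related items in the list above can be seen in a short sketch. The type `device_arg` is hypothetical; it is built so that the two traits disagree, which shows why a copy-constructibility check is the weaker (and sufficient) requirement when a value is only copied by construction.

```cpp
#include <type_traits>

// Trivial copy constructor, but a user-provided (non-trivial) copy assignment.
struct device_arg {
    int value;
    device_arg() = default;
    device_arg(device_arg const &) = default;
    device_arg &operator=(device_arg const &other) {
        value = other.value;
        return *this;
    }
};

// C++17 spelling: the _v variable templates and static_assert without a message.
// Copy construction is trivial...
static_assert(std::is_trivially_copy_constructible_v<device_arg>);
// ...although the type as a whole is not trivially copyable.
static_assert(!std::is_trivially_copyable_v<device_arg>);
```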
If you were using functionality from the internal library common you might have to update your code (all common is considered internal API, see Release process). The most common change is using CTAD instead of makers where possible. If not possible due to compiler bugs, the maker pattern was updated to be independent of tuple_util::make. E.g. replace
- tuple_util::make<tuple>(...) by tuple(...)
- tuple_util::make<array>(...) by array(...)
- sid::composite::make<...>(...) by sid::composite::keys<...>::make_values(...)
- tuple_util::make<hymap::keys<...>::values>(...) by hymap::keys<...>::make_values(...)
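The change from makers to CTAD can be illustrated with standard library types, used here as stand-ins for the GridTools tuple and array. The function names are hypothetical.

```cpp
#include <array>
#include <tuple>
#include <type_traits>

// Maker pattern (pre-C++17 style): a helper function deduces the template arguments.
inline auto with_maker() { return std::make_tuple(1, 2.5); }

// CTAD (C++17): the class template deduces its arguments from the constructor call.
inline auto with_ctad() { return std::tuple(1, 2.5); }

// Both spellings produce the same type.
static_assert(std::is_same_v<decltype(with_maker()), decltype(with_ctad())>);
static_assert(std::is_same_v<decltype(with_ctad()), std::tuple<int, double>>);

// For std::array, CTAD deduces both the element type and the size.
inline auto three_ints() { return std::array{1, 2, 3}; }
static_assert(std::is_same_v<decltype(three_ints()), std::array<int, 3>>);
```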
New library fn: functional model backend
The fn library provides functionality for the Declarative GT4Py to implement a backend for the functional model. It supports (naive, no-blocking) CPU and (efficient) GPU (CUDA) execution for structured (Cartesian) and unstructured grids. See examples in tests/regression/fn/.
The library provides a high-level, human-readable frontend, but is mainly meant as a target for code generators.
- Introduce functional model backend (#1648, #1666, #1679)
- Implements fn::extents (#1683)
- Column Stage (#1685)
- New Backend Backends (#1695)
- Fn Frontend (#1698)
- Performance References for FN Backends (#1711)
- Add fn::tuple_get and fn::make_tuple (#1713)
- Allow setting CUDA stream (#1712)
Minor new features
- int_vector library (#1672)
- add conversion assign to hymap (#1702)
Minor improvements
- Extensions to meta and hymap (#1663)
- Soften sid value type requirements from std::trivially_copyable to std::trivially_copy_constructible (#1663)
- is_tuple_like (#1676) and is_hymap (#1677)
Bug fixes
- Propagate CXX_STANDARD to all tests (#1664)
- Compilation fixes for nvcc 11.5 and clang 12 with std=c++20 (#1665)
- Workaround for nvcc bug https://godbolt.org/z/orrev1xnM (#1681)
- c_bindings example: fix typo and split cpu and gpu fortran sources (#1684)
- Fix unused param warnings (#1706)
- Fix Compilation with CUDA 11.6 (#1710)
- Support for Clang 14 (#1707)
Testing
- Add C++20 with Cray Clang on Piz Daint to Jenkins CI (#1675)
- Perftest Updates (#1690)
- CI dom: Downgrade to gcc 10.3 for CUDA toolkit support (#1699)
Contributions
This release contains contributions from @anstaf, @fthaler, @havogt.
- C++
Published by havogt almost 4 years ago
gridtools - GridTools version 2.1.0
New features
- Dump backend: outputs a json representation of the stencil specification (#1456)
- Reduction library with naive, CPU and GPU backends (#1590, #1594, #1619)
- SID: Python cuda array interface support (#1596)
Extended features
- Support for compile time length in data stores (#1545)
- Several SID improvements (#1548)
- Structured bindings support for gridtools tuple-like (#1556)
- Improvements for Hugepage Allocation (#1562)
- Add protection against misuse of device namespace (#1581)
- fortran_array_view: allow disabling OpenACC (#1603)
- Introduce sid::unknown_kind (#1605)
Non-functional changes
- Hold the sids within sid::composite as tuple (#1564)
- Various cleanups and c++17 related changes (#1579)
- C++17 versions of meta::fold (#1549)
- Sid as a proper C++20 concept (#1580, #1582)
Performance
- More Inlining in cpu_kfirst Backend (#1634)
- Support for Compile-Time Unit Stride Dimension for Python SID Adapter (#1635, #1651)
Bug fixes
- K-cache fixes (#1530)
- CMake: Fix storage_gpu for HIPCC-AMDGPU (#1540)
- Remove a warning in hugepage_alloc which warns about a problem which only affects testing code (#1560)
- Improve HIP + OpenMP Compilation (#1578)
- Fix empty composite and add composite::make helper (#1583)
- Fix as_const to work with any SID and be compatible with std::as_const (#1601, #1611)
- SID composite: add static_assert against incorrect kinds (#1604)
- Workaround a CUDA problem: tuple_util::concat remove constexpr var (#1606)
- Improve Compliance with Parallel Model: Limit fusion of k-parallel execution with k-offsets (#1612)
- GCC 9.x: Optimize multishift (#1630)
- Python SID adapter: fix integer format check (#1632)
- GCC 11.x: Compilation fixes (#1641, #1646)
- Fixes for CUDA 11.4 (#1644)
Testing
- Update to GTest v1.11 and minor changes to adapt for changed gtest interface (#1655)
Documentation
- Clarifications to the execution model (#1541)
Contributions
This release contains contributions from @anstaf, @fthaler, @havogt, @lukasm91.
- C++
Published by havogt over 4 years ago
gridtools - GridTools version 1.1.4
Bug fixes
- speedup compile time (#1608)
- Support for GPU backend with custom block sizes in boundary conditions (#1438)
- Fix sid shift origin (#1517)
Compatibility with new compilers
- Added support for GCC 11.x (#1652, #1654)
- Fix for CUDA 11 (#1520)
- C++
Published by havogt over 4 years ago
gridtools - GridTools version 2.0.0
GridTools v2.0.0
GridTools v2.0.0 comes with an improved API for stencil composition and storage construction. These changes and a few others (see below) are breaking changes.
Changes since v1.1.0
New API: Stencil Composition
The make_computation API for composing stencils is replaced by a new stencil specification API, e.g.
```cpp
auto horizontal_diffusion_spec = [](auto coeff, auto in, auto out) {
    GT_DECLARE_TMP(double, lap, flx, fly);
    return st::execute_parallel()
        .ij_cached(lap, flx, fly)
        .stage(lap_function(), lap, in)
        .stage(flx_function(), flx, in, lap)
        .stage(fly_function(), fly, in, lap)
        .stage(out_function(), out, in, flx, fly, coeff);
};

st::run(horizontal_diffusion_spec, stencil_backend_t(), grid, coeff, in, out);
```
instead of
```cpp
auto horizontal_diffusion = gt::make_computation /* template and call arguments missing in the source */;
horizontal_diffusion.run(p_in{} = in, p_out{} = out);
```
See the documentation and examples for details about the new API.
Related PRs: #1388
New API: Storage Builder
Datastores are now created using a builder API, e.g.
```cpp
auto storage_builder = gt::storage::builder<storage_traits_t>.dimensions(d1, d2, d3).halos(halo, halo, 0);

auto in = storage_builder.type /* remainder of the statement missing in the source */;
```
The type returned by the builder is a `shared_ptr` of a `data_store` (previously the `shared_ptr` was inside the `data_store`).
Other storage related changes:
- Memory alignment is applied in bytes (instead of in elements).
- Host/device buffers are automatically synchronized on creation of views or on access of the underlying pointer (the sync method is removed).
See the documentation and examples for details about the new API.
Related PRs #1388, #1534
API break: New Backend names
Our backend names (cuda, mc, x86) were a source of confusion as the users had a certain (but wrong) idea of e.g. when to use x86.
The new names are (#1490):
- gpu instead of cuda as the same backend works for HIP.
- cpu_kfirst instead of x86, the innermost dimension is k, suitable for vertical stencils and architectures that emphasize caches over vector instructions.
- cpu_ifirst instead of mc, the innermost dimension is i, suitable for modern CPUs where vector instructions are key for performance.
Additionally we introduced a new backend gpu_horizontal (#1445) which works only for pure horizontal (parallel) stencils.
Performance of gpu_horizontal is improved over gpu for most stencils, however we recommend benchmarking both backends.
Other API breaking changes
- Backend declarations (traits) are removed from common/defs.hpp and are now provided in component specific headers for stencil, timer, gcl and storage (#1388).
- We improved the code structure by introducing finer-grained namespaces (#1388)
- The storage repository was removed (#1456)
New functionality
- New sid::rename_dimensions (#1533)
- New regression test illustrating c-arrays as SIDs (#1525)
- A Python SID adapter including regression test for calling computations from Python (#1523)
- Introduced the threadpool concept (#1484, #1498, #1504) and added an HPX threadpool (#1437)
- Added an example for calling CUDA GridTools computations from Fortran with OpenACC (#1454)
Improved functionality
- GCL is now header-only (-> all GridTools is now header-only)
- The CMake build scripts are rewritten, see the documentation and examples for how to use GridTools CMake targets (#1421, #1441, #1442, #1450, #1509)
Bug Fixes / Cleanup
- Fixes to SID concept helpers (#1524, #1527, #1531)
- Fixes for CUDA 11 (#1529), thanks @lukasm91
- Fixes for HIP compilation (#1488)
- Better error diagnostics at the frontend (#1495)
- Performance tests are now included in a single binary (#1453)
- Layout transformations are refactored (#1388)
- and many other small fixes
Infrastructure/Development
- Environments are renamed to describe more precisely what they are (#1507)
- Added testing on the new MeteoSwiss machine Tsa to Jenkins (#1452)
- Moved tests from Travis to GitHub actions (#1446), added tests for different CMake setups (#1443).
- Added a Gitpod configuration (#1423)
- Added testing with Clang-based Cray compiler on Daint (#1382)
Contributions
This release contains contributions from @anstaf, @fthaler, @havogt, @jdahm, @lukasm91, @mbianco, @tehrengruber, @wdeconinck.
- C++
Published by havogt almost 6 years ago
gridtools - GridTools version 2.0.0rc2
see final release
- C++
Published by havogt almost 6 years ago
gridtools - GridTools version 2.0.0rc1
see final release
- C++
Published by havogt almost 6 years ago
gridtools - GridTools version 1.1.3
Performance fixes
- Revert a #pragma unroll to be optimal for the COSMO dycore on V100 (#1400)
Other
- CMake: Add a missing policy workaround_mpi.cmake (#1398)
- C++
Published by havogt over 6 years ago
gridtools - GridTools version 1.1.2
Support for new targets
- Support for clang-CUDA and HIP (#1361)
Fixes
- Support custom block size in storage traits (#1392)
- Add GT_FUNCTION to storage_info
- CMake: export compilation type (#1387)
Infrastructure
- Update testing environment after Piz Daint upgrade (squash of #1369, #1371, #1373, #1382)
- C++
Published by havogt over 6 years ago
gridtools - GridTools version 1.0.4
Fixes
- CMake: support for superbuilds (nesting gridtools with add_subdirectory/FetchContent) #1383
- C++
Published by havogt over 6 years ago
gridtools - GridTools version 1.1.1
Fixes
- Make computation API thread compatible by making the allocator thread_local (#1380).
- CMake: fix to make GridTools work as nested project in a "superbuild" setup.
- C++
Published by havogt over 6 years ago
gridtools - GridTools version 1.0.3
Fixes
- Fix a module in communication #1356
- CMake: fix storage module #1353
- C++
Published by havogt over 6 years ago
gridtools - GridTools version 1.1.0
GridTools
In GridTools v1.1.0 we set the default C++ standard to C++14 and drop compatibility for C++11. This requires at least CUDA 9.0.
Changes since v1.0.0
Full introduction of the SID concept
The backend is completely restructured based on the SID (stencil iterable data) concept. There should be no user-facing changes as long as user code was only using documented public API (*). The changes separate backend implementation from the core library to allow non-intrusive extension of the library with new backends. Additionally maintainability of the gridtools infrastructure is significantly improved. Performance should be improved in general, but might be worse for specific computations. A common pattern for performance improvement/degradation is not observed.
(*) There is one change which might trigger different behavior (though the old behavior was not documented): temporary fields are now implicitly 3 dimensional. Prior to this version the user could have abused a 2D temporary field for accumulating values between k-levels.
New
- New example illustrating the type-erasure pattern for computations. #1318
Deprecation (support will be removed in GridTools v2.0.0)
- Using the gridtools::c_bindings is deprecated. Switch to the standalone https://github.com/GridTools/cpp_bindgen.
- global_accessor is deprecated, use in_accessor (without extents) instead.
- make_global_parameter with backend as template parameter is deprecated. The backend is not needed anymore.
Fixes / Cleanup
- Fix performance for CUDA 9.2 / 10.0 #1281 #1327 #1339
- Use c++14 features. #1307
- Use multiple threads in storage Initialization. #1300
- Remove dependency on boost::mpl and boost::fusion
- Fixes required to compile gridtools with HIP-Clang. Full support for AMD GPUs via HIP-Clang will come in a next release. #1363
- Fix a bug in communication #1355.
- The global_parameter doesn't require pre-allocated storage (as it is now passed via constant memory in case of CUDA), therefore global_parameter is a lightweight wrapper around the value type, which can be created without overhead, e.g. when passing it to computation.run().
Infrastructure/Development
- The bash build script is replaced by a python driven build process, see wiki for how to get the environment. #1273 #1298 #1341
- Improved jenkins performance plots. #1301 #1338
- Googletest is now pulled-in with CMake's FetchContent instead of having it as part of the repository. #1310
- C++
Published by havogt over 6 years ago
gridtools - GridTools version 1.0.2
Fixes
- The workaround implemented in v1.0.1 did not fully recover CUDA 8.0 performance for CUDA >= 9.2. A further workaround now recovers performance. See #1326.
- Make GT_DEFAULT_VERTICAL_BLOCK_SIZE macro modifiable for the user. See #1350.
- C++
Published by havogt almost 7 years ago
gridtools - GridTools version 1.0.1
Fixes
- Workaround for performance regression in CUDA 9.2 and newer, see #1223.
- C++
Published by havogt almost 7 years ago
gridtools - GridTools version 1.0.0
GridTools
An introduction to GridTools can be found in the documentation. Functionality as described in the documentation is considered public API, other functionality is considered internal and might change without notice.
Upgrading from pre-release versions
In the process of finalizing GridTools v1.0, the API was changed in many places in the past pre-release versions. See the descriptions of those releases for information on how to update to the latest API.
Changes since v0.21.0
API breaking changes
The backend strategy (naive/block) was removed and replaced by a separate naive backend. (#1238, #1240, #1244)
In the process, the target tags became obsolete as they were just referring to a backend. Therefore target was renamed to backend.
To update apply the following changes
- backend_t::make_global_parameter(…) is now make_global_parameter<backend_t>(). Same for update_global_parameter.
- backend_t::storage_traits_t was removed, use storage_traits<backend_t> instead.
- target::X is now called backend::X.
- CMake variables GT_ENABLE_TARGET_X are renamed to GT_ENABLE_BACKEND_X.
Other
- Already deprecated functions were removed (#1232)
- Removed 2D and packing version from gcl (#1233)
- The last 2 parameters of axis are encapsulated in types, and the order of these parameters is reversed, e.g. use axis<2, axis_config::offset_limit<4>> instead of axis<2,0,4> (#1257)
- The call operator is removed from the global parameter (#1256)
- C++
Published by havogt about 7 years ago
gridtools - Preparation for public release
Changes since 0.20.0
API breaking changes
- Conditionals (if_, switch_) are removed.
- Rename all files and folders with `-` (dash) to `_` (underscore).
- Rename reactivate_device_write_views() to reactivate_target_write_views()
- Removed the multiple-kernel implementation for boundary conditions.
Examples
- Examples are now provided with standalone CMakeLists.txt. The examples are used as a test for the GridTools CMake installation in our regression tests.
- C-bindings example was added.
Performance improvements
- mc: changed loop order and added omp statement for boundary conditions
Bug fixes
- Restores x86 performance, which was broken in 0.20.0.
- Restores cuda performance for layout transformations, which was broken in 0.20.0.
- Enable a workaround for CUDA 10.1 which already existed for CUDA < 10.1.
- CMake: export the mpi workaround
- CMake: fix a path for gt_bindings.cmake
- C++
Published by havogt about 7 years ago
gridtools - API changes in preparation for the public release
Changes since 0.19.0
API breaking changes
Naming changes
A lot of public GridTools functions, types and macros were renamed to consistently use lower-case
- arg_list → param_list as the elements are the parameters of the stencil operator (not the arguments).
- Do-method → apply-method
- enumtype::in and enumtype::inout → intent::in, intent::inout
- execute<enumtype::forward> etc. → execute::forward
- access_mode::ReadOnly, ReadWrite → access_mode::read_only, read_write
- cache_type::IJ, K → cache_type::ij, k
- direction::I, J, K → direction::i, j, k
- ownership::ExternalCPU, ExternalGPU → ownership::external_cpu, external_gpu
- STRUCTURED_GRIDS → GT_STRUCTURED_GRIDS
- FLOAT_PRECISION → GT_FLOAT_PRECISION
- BACKEND_* → GT_BACKEND_*
- ENABLE_METERS → GT_ENABLE_METERS
- storage_info_interface → storage_info
Removed
- Removed axis<...>::with_offset_limit, axis<...>::with_extra_offsets as they were confusing. These options have to be set directly as template arguments to the axis.
Internal API changes
- GRIDTOOLS_STATIC_ASSERT → GT_STATIC_ASSERT
- ASSERT_OR_THROW → GT_ASSERT_OR_THROW
- DISALLOW_COPY_AND_ASSIGN → GT_DISALLOW_COPY_AND_ASSIGN
- _USE_GPU_ → GT_USE_GPU
- GTREPO_* → GT_REPO_*
- GRIDTOOLS_PP_* → GT_PP_*
- PEDANTIC → GT_PEDANTIC
- VERBOSE → GT_VERBOSE
- RESTRICT → GT_RESTRICT
- __DISABLE_CACHING__ → GT_DISABLE_CACHING
- META_STORAGE_INDEX_LIMIT → GT_META_STORAGE_INDEX_LIMIT
- Removed ALLOW_EMPTY_EXTENTS, _USE_DATATYPES_
- Added GT_-prefix to some file-local macros to minimize conflict probability.
- _GCL_GPU_ → GCL_GPU
- _GCL_MPI_ → GCL_MPI
- CUDAMSG → GCL_CUDAMSG
- _GCL_CHECK_DESTRUCTOR → GCL_CHECK_DESTRUCTOR
- HOSTWORKAROUND → GCL_HOSTWORKAROUND
- NULL → nullptr
- Added GCL_-prefix to GCL macros.
- Replaced GCL header guards by #pragma once
Other API changes
- Structured grids is now the default
- Users should use make_param_list to create the param_list instead of explicitly using boost::mpl::vector. In the future using boost::mpl::vector might not work anymore, the underlying type is implementation detail, not public API
- cache_type is now an enum class. Update code by prefixing all ij and k with cache_type::
- Introduces make_expandable_computation(expand_factor<N>, ...) and removes the respective overload of make_computation; and make_expandable_positional_computation(expand_factor<N>, ...) and removes the respective overload of make_positional_computation
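The effect of turning cache_type into an enum class can be sketched as follows. Both enums are hypothetical stand-ins, and it is an assumption of this sketch that the previous definition was an unscoped enum; the real GridTools definitions are not reproduced here.

```cpp
// Hypothetical stand-ins for illustration; not the GridTools definitions.
namespace old_style {
    // Unscoped enum: the enumerators are visible in the enclosing namespace.
    enum cache_type { ij, k };
}
namespace new_style {
    // Scoped enum: the enumerators must be qualified with cache_type::.
    enum class cache_type { ij, k };
}

inline bool is_ij(new_style::cache_type c) { return c == new_style::cache_type::ij; }

// Unscoped enumerators convert implicitly to int; scoped ones need an explicit cast.
static_assert(old_style::ij == 0);
static_assert(static_cast<int>(new_style::cache_type::k) == 1);
```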
New functionality
- Distributed boundaries: timers for pack/unpack, exchange, and boundary condition.
New example
- Tridiagonal solver
Bug fixes
- Fix CUDA type unsigned long long char, which was a copy and paste bug from the CUDA programming guide where they are missing a comma.
- Add != to halo_descriptors (== already existed).
- fortran_array_adapter: Throw if datastore was not allocated.
- c_bindings: wrap line for procedures.
- repository: bindings support to add a prefix.
- In CUDA temporaries are only allocated if they are not cached.
- User-friendly error on missing backend in make_computation.
- User-friendly error argument type check of make_multistage.
- Added back check_grid_against_extents
- communication: only exchange the part of the buffer which is actually used by the exchange (not the full allocated buffer)
- Workaround nvcc which has problems in unrolling a loop in hypercube_iterator.
- Fix to the pointer sharing constructor of storage_info.
Other changes
- Documentation was updated
Internal changes
- Added hymap which is a boost::fusion-like map.
- Updates to sid
- C++
Published by havogt about 7 years ago
gridtools - New versioning scheme
Starting with this release we introduce a new versioning scheme.
Changes since 1.08.02 (which would have been 0.18.2 in the new versioning scheme).
New versioning scheme
Version number: X.Y.Z
- X: Major version will be 0 until the public release, then it will be 1, probably until a new major feature, e.g. complete icgrid.
- Y: Minor version will be increased after every API change and new smaller features, probably very often.
- Z: Patch version will be increased for bug fixes.
The CMake version matching is changed in this release to COMPATIBILITY SameMinorVersion which means the following: Let's say the user requires find_package(GridTools 0.18.2). Then 0.18.3 (a newer patch release) will be compatible; 0.18.1 (an older than requested release) and 0.19.0 (a newer minor release) will be rejected.
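The compatibility rule described above can be written down as a small predicate. This is only a sketch of the rule as stated in the text, with hypothetical names (`version`, `is_compatible`); the fields x, y, z follow the X.Y.Z naming used above.

```cpp
// Version number X.Y.Z.
struct version {
    int x; // major
    int y; // minor
    int z; // patch
};

// Rule as described above: major and minor must match the request exactly,
// and the patch level must not be older than the requested one.
constexpr bool is_compatible(version found, version requested) {
    return found.x == requested.x && found.y == requested.y && found.z >= requested.z;
}

// The three cases from the text, for a request of 0.18.2.
static_assert(is_compatible({0, 18, 3}, {0, 18, 2}));  // newer patch release: accepted
static_assert(!is_compatible({0, 18, 1}, {0, 18, 2})); // older than requested: rejected
static_assert(!is_compatible({0, 19, 0}, {0, 18, 2})); // newer minor release: rejected
```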
API breaking changes
Removes reduction support from the stencil-composition API
- make_reduction is removed
- computation type erasure doesn't have ReturnType as a first template argument, i.e. computation<void, args...> needs to be replaced by computation<args...>.
- run method of computation returns void now.
New functionality
Possibility to query intent and extent for placeholders from computation
- computation.get_arg_intent(my_arg()) returns enumtype::intent
- computation.get_arg_extent(my_arg()) returns rt_extent which contains extents in i,j,k directions
Performance improvements
- several unneeded cudaDeviceSynchronize() in boundary_conditions are removed
Bug fixes
- c_bindings: support for multiple template arguments in generic bindings macro
Internal changes
- added convenience library for integral constants with __host__ __device__ conversion and construction with custom literal _c
- SID utilities
- C++
Published by havogt over 7 years ago