v0.12.0
This release contains new features, notable changes, and bug fixes. Please see the RAJA user guide for more information about items in this release.
Please download the RAJA-v0.12.0.tar.gz file below. The others will not work due to the way RAJA uses git submodules.
Notable changes include:
-
Notable repository change:
- The 'master' branch in the RAJA git repo has been renamed to 'main'.
-
New features:
- New RAJA "work group" capability added. This allows multiple GPU kernels to be executed via one kernel launch, greatly reducing the run time overhead of launching CUDA kernels.
- Dynamic plug-ins in RAJA, which enable tools such as the Kokkos Performance Profiling Tools to be used with RAJA (https://github.com/kokkos/kokkos-tools).
- Added ability to pass a resource object to RAJA::forall methods to enable asynchronous execution for CUDA and HIP back-ends.
- Added "Multi-view" that works like a regular view, except that it can wrap multiple arrays so their accesses can share index arithmetic.
- Introduced RAJA "Teams" concept as an experimental feature. This enables hierarchical parallelism and additional nested loop patterns beyond what RAJA::kernel supports. Please note that this is very much a work-in-progress and is not yet documented in the user guide.
- Added initial support for dynamic loop tiling.
- New OpenMP execution policies added to support static, dynamic, and guided scheduling.
- Added support for const iterators to be used with RAJA scans.
- Support for bitwise AND and OR reductions has been added.
- The RAJA::kernel interface has been expanded so that only the segment index arguments used in a lambda need to be passed to that lambda. In previous versions of RAJA, every lambda invoked in a kernel had to accept an index argument for every segment in the segment tuple passed to RAJA::kernel execution templates, even if the lambda did not use all of them. This release still allows that usage pattern. The new capability requires an additional template parameter to be passed to the RAJA::statement::Lambda type, which identifies the segment indices that will be passed and in which order.
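
To illustrate the idea behind the last item, here is a minimal sketch in plain C++ of passing only selected indices, in a chosen order, to a loop body. It does not use RAJA: `SegList`, `invoke_selected`, and `example_selected_indices` are hypothetical names made up for this sketch, and RAJA's actual mechanism is the extra template parameter on RAJA::statement::Lambda described above.

```cpp
#include <cstddef>
#include <tuple>
#include <utility>

// Hypothetical compile-time list of segment positions (not a RAJA type).
template <std::size_t... Ids>
struct SegList {};

// Call `body` with only the tuple entries named by Ids, in that order.
template <typename Body, typename IndexTuple, std::size_t... Ids>
auto invoke_selected(Body&& body, const IndexTuple& idx, SegList<Ids...>)
    -> decltype(std::forward<Body>(body)(std::get<Ids>(idx)...))
{
  return std::forward<Body>(body)(std::get<Ids>(idx)...);
}

// Usage: three loop indices exist, but the body receives only
// positions 2 and 0, in that order.
inline int example_selected_indices()
{
  const std::tuple<int, int, int> idx(3, 5, 7);
  return invoke_selected(
      [](int k, int i) { return 10 * k + i; },  // k = entry 2, i = entry 0
      idx, SegList<2, 0>{});
}
```

The body never sees the middle index, which mirrors how a lambda in a kernel no longer has to declare arguments for segments it does not use.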
-
API Changes:
- The RAJA 'VarOps' namespace has been removed. All entities previously in that namespace are now in the 'RAJA' namespace.
- RAJA span is now public for users to access and has been made more like std::span.
- RAJA::statement::tile_fixed has been moved to RAJA::tile_fixed (namespace change).
- RAJA::statement::{Segs, Offsets, Params, ValuesT} have been moved to RAJA::{Segs, Offsets, Params, ValuesT} (namespace change).
- RAJA ListSegment constructors have been expanded to accept a camp Resource object. This enables run time specification of the memory space where the data for list segment indices will live. In earlier RAJA versions, the space in which list segment index data lived was a compile-time choice based on whether CUDA or HIP was enabled, and the data resided in unified memory in either case. This is still supported in this release, but is marked as a DEPRECATED FEATURE. In the next RAJA release, ListSegment construction will require a camp Resource object. When compiling RAJA with your application, you will see deprecation warnings if you are using the deprecated ListSegment constructor.
- A reset method was added to OpenMP target offload reduction classes so they contain the same functionality as reductions for all other back-ends.
-
Build changes/improvements:
- The BLT, camp, CUB, and rocPRIM submodules have all been updated to more recent versions. Please note that RAJA now requires ROCm version 3.5 or newer to use the HIP back-end.
- Build for clang9 on macOS has been fixed.
- Build for Intel19 on Windows has been fixed.
- Host/device annotations have been added to reduction operations to eliminate compiler warnings for certain use cases.
- Several warnings generated by the MSVC compiler have been eliminated.
- A couple of PGI compiler warnings have been removed.
- CMake improvements to make it easier to use an external camp or CUB library with RAJA.
- Note that the RAJA tests are undergoing a substantial overhaul. Users who choose to build and run RAJA tests should know that many tests are now generated in a build space directory structure that mimics the RAJA source directory structure. As a result, only some test executables appear in the top-level 'test' subdirectory of the build directory; others can be found in lower-level directories. The reason for this change is to reduce test build times for certain compilers.
-
Bug fixes:
- An issue with SIMD privatization, which is required to generate correct code with the Intel compiler, has been fixed.
- An issue with the atomicExchange() operation for the RAJA HIP back-end has been fixed.
- A type issue in the RAJA::kernel implementation involving RAJA span usage has been fixed.
- Checks for iterator ranges and container sizes have been added to RAJA scans, which fixes an issue when users attempted to run a scan over a range of size zero.
- Several type errors in the Layout.hpp header file have been fixed.
- Several fixes have been made in the Layout and Static Layout types.
- Several fixes have been made to the OpenMP target offload back-end to address host-device memory issues.
- A variety of RAJA User Guide issues have been addressed, as well as issues in RAJA example codes.
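
As a rough illustration of the kind of range check described in the scan fix above, here is a sketch of a sequential inclusive scan that returns early for a zero-length range. It is not RAJA's implementation: `inclusive_scan_checked` and `example_scan` are hypothetical names, and the sketch assumes a scan that seeds its running value from the first element, which is why the empty-range guard is needed.

```cpp
#include <iterator>
#include <vector>

// Inclusive sum scan over [first, last), written to `out`.
// Accepts const iterators for the input range.
template <typename InIt, typename OutIt>
OutIt inclusive_scan_checked(InIt first, InIt last, OutIt out)
{
  // Guard: the seeding step below dereferences `first`,
  // which would be invalid for a range of size zero.
  if (first == last) {
    return out;
  }

  typename std::iterator_traits<InIt>::value_type running = *first;
  *out = running;
  ++first;
  ++out;

  for (; first != last; ++first, ++out) {
    running = running + *first;
    *out = running;
  }
  return out;
}

// Usage: scan a vector through const iterators.
inline std::vector<int> example_scan(const std::vector<int>& in)
{
  std::vector<int> out(in.size());
  inclusive_scan_checked(in.cbegin(), in.cend(), out.begin());
  return out;
}
```

With the guard in place, scanning an empty container is a no-op instead of an invalid dereference.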