v2024.02.2
This release contains a bugfix and new execution policies that improve performance for GPU kernels with reductions.
Please download the RAJA-v2024.02.2.tar.gz file below. The others, generated by GitHub, may not work for you due to RAJA's dependencies on git submodules.
Notable changes include:
New features / API changes:
- RAJA::loop_exec and associated policies (loop_reduce, etc.) have been removed. They were deprecated in an earlier release and type-aliased to RAJA::seq_exec, etc., which have the same behavior that RAJA::loop_exec, etc. had in the past. When you update to this version of RAJA, please change uses of loop_exec to seq_exec in your code.
- New GPU execution policies for CUDA and HIP added which provide improved performance for GPU kernels with reductions. Please see the RAJA User Guide for more information. Short summary:
  - Option added to change the max grid size in policies that use the occupancy calculator.
  - Policies added to run with max occupancy, with a fraction of the max occupancy, or with a "concretizer", which allows a user to determine how to run based on what the occupancy calculator determines about a kernel.
  - Additional options to tune kernels containing reductions, such as
    - an option to initialize data on host for reductions that use atomic operations
    - an option to avoid device scope memory fences
- Changed the SYCL thread index ordering in RAJA::launch to follow the SYCL "row-major" convention. Please see the RAJA User Guide for more information.
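
For readers unfamiliar with the SYCL "row-major" convention mentioned above, the following is a minimal standalone sketch of how SYCL linearizes a 3D index: the last dimension varies fastest. The helper names linearize_row_major and delinearize_row_major are hypothetical and are not part of RAJA or SYCL. The sketch does not show how RAJA::launch maps its own thread indices onto SYCL dimensions; the RAJA User Guide remains the reference for that.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Index3 = std::array<std::size_t, 3>;

// Hypothetical helper (not a RAJA or SYCL function): map a 3D index to a
// linear offset using the SYCL linearization rule, in which dimension 2
// (the last one) is the fastest-varying and dimension 0 the slowest.
inline std::size_t linearize_row_major(const Index3& id, const Index3& range)
{
  assert(id[0] < range[0] && id[1] < range[1] && id[2] < range[2]);
  return (id[0] * range[1] + id[1]) * range[2] + id[2];
}

// Hypothetical inverse of the helper above: recover the 3D index from a
// linear offset under the same convention.
inline Index3 delinearize_row_major(std::size_t linear, const Index3& range)
{
  assert(linear < range[0] * range[1] * range[2]);
  return { linear / (range[1] * range[2]),
           (linear / range[2]) % range[1],
           linear % range[2] };
}
```

With a range of (2, 3, 4), stepping the last index by one moves the linear offset by 1, stepping the middle index moves it by 4, and stepping the first index moves it by 12. Code that assumed a different fastest-varying dimension in RAJA::launch with SYCL may need its index usage reviewed after updating.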
Build changes/improvements:
- NONE.
Bug fixes/improvements:
- Fixed issue in bump-style allocator used internally in RAJA::launch.