v2023.06.1
This release contains several small RAJA improvements.
Please download the RAJA-v2023.06.1.tar.gz file below. The other archives, generated automatically by GitHub, may not work for you because RAJA depends on git submodules.
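As an alternative to the tarball, the submodule dependency can also be satisfied by cloning with submodules. The following is a minimal sketch, assuming the release is tagged `v2023.06.1` in the LLNL/RAJA repository on GitHub and that a recent git is installed:

```shell
# Assumption: the tag name matches the release name shown above.
# --recursive fetches the git submodules that GitHub's auto-generated archives omit.
git clone --recursive --branch v2023.06.1 https://github.com/LLNL/RAJA.git
```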
Notable changes include:
New features / API changes:
- Add compile time block size optimization for new reduction interface.
- Change default stream usage in WorkGroup constructs to use the stream associated with the default (camp) resource. Previously, RAJA used stream zero. Specifically, this change affects the stream on which memory in the device memory pool is zeroed (memset) and the stream on which device function pointers are obtained for WorkGroup.
Build changes/improvements:
- Add RAJA_ENABLE_OPENMP_TASK CMake option to enable/disable algorithm options based on the OpenMP task construct. Currently, this only applies to RAJA's OpenMP sort implementation. The default is 'Off'. The option allows users to choose a task-based implementation if they wish.
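For illustration, the new option could be turned on at configure time roughly as follows. This is a sketch only: it assumes an out-of-source build directory inside the RAJA source tree (the `..` path is hypothetical) and that OpenMP is enabled through the usual `ENABLE_OPENMP` option.

```shell
# Assumption: run from a build directory one level below the RAJA source root.
# RAJA_ENABLE_OPENMP_TASK defaults to Off; here it is switched on explicitly.
cmake -DENABLE_OPENMP=On -DRAJA_ENABLE_OPENMP_TASK=On ..
```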
Bug fixes/improvements:
- Fix compilation of GPU occupancy calculator and use common types for HIP and CUDA backends in the occupancy calculator, kernel policies, and kernel launch helper routines.
- Fix direct cudaMalloc/hipMalloc calls and memory leaks.