[RELEASE] rmm v25.04 #1876

raydouglass · 2025-03-20T16:24:05Z

❄️ Code freeze for `branch-25.04` and v25.04 release

What does this mean?

Only critical/hotfix level issues should be merged into branch-25.04 until release (merging of this PR).

What is the purpose of this PR?

- Update documentation
- Allow testing for the new release
- Enable a means to merge branch-25.04 into main for the release

Forward-merge branch-25.02 into branch-25.04

Branch 25.04 merge branch 25.02

Contributes to rapidsai/build-planning#146 Proposes: * setting `[tool.scikit-build].ninja.make-fallback = false`, so `scikit-build-core` will not silently fall back to using GNU Make if `ninja` is not available Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1804

Forward-merge branch-25.02 to branch-25.04

This migrates amd64 CI jobs (PRs and nightlies) to use L4 GPUs from the NVKS cluster. xref: rapidsai/build-infra#184 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: #1803

xref rapidsai/build-planning#147 Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1811

Fixes `build_type` input not being used in `test` workflows. See #1811 (comment).

## Description

Testing rapidsai/shared-workflows#276. We will merge this PR and then we can try running manual branch tests.

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.

Uses a retry wrapper for `pip` commands to try to alleviate CI failures due to hash mismatches that result from network hiccups.

xref rapidsai/build-planning#148

This will retry failures that show up in CI like:

```
Collecting nvidia-cublas-cu12 (from libraft-cu12==25.2.*,>=0.0.0a0)
Downloading https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl (604.9 MB)
━━━━━━━━━━━━━━━━━━━━━ 350.2/604.9 MB 229.2 MB/s eta 0:00:02
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
nvidia-cublas-cu12 from https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl#sha256=93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3 (from libraft-cu12==25.2.*,>=0.0.0a0):
Expected sha256 93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3
Got 849c88d155cb4b4a3fdfebff9270fb367c58370b4243a2bdbcb1b9e7e940b7be
```

Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1814

This completes the migration to NVKS runners now that all libraries have been tested and rapidsai/shared-workflows#273 has been merged. xref: rapidsai/build-infra#184 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1816

This change helps completely insulate rmm (and, transitively, the rest of RAPIDS) from fmt and spdlog as dependencies, thereby solving a large number of issues around ABI stability, symbol visibility, package clobbering, and more. See rapidsai/build-planning#104 for more information. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Murray (https://github.com/Matt711) - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) URL: #1808

Addresses #1808 (comment) Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) URL: #1820

A pair of doxygen comments in `host_memory_resource` unintentionally referenced `device_memory_resource`, very likely due to a simple copy/paste error. #1794 Authors: - Nicholas Sielicki (https://github.com/aws-nslick) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1809

This is a cleanup PR. I found that we were extraneously including `<thrust/optional.h>` in the pool memory resource (also `thrust::optional` is deprecated in favor of `cuda::std::optional` in the upcoming major release of CCCL). I did a pass with IWYU to see what else could be fixed. IWYU could only really analyze our tests, since RMM is header-only. There are a lot of false positives/negatives, so I don't think it is appropriate to automate IWYU in our CI. However, this felt valuable enough to open a refactoring PR. I also updated some deprecated GTest code which was using `TYPED_TEST_CASE` instead of `TYPED_TEST_SUITE` and replaced some uses of `::value` with the corresponding `_v` STL features. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1821
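As a small illustration of the `::value` to `_v` replacement described above, the sketch below shows the two equivalent spellings using a standard-library trait (C++17). The helper names `is_integral_old` and `is_integral_new` are hypothetical and are not taken from RMM; the actual traits touched by the PR are not listed here.

```cpp
#include <type_traits>

// Hypothetical helpers, for illustration only (not part of RMM).

// Older spelling: read the nested ::value member of the trait.
template <typename T>
constexpr bool is_integral_old()
{
  return std::is_integral<T>::value;
}

// C++17 spelling: the _v variable template yields the same constant.
template <typename T>
constexpr bool is_integral_new()
{
  return std::is_integral_v<T>;
}

// Both spellings agree, so the replacement does not change behavior.
static_assert(is_integral_old<int>() == is_integral_new<int>());
static_assert(is_integral_old<float>() == is_integral_new<float>());
```

Because the `_v` form is only a shorthand for the `::value` member, a change like this is expected to be behavior-preserving.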

Issue: rapidsai/build-planning#22 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1824

Forward-merge branch-25.02 into branch-25.04

Summary:

## `recipe.yaml`

- We use the [multi-output cache](https://rattler.build/latest/multiple_output_cache/) to avoid double-compiling. The `build` environment compiles things, the individual outputs call `cmake --install`
- We make use of the built-in `git` functions for grabbing the short-SHA (https://rattler.build/latest/experimental_features/#git-functions)
- We use `load_from_file` to pull in metadata from the corresponding `pyproject.toml` (https://rattler.build/latest/experimental_features/#load_from_filefile_path)
- Relatively "simple" `*_build.sh` scripts are inlined into `recipe.yaml` instead of existing as separate files

## `build_*.sh`

- We use `--no-build-id` to allow `sccache` to look in a predictable place, see: https://rattler.build/latest/tips_and_tricks/#using-sccache-or-ccache-with-rattler-build
- Depending on whether `rapids-is-release-build`, we include either `rapidsai` (release) or `rapidsai-nightly` (non-release) in the channel listing
- Channels must be specified at the command-line
- This uses https://github.com/rapidsai/gha-tools/blob/main/tools/rapids-rattler-channel-string to generate an array of channels
- We remove the `build_cache` directory after building so it doesn't get packaged up with the other artifacts and uploaded to S3

xref: rapidsai/build-planning#47

Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1796

Issue: rapidsai/build-planning#22 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1828

Update CMake minimum required to 3.30.4 across all of RAPIDS Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Gil Forsyth (https://github.com/gforsyth) - Bradley Dice (https://github.com/bdice) URL: #1826

Removes the `.` from the `py_version` context variable and standardizes whitespace and section ordering Authors: - Gil Forsyth (https://github.com/gforsyth) - https://github.com/apps/pre-commit-ci - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) Approvers: - James Lamb (https://github.com/jameslamb) - Bradley Dice (https://github.com/bdice) URL: #1832

Fixes redistribution of `rapids-logger` code which can cause clobbering. See #1833. After this change, the following paths should _not_ be in the `librmm` package:

- `lib/librapids_logger.so`
- `lib/cmake/rapids_logger/*`
- `include/rapids_logger/*`

Authors: - Bradley Dice (https://github.com/bdice) Approvers: - https://github.com/jakirkham - Gil Forsyth (https://github.com/gforsyth) URL: #1834

This PR uses new functionality added to shared-actions and shared-workflows to capture sccache hit rate information. To add this to other repos, we'll need to make the same slight alteration used here: `sccache --show-adv-stats | tee ../../telemetry-artifacts/sccache-stats.txt` That is, output the sccache stats to a file with a particular name in the telemetry-artifacts folder. Authors: - Mike Sarahan (https://github.com/msarahan) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1830

Fixes for `build`/`host` dependencies in the rattler recipe for librmm. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: #1835

RMM benchmarks should statically link Google Benchmark. We saw they were linking to `libbenchmark.so` while working with rattler-build: #1836 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #1837

Turns on erroring for overlinking errors and fixes all of those errors. I've reduced the number of overdepending warnings, but `rapids-logger` seems to consistently cause an overdepending warning, so I haven't yet switched that to error mode. Authors: - Gil Forsyth (https://github.com/gforsyth) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1836

Telemetry is causing build workflows to fail. This adds `telemetry-setup` to the `build.yaml` workflow. Authors: - Bradley Dice (https://github.com/bdice) - Mike Sarahan (https://github.com/msarahan) Approvers: - Mike Sarahan (https://github.com/msarahan) URL: #1838

This is a skeleton for adding examples, requested in issue #1784. I plan to merge some minimal form of this, and then add a few examples that answer common questions about RMM, such as how to use specific memory resource adaptors or how to use RMM for managing multi-thread, multi-stream work. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Jake Awe (https://github.com/AyodeAwe) - Mark Harris (https://github.com/harrism) - Lawrence Mitchell (https://github.com/wence-) URL: #1800

Retry getting improved error throwing and logging, with bugs fixed and test added that repros the cudf failure. [Original PR](#1827) that was [reverted](#1843). The changes to the previously-approved PR that includes the fixes and test is [this commit](c8a8505). The [original while loop](https://github.com/rapidsai/rmm/blob/6e8539e42d51852faab5f9b330232168f9223eee/include/rmm/mr/device/pool_memory_resource.hpp#L253) has been restored with better error handling. Note that this changes the interface of the macros, one of which is called in cudf that will be changed [here](rapidsai/cudf#18108) after this goes in. Authors: - Paul Mattione (https://github.com/pmattione-nvidia) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1844

Fix for `-fdebug-prefix-map` breaking sccache (it contains the librmm build number). Workaround for prefix-dev/rattler-build#1458. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: #1846

This PR adds tests for internal macros. Closes #1848. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1847

This PR runs C++ examples in CI. Closes #1845. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1850

Updates several `dependencies.yaml` entries to match the others in the file which allows the `update-version.sh` script to work correctly.

A recent PR (#1844) changed how error messages are generated when they point to a particular file and line number. In particular, the messages changed from using the typical C string (`const char*`), which is `\0`-terminated, to a C++ `std::string` object, which is not `\0`-terminated. This change was in turn picked up when RMM headers were used to compile libraries (like cuDF), which then included file paths in strings that are not `\0`-terminated. Conda would detect the paths in these error messages and attempt to fix them as part of its prefix replacement process. When Conda did the prefix replacement, it would add an additional `\0` terminating character to the string. However, because the strings were now `std::string`-based and lacked `\0` terminating characters, the final string written out by Conda would be one byte longer. This could mean overwriting other text data in the library or writing outside the text block. This is a known bug in Conda (conda/conda-build#1674). Thus, when cuDF started building with the aforementioned RMM change last week, the packages it created had file paths in error messages that lacked the `\0` terminating character. These in turn would be inadvertently corrupted by Conda when installing the packages in an environment. This led to a quite hairy bug detailed in issue rapidsai/cudf#18251. To correct this issue, we drop the `std::string` constructor that was added in the aforementioned PR. More specifically, we adapted the following code from cuDF's [`CUDF_EXPECTS_3`](https://github.com/rapidsai/cudf/blob/8041ac8e370b092229841508fdfd1efb88fef034/cpp/include/cudf/utilities/error.hpp#L186-L192) and [`CUDF_FAIL_2`](https://github.com/rapidsai/cudf/blob/86eb82399f0e056731e2062dc95a4583c26e9af1/cpp/include/cudf/utilities/error.hpp#L225-L227), which still use a C-style string. Also, to address the need for runtime generation of some errors, we use `std::string` for only an initial snippet of the string and add other contents like `__FILE__` afterward. This keeps the latter bits as C-style strings. Authors: - https://github.com/jakirkham Approvers: - Bradley Dice (https://github.com/bdice) - Paul Mattione (https://github.com/pmattione-nvidia) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1858
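A minimal sketch of the two message-building styles described above, assuming plain standard C++. All macro and function names here (`EXAMPLE_FAIL`, `EXAMPLE_FAIL_RUNTIME`, `literal_message`, `runtime_message`) are hypothetical and are not RMM's or cuDF's actual macros; the real definitions are in the linked sources. The first macro relies on adjacent string-literal concatenation, so the file path ends up inside a single `\0`-terminated literal. The second starts from a `std::string` snippet and appends `__FILE__` and the line number as C-style literals.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical macros for illustration only; not the RMM or cuDF definitions.
#define EXAMPLE_STRINGIFY_DETAIL(x) #x
#define EXAMPLE_STRINGIFY(x) EXAMPLE_STRINGIFY_DETAIL(x)

// Compile-time message: `reason` must be a string literal. The compiler joins
// the adjacent literals into one NUL-terminated C string.
#define EXAMPLE_FAIL(reason)                                                   \
  throw std::runtime_error("Example failure at: " __FILE__                     \
                           ":" EXAMPLE_STRINGIFY(__LINE__) ": " reason)

// Runtime message: only the leading snippet is a std::string; __FILE__ and the
// line number are appended afterward as C-style string literals.
#define EXAMPLE_FAIL_RUNTIME(reason)                                           \
  throw std::runtime_error(std::string{"Example failure at: "} + __FILE__ +    \
                           ":" + EXAMPLE_STRINGIFY(__LINE__) + ": " + (reason))

// Returns the message produced by the compile-time variant.
inline std::string literal_message()
{
  try {
    EXAMPLE_FAIL("bad input");
  } catch (std::runtime_error const& e) {
    return e.what();
  }
  return {};  // not reached: the macro always throws
}

// Returns the message produced by the runtime variant for a given reason.
inline std::string runtime_message(std::string const& reason)
{
  try {
    EXAMPLE_FAIL_RUNTIME(reason);
  } catch (std::runtime_error const& e) {
    return e.what();
  }
  return {};  // not reached: the macro always throws
}
```

Calling `runtime_message("out of memory")` yields a string that begins with `Example failure at: ` followed by the source path, the line number, and the reason. Whether this layout matches RMM's final macros exactly is an assumption; the point is only that the path itself stays a C-style literal in both variants.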

rmm nightlies are currently failing on CUDA 11.4 because the CUDA 11 `librmm-examples` package is overconstrained.

Closes #1611. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1864

…d. (#1852) Closes #1783. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) - James Lamb (https://github.com/jameslamb) - Mark Harris (https://github.com/harrism) URL: #1852

If the driver supports the flag, unconditionally set the async memory pool usage property to include a request to support HW decompression. - Closes #1849 Authors: - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) Approvers: - Rong Ou (https://github.com/rongou) - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1854

…#1873) This reverts commit 7f0cead. - Closes #1872 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1873

copy-pr-bot · 2025-03-20T16:24:09Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

raydouglass and others added 30 commits January 23, 2025 15:03

DOC v25.04 Updates [skip ci]

c0bdd33

Merge pull request #1793 from rapidsai/branch-25.02

3caeb8a

Forward-merge branch-25.02 into branch-25.04

Merge branch 'branch-25.02' into branch-25.04-merge-branch-25.02

3bc0178

Merge pull request #1799 from vyasr/branch-25.04-merge-branch-25.02

b6f6dd0

Branch 25.04 merge branch 25.02

Merge branch 'branch-25.02' into branch-25.04-merge-25.02

1afdf44

Merge branch-25.02 into branch-25.04

f23ae31

Merge pull request #1806 from bdice/branch-25.04-merge-25.02

58fc846

Forward-merge branch-25.02 to branch-25.04

Migrate to NVKS for amd64 CI runners (#1803)

45a4446

This migrates amd64 CI jobs (PRs and nightlies) to use L4 GPUs from the NVKS cluster. xref: rapidsai/build-infra#184 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: #1803

Add build_type to workflow inputs (#1811)

fe66a24

xref rapidsai/build-planning#147 Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1811

Use build_type input (#1812)

5dbb28b

Fixes `build_type` input not being used in `test` workflows. See #1811 (comment).

Remove unnecessary index (#1820)

ac8a99b

Addresses #1808 (comment) Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) URL: #1820

Create Conda CI test env in one step (#1824)

9f2e634

Issue: rapidsai/build-planning#22 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1824

Merge pull request #1825 from rapidsai/branch-25.02

b2999d5

Forward-merge branch-25.02 into branch-25.04

Consolidate more Conda solves in CI (#1828)

0d6083b

Issue: rapidsai/build-planning#22 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1828

Require CMake 3.30.4 (#1826)

1032c10

Update CMake minimum required to 3.30.4 across all of RAPIDS Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Gil Forsyth (https://github.com/gforsyth) - Bradley Dice (https://github.com/bdice) URL: #1826

Fixes for rattler recipe (#1835)

88eaaaf

Fixes for `build`/`host` dependencies in the rattler recipe for librmm. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: #1835

bdice and others added 12 commits March 1, 2025 01:07

Add tests for RMM internal macros. (#1847)

866a40e

This PR adds tests for internal macros. Closes #1848. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1847

Run examples in CI (#1850)

d8b7dac

This PR runs C++ examples in CI. Closes #1845. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1850

Fix dependencies.yaml for update-version.sh (#1859)

9432761

Updates several `dependencies.yaml` entries to match the others in the file which allows the `update-version.sh` script to work correctly.

Fix run export on cudatoolkit (#1862)

019228d

rmm nightlies are currently failing on CUDA 11.4 because the CUDA 11 `librmm-examples` package is overconstrained.

Add async view memory resource bindings to Python. (#1864)

c66ded5

Closes #1611. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1864

Revert "Set mempool hw_decompress flag if driver supports it (#1854)" (…

c6773f2

…#1873) This reverts commit 7f0cead. - Closes #1872 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1873

raydouglass requested review from a team as code owners March 20, 2025 16:24

raydouglass requested a review from KyleFromNVIDIA March 20, 2025 16:24

github-project-automation bot added this to RMM Project Board Mar 20, 2025

raydouglass requested review from vyasr and bdice March 20, 2025 16:24

github-actions bot added the CMake, Python, conda, cpp, and ci labels Mar 20, 2025

Update Changelog [skip ci]

6c23695

AyodeAwe merged commit 35ca074 into main Apr 9, 2025
5 of 7 checks passed

github-project-automation bot moved this to Done in RMM Project Board Apr 9, 2025
