[RELEASE] rmm v25.04 #1876
Forward-merge branch-25.02 into branch-25.04
Branch 25.04 merge branch 25.02
Contributes to rapidsai/build-planning#146 Proposes: * setting `[tool.scikit-build].ninja.make-fallback = false`, so `scikit-build-core` will not silently fall back to using GNU Make if `ninja` is not available Authors: - James Lamb (https://github.com/jameslamb) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1804
Forward-merge branch-25.02 to branch-25.04
This migrates amd64 CI jobs (PRs and nightlies) to use L4 GPUs from the NVKS cluster. xref: rapidsai/build-infra#184 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: #1803
xref rapidsai/build-planning#147 Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1811
Fixes `build_type` input not being used in `test` workflows. See #1811 (comment).
## Description

Testing rapidsai/shared-workflows#276. We will merge this PR and then we can try running manual branch tests.

## Checklist

- [x] I am familiar with the [Contributing Guidelines](https://github.com/rapidsai/rmm/blob/HEAD/CONTRIBUTING.md).
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
Uses a retry wrapper for `pip` commands to try to alleviate CI failures due to hash mismatches that result from network hiccups.

xref rapidsai/build-planning#148

This will retry failures that show up in CI like:

```
Collecting nvidia-cublas-cu12 (from libraft-cu12==25.2.*,>=0.0.0a0)
  Downloading https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl (604.9 MB)
     ━━━━━━━━━━━━━━━━━━━━━ 350.2/604.9 MB 229.2 MB/s eta 0:00:02
ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
    nvidia-cublas-cu12 from https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl#sha256=93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3 (from libraft-cu12==25.2.*,>=0.0.0a0):
        Expected sha256 93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3
             Got        849c88d155cb4b4a3fdfebff9270fb367c58370b4243a2bdbcb1b9e7e940b7be
```

Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1814
This completes the migration to NVKS runners now that all libraries have been tested and rapidsai/shared-workflows#273 has been merged. xref: rapidsai/build-infra#184 Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1816
This change helps completely insulate rmm (and, transitively, the rest of RAPIDS) from fmt and spdlog as dependencies, thereby solving a large number of issues around ABI stability, symbol visibility, package clobbering, and more. See rapidsai/build-planning#104 for more information. Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Matthew Murray (https://github.com/Matt711) - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) URL: #1808
Addresses #1808 (comment) Authors: - Vyas Ramasubramani (https://github.com/vyasr) Approvers: - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) URL: #1820
A pair of doxygen comments in `host_memory_resource` unintentionally referenced `device_memory_resource`, very likely a simple copy/paste issue. #1794 Authors: - Nicholas Sielicki (https://github.com/aws-nslick) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1809
This is a cleanup PR. I found that we were extraneously including `<thrust/optional.h>` in the pool memory resource (also `thrust::optional` is deprecated in favor of `cuda::std::optional` in the upcoming major release of CCCL). I did a pass with IWYU to see what else could be fixed. IWYU could only really analyze our tests, since RMM is header-only. There are a lot of false positives/negatives, so I don't think it is appropriate to automate IWYU in our CI. However, this felt valuable enough to open a refactoring PR. I also updated some deprecated GTest code which was using `TYPED_TEST_CASE` instead of `TYPED_TEST_SUITE` and replaced some uses of `::value` with the corresponding `_v` STL features. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1821
Issue: rapidsai/build-planning#22 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1824
Forward-merge branch-25.02 into branch-25.04
Summary:

## `recipe.yaml`

- We use the [multi-output cache](https://rattler.build/latest/multiple_output_cache/) to avoid double-compiling. The `build` environment compiles things, the individual outputs call `cmake --install`
- We make use of the built-in `git` functions for grabbing the short-SHA (https://rattler.build/latest/experimental_features/#git-functions)
- We use `load_from_file` to pull in metadata from the corresponding `pyproject.toml` (https://rattler.build/latest/experimental_features/#load_from_filefile_path)
- Relatively "simple" `*_build.sh` scripts are inlined into `recipe.yaml` instead of existing as separate files

## `build_*.sh`

- We use `--no-build-id` to allow `sccache` to look in a predictable place, see: https://rattler.build/latest/tips_and_tricks/#using-sccache-or-ccache-with-rattler-build
- Depending on whether `rapids-is-release-build`, we include either `rapidsai` (release) or `rapidsai-nightly` (non-release) in the channel listing
- Channels must be specified at the command-line
  - This uses https://github.com/rapidsai/gha-tools/blob/main/tools/rapids-rattler-channel-string to generate an array of channels
- We remove the `build_cache` directory after building so it doesn't get packaged up with the other artifacts and uploaded to S3

xref: rapidsai/build-planning#47

Authors: - Gil Forsyth (https://github.com/gforsyth) Approvers: - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1796
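The release versus nightly channel choice described above can be sketched in Python as follows. This is an illustration only: the real logic lives in the `rapids-rattler-channel-string` tool, whose implementation is not shown here. The function names and the `conda-forge` entry are assumptions, as is passing one `--channel` flag per channel on the build command line.

```python
def conda_channels(is_release_build):
    """Pick the channel list for a build (hypothetical helper).

    Release builds use `rapidsai`; everything else uses `rapidsai-nightly`.
    The trailing `conda-forge` entry is an assumption for illustration.
    """
    rapids_channel = "rapidsai" if is_release_build else "rapidsai-nightly"
    return [rapids_channel, "conda-forge"]


def channel_args(channels):
    """Expand a channel list into repeated command-line flags."""
    args = []
    for channel in channels:
        args.extend(["--channel", channel])
    return args


release_args = channel_args(conda_channels(is_release_build=True))
nightly_args = channel_args(conda_channels(is_release_build=False))
```

Keeping the channels in a list and expanding them into flags at the last moment mirrors the "array of channels" approach mentioned above, and keeps channel priority explicit in the ordering.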
Issue: rapidsai/build-planning#22 Authors: - Kyle Edwards (https://github.com/KyleFromNVIDIA) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1828
Update CMake minimum required to 3.30.4 across all of RAPIDS Authors: - Robert Maynard (https://github.com/robertmaynard) Approvers: - Gil Forsyth (https://github.com/gforsyth) - Bradley Dice (https://github.com/bdice) URL: #1826
Removes the `.` from the `py_version` context variable and standardizes whitespace and section ordering Authors: - Gil Forsyth (https://github.com/gforsyth) - https://github.com/apps/pre-commit-ci - Bradley Dice (https://github.com/bdice) - James Lamb (https://github.com/jameslamb) Approvers: - James Lamb (https://github.com/jameslamb) - Bradley Dice (https://github.com/bdice) URL: #1832
Fixes redistribution of `rapids-logger` code which can cause clobbering. See #1833.

After this change, the following paths should _not_ be in the `librmm` package:

- `lib/librapids_logger.so`
- `lib/cmake/rapids_logger/*`
- `include/rapids_logger/*`

Authors: - Bradley Dice (https://github.com/bdice) Approvers: - https://github.com/jakirkham - Gil Forsyth (https://github.com/gforsyth) URL: #1834
This PR uses new functionality added to shared-actions and shared-workflows to capture sccache hit rate information. To add this to other repos, we'll need to make the slight alteration here: `sccache --show-adv-stats | tee ../../telemetry-artifacts/sccache-stats.txt`. That is, output the sccache stats to a file with a particular name in the telemetry-artifacts folder. Authors: - Mike Sarahan (https://github.com/msarahan) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1830
Fixes for `build`/`host` dependencies in the rattler recipe for librmm. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: #1835
RMM benchmarks should statically link Google Benchmark. We saw they were linking to `libbenchmark.so` while working with rattler-build: #1836 (comment) Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) - Kyle Edwards (https://github.com/KyleFromNVIDIA) URL: #1837
Turns on erroring for overlinking errors and fixes all of those errors. I've reduced the number of overdepending warnings, but `rapids-logger` seems to consistently cause an overdepending warning, so I haven't yet switched that to error mode. Authors: - Gil Forsyth (https://github.com/gforsyth) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1836
Telemetry is causing build workflows to fail. This adds `telemetry-setup` to the `build.yaml` workflow. Authors: - Bradley Dice (https://github.com/bdice) - Mike Sarahan (https://github.com/msarahan) Approvers: - Mike Sarahan (https://github.com/msarahan) URL: #1838
This is a skeleton for adding examples, requested in issue #1784. I plan to merge some minimal form of this, and then add a few examples that answer common questions about RMM, such as how to use specific memory resource adaptors or how to use RMM for managing multi-thread, multi-stream work. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Jake Awe (https://github.com/AyodeAwe) - Mark Harris (https://github.com/harrism) - Lawrence Mitchell (https://github.com/wence-) URL: #1800
Retry of the improved error throwing and logging, with bugs fixed and a test added that reproduces the cudf failure. [Original PR](#1827) that was [reverted](#1843). The changes to the previously approved PR, which include the fixes and the test, are in [this commit](c8a8505). The [original while loop](https://github.com/rapidsai/rmm/blob/6e8539e42d51852faab5f9b330232168f9223eee/include/rmm/mr/device/pool_memory_resource.hpp#L253) has been restored with better error handling. Note that this changes the interface of the macros, one of which is called in cudf; that call will be changed [here](rapidsai/cudf#18108) after this goes in. Authors: - Paul Mattione (https://github.com/pmattione-nvidia) - Bradley Dice (https://github.com/bdice) Approvers: - Bradley Dice (https://github.com/bdice) URL: #1844
Fix for `-fdebug-prefix-map` breaking sccache (it contains the librmm build number). Workaround for prefix-dev/rattler-build#1458. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Gil Forsyth (https://github.com/gforsyth) URL: #1846
This PR adds tests for internal macros. Closes #1848. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1847
This PR runs C++ examples in CI. Closes #1845. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - James Lamb (https://github.com/jameslamb) URL: #1850
Updates several `dependencies.yaml` entries to match the others in the file which allows the `update-version.sh` script to work correctly.
PR #1844 recently changed how error messages that point to a particular file and line number are generated. In particular, they changed from the typical C string (`const char*`), which is `\0`-terminated, to a C++ `std::string` object, which is not `\0`-terminated. This change was then picked up when RMM headers were used to compile libraries (like cuDF), which ended up including file paths in strings that are not `\0`-terminated.

Conda, in turn, would detect the paths in these error messages and attempt to fix them as part of the prefix replacement process. When Conda did the prefix replacement, it would add an additional `\0` terminating character to the string. However, as the strings are now `std::string`-based and lack `\0` terminating characters, the final string written out by Conda would be one byte longer. This could mean overwriting other text data in the library or writing outside the text block. This is a known bug in Conda (conda/conda-build#1674).

Thus, when cuDF started building with the aforementioned RMM change last week, the packages it created had file paths in error messages lacking the `\0` terminating character. These, in turn, would be inadvertently corrupted by Conda when installing the packages in an environment. This led to a quite hairy bug detailed in issue rapidsai/cudf#18251.

To correct this issue, we drop the `std::string` constructor that was added in the aforementioned PR. More specifically, we adapted code from cuDF's [`CUDF_EXPECTS_3`](https://github.com/rapidsai/cudf/blob/8041ac8e370b092229841508fdfd1efb88fef034/cpp/include/cudf/utilities/error.hpp#L186-L192) and [`CUDF_FAIL_2`](https://github.com/rapidsai/cudf/blob/86eb82399f0e056731e2062dc95a4583c26e9af1/cpp/include/cudf/utilities/error.hpp#L225-L227), which still use a C-style string. Also, to address the need for runtime generation of some errors, we use `std::string` for only an initial snippet of the string and add other contents like `__FILE__` after. This keeps the latter bits as C-style strings.

Authors: - https://github.com/jakirkham Approvers: - Bradley Dice (https://github.com/bdice) - Paul Mattione (https://github.com/pmattione-nvidia) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1858
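To make the role of the `\0` terminator concrete, here is a simplified Python model of length-preserving prefix replacement in a binary. It is a sketch under stated assumptions, not Conda's actual implementation: `replace_prefix_padded` is a hypothetical name, the paths are made up, and the model only handles strings that end in `\0`. The point it illustrates is that the replacement locates the end of each string by its terminator, so a path embedded without one gives the tool no reliable boundary to work with.

```python
import re


def replace_prefix_padded(data, old, new):
    """Swap `old` for `new` in NUL-terminated strings, keeping len(data) fixed.

    Hypothetical, simplified model: each string that starts with `old` is
    rewritten in place and padded with NUL bytes so that every byte after
    the string stays at its original offset.
    """
    if len(new) > len(old):
        raise ValueError("new prefix must not be longer than the old prefix")
    # The old prefix, then the rest of the string, then its terminator.
    pattern = re.compile(re.escape(old) + rb"([^\x00]*)\x00")

    def _substitute(match):
        replaced = new + match.group(1)
        # Pad back to the original length; at least one NUL always remains.
        padding = len(match.group(0)) - len(replaced)
        return replaced + b"\x00" * padding

    return pattern.sub(_substitute, data)


# A NUL-terminated path followed by unrelated data, as in a binary's
# read-only data section (all values are illustrative).
original_blob = b"/build/placeholder_prefix/include/rmm/error.hpp\x00NEXT"
patched_blob = replace_prefix_padded(
    original_blob, b"/build/placeholder_prefix", b"/opt/env"
)
```

With a terminated string the patched blob has the same length as the original and the bytes after the string are untouched, which is the property that C-style strings such as `__FILE__` preserve.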
rmm nightlies are currently failing on CUDA 11.4 because the CUDA 11 `librmm-examples` package is overconstrained.
Closes #1611. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) URL: #1864
…d. (#1852) Closes #1783. Authors: - Bradley Dice (https://github.com/bdice) Approvers: - Lawrence Mitchell (https://github.com/wence-) - James Lamb (https://github.com/jameslamb) - Mark Harris (https://github.com/harrism) URL: #1852
If the driver supports the flag, unconditionally set the async memory pool usage property to include a request to support HW decompression. - Closes #1849 Authors: - Lawrence Mitchell (https://github.com/wence-) - Vyas Ramasubramani (https://github.com/vyasr) - Bradley Dice (https://github.com/bdice) Approvers: - Rong Ou (https://github.com/rongou) - Bradley Dice (https://github.com/bdice) - Vyas Ramasubramani (https://github.com/vyasr) URL: #1854
…#1873) This reverts commit 7f0cead. - Closes #1872 Authors: - Lawrence Mitchell (https://github.com/wence-) Approvers: - Vyas Ramasubramani (https://github.com/vyasr) URL: #1873
❄️ Code freeze for `branch-25.04` and v25.04 release

What does this mean?
Only critical/hotfix level issues should be merged into `branch-25.04` until release (merging of this PR).

What is the purpose of this PR?
Merge `branch-25.04` into `main` for the release.