Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rdma: add separate bounce buffer freelist for data (eager) messages #614

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rauteric
Copy link
Contributor

Separate out bounce buffer freelists into a smaller-sized freelist for
control messages and a larger size for data (eager) messages

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@rauteric rauteric requested a review from a team as a code owner September 19, 2024 22:32
@rauteric rauteric force-pushed the fl-separate-pr branch 2 times, most recently from 19af9ec to 512366e Compare September 19, 2024 23:09
@rauteric rauteric changed the title rdma: add separate bounce buffer for data (eager messages) rdma: add separate bounce buffer freelist for data (eager) messages Sep 19, 2024
include/nccl_ofi_rdma.h Outdated Show resolved Hide resolved
include/nccl_ofi_rdma.h Outdated Show resolved Hide resolved
include/nccl_ofi_rdma.h Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
@rauteric
Copy link
Contributor Author

rauteric commented Oct 3, 2024

The CI failure(s) are real:

scatter_perf: nccl_ofi_rdma.c:2100: alloc_bounce_req: Assertion `NCCL_OFI_IS_PTR_ALIGNED(&bounce_fl_item->bounce_msg, BOUNCE_BUFFER_ALIGNMENT)' failed.

Need to fix up some asserts.

@rauteric
Copy link
Contributor Author

rauteric commented Oct 3, 2024

Force-push to 3494556 fixes the CI failure by having alignment on both (eager) bounce buffers and ctrl recv buffers.

This requirement will be relaxed when we refactor the data structure in the related #600.

@rauteric
Copy link
Contributor Author

rauteric commented Oct 3, 2024

Rebased on master

@AmedeoSapio
Copy link

bot:aws:retest

1 similar comment
@AvivBenchorin
Copy link
Contributor

bot:aws:retest

include/nccl_ofi_param.h Outdated Show resolved Hide resolved
@rauteric
Copy link
Contributor Author

Rebase on master

rajachan
rajachan previously approved these changes Oct 15, 2024
include/nccl_ofi_param.h Outdated Show resolved Hide resolved
@rauteric rauteric force-pushed the fl-separate-pr branch 2 times, most recently from e5452be to 7f36f35 Compare October 15, 2024 17:47
@rauteric
Copy link
Contributor Author

Rebased on master

Copy link
Contributor

@bwbarrett bwbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right approach. Now would be a great time to clean up some code, but this only makes the code more confusing.

Bounce buffers for eager messages and control messages are different things. Have different structures with different names so that there's no reason to figure out what's what. I'm also not sure I understand the desire to move the freelists from the endpoint into the per-rail structures, given that most of the data is really operations that will be per-endpoint.

include/nccl_ofi_param.h Outdated Show resolved Hide resolved
@rauteric
Copy link
Contributor Author

Bounce buffers for eager messages and control messages are different things. Have different structures with different names so that there's no reason to figure out what's what.

They are different things, but there is a lot of code that is shared between them (e.g., all of the buffer pool management code, posting receives). We should rename these structures to something more generic (like receive buffers instead of bounce buffers [which I wanted to do in a followup patch]), but I think duplicating all the buffer management code is overkill.

I'm also not sure I understand the desire to move the freelists from the endpoint into the per-rail structures, given that most of the data is really operations that will be per-endpoint.

I'm confused... isn't having the freelists per-rail exactly what you were advocating in #614 (comment)?

I think it makes sense to have the freelists per-rail because posted recv buffer counts are maintained per-rail.

@bwbarrett
Copy link
Contributor

I think the problem with your patch is that it's way too much cognitive load to figrue out what a bounce buffer is. And the rest is just too confusing to review. Fix that.

@rauteric
Copy link
Contributor Author

rauteric commented Dec 4, 2024

Converting this to draft until I have cycles to work on feedback above.

@rauteric rauteric marked this pull request as draft December 4, 2024 18:40
@rauteric rauteric force-pushed the fl-separate-pr branch 3 times, most recently from ef38a5b to be82b8e Compare January 10, 2025 23:32
@rauteric
Copy link
Contributor Author

Force-push fixes the CI issue, and fixes a dormant plugin bug in the process.

Force-push is a rebase on #762.

src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
@rauteric rauteric force-pushed the fl-separate-pr branch 2 times, most recently from 2ad142f to 61c8ecf Compare January 11, 2025 00:50
@rauteric rauteric marked this pull request as ready for review January 11, 2025 00:51
@rauteric rauteric dismissed bwbarrett’s stale review January 11, 2025 00:51

Changes addressed

src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
include/nccl_ofi_rdma.h Show resolved Hide resolved
src/nccl_ofi_rdma.c Outdated Show resolved Hide resolved
The close message was being sent on the data rails instead of the
control rail(s). This was a harmless bug today, but needs to be fixed
for future changes.

Also, consolidate some code between ctrl and close messages to avoid
such bugs in the future.

Signed-off-by: Eric Raut <eraut@amazon.com>
This will facilitate creating different request types for control and
data rail rx buffers.

Signed-off-by: Eric Raut <eraut@amazon.com>
These two requst types current share an underlying data structure.

Signed-off-by: Eric Raut <eraut@amazon.com>
For default eager max size, ctrl buffers are much smaller than eager
buffers.

Signed-off-by: Eric Raut <eraut@amazon.com>
@@ -1763,7 +1780,7 @@ static inline int process_err_completion(nccl_net_ofi_rdma_device_t *device,
err_entry.prov_errno,
fi_cq_strerror(cq, err_entry.prov_errno, err_entry.err_data, NULL, 0),
(long)err_entry.len, nccl_net_ofi_req_str(req));
if (req->type == NCCL_OFI_RDMA_RX_BUFF) {
if (req->type == NCCL_OFI_RDMA_CTRL_RX_BUFF || NCCL_OFI_RDMA_EAGER_RX_BUFF) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is treating the enum value like a boolean, which will end up being true. This should be (req->type == NCCL_OFI_RDMA_CTRL_RX_BUFF || req->type == NCCL_OFI_RDMA_EAGER_RX_BUFF)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants