feature: support CXI in OFI for Slingshot-11 #3791

Merged: 19 commits, Apr 22, 2024

Conversation

ericjbohm
Contributor

This supports the CXI interface for Cassini (AKA Slingshot-11) within the OFI machine layer. No application level changes are required, but a variety of command line options are provided to configure the memory pool and the selection of cxi interfaces.

Note: this requires the memory pool in order to support the FI_MR_ENDPOINT mode of memory registration efficiently, as the time cost of registering individual messages would otherwise be entirely too high.
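
To illustrate the note above, here is a minimal sketch of why a pre-registered pool avoids the per-message registration cost. It is not the code from this PR: `register_region`, `RegisteredPool`, and the two helper functions are hypothetical names, and the registration call is simulated with a counter rather than a real libfabric call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an expensive memory registration call.
// It only counts invocations; it does not touch libfabric.
static std::size_t g_registrations = 0;
static void register_region(void* /*base*/, std::size_t /*len*/) {
  ++g_registrations;
}

// Fixed-size block pool carved out of one slab that is registered once.
class RegisteredPool {
public:
  RegisteredPool(std::size_t block_size, std::size_t block_count)
      : slab_(block_size * block_count) {
    register_region(slab_.data(), slab_.size());  // single up-front registration
    free_list_.reserve(block_count);
    for (std::size_t i = 0; i < block_count; ++i)
      free_list_.push_back(slab_.data() + i * block_size);
  }

  // Returns a block already covered by the slab registration,
  // or nullptr when the pool is exhausted.
  void* alloc() {
    if (free_list_.empty()) return nullptr;
    char* p = free_list_.back();
    free_list_.pop_back();
    return p;
  }

  // Hands a block back to the pool for reuse; no registration work is done.
  void release(void* p) { free_list_.push_back(static_cast<char*>(p)); }

private:
  std::vector<char> slab_;
  std::vector<char*> free_list_;
};

// Baseline: register a buffer for every message sent.
std::size_t registrations_per_message(std::size_t n_messages) {
  g_registrations = 0;
  std::vector<char> buf(4096);
  for (std::size_t i = 0; i < n_messages; ++i)
    register_region(buf.data(), buf.size());
  return g_registrations;
}

// Pooled: every message buffer comes from the pre-registered slab.
std::size_t registrations_with_pool(std::size_t n_messages) {
  g_registrations = 0;
  RegisteredPool pool(4096, 8);
  for (std::size_t i = 0; i < n_messages; ++i) {
    void* p = pool.alloc();  // the message would be built and sent from p
    pool.release(p);
  }
  return g_registrations;
}
```

In this sketch the baseline performs one registration per message, while the pooled path performs one registration in total, however many messages are sent. The block size and count are arbitrary illustration values.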

Charmrun has been configured to wrap srun and currently assumes PMI2 with cray extensions for launching.

The build system has been set up to autodetect CXI and enable support for it accordingly. For compatibility purposes, it also supports the use of cxi on the build line, but that should not be necessary on most HPE systems with proper LMOD environments.

src/arch/ofi/machine-onesided.C (outdated, resolved)
src/arch/ofi/machine.C (resolved)
src/arch/ofi/machine.C (outdated, resolved)
@ericjbohm ericjbohm linked an issue Mar 26, 2024 that may be closed by this pull request
ericjbohm and others added 2 commits March 27, 2024 10:06
this allows xpmem to be supported in the build process for nonsmp
targets.  This has not yet proven beneficial for performance.
@ericjbohm ericjbohm requested review from bhatele and trquinn March 28, 2024 18:43
@ericjbohm ericjbohm requested a review from stwhite91 April 2, 2024 17:12
Collaborator

@jcphill jcphill left a comment

Looks good but obviously too big to be sure it's correct. Couple of small comments.

src/arch/ofi/machine.C (outdated, resolved)
src/arch/ofi/machine.C (outdated, resolved)
CMakeLists.txt (outdated, resolved)
src/arch/ofi/machine.C (outdated, resolved)
Collaborator

@stwhite91 stwhite91 left a comment

Any documentation updates needed?

Has non-CXI OFI been tested and benchmarked for performance with the changes to use the LRTS mempool and the ofi request cache?

src/arch/ofi/conv-common.h (resolved)
@ericjbohm
Contributor Author

ericjbohm commented Apr 12, 2024 via email

Contributor

@adityapb adityapb left a comment

LGTM

@ericjbohm ericjbohm requested a review from ritvikrao April 17, 2024 15:23
Contributor

@ritvikrao ritvikrao left a comment

finally got a chance to look at this. everything looks all set to merge

@ericjbohm ericjbohm added this pull request to the merge queue Apr 22, 2024
Merged via the queue into main with commit 100d563 Apr 22, 2024
23 checks passed
Comment on lines +371 to +378
# assume HPC installation
include(CMakePrintHelpers)
find_package(EnvModules REQUIRED)
find_package(PkgConfig REQUIRED)
if(EnvModules_FOUND)
  # at least get libfabric loaded if it isn't already
  env_module(load libfabric)
endif()

I think this assumption is too restrictive. It does not seem to work when building Charm++ with Spack, since libfabric is provided as a Spack package and not through modules.

Contributor Author

It makes sense for the standard case of building on a DOE machine, but could present a problem for Spack. Is there a simple way to test for the Spack case and then extract the necessary information to accomplish the same smooth build this code accomplishes with LMOD and PkgConfig?

I think the simplest way is to set up Charm++ as a develop build and introduce changes to the build system and Spack package until it works. I started working on it but got stuck, and then other stuff took priority.

@stwhite91 stwhite91 deleted the ericjbohm/add_cxi_support_to_ofi_for_slingshot11 branch July 25, 2024 21:00
Development

Successfully merging this pull request may close these issues.

cray shasta ofi libfabric build support on perlmutter
8 participants