Commit

bugfix: improve CXI support for ALCF Aurora configuration
Eric Bohm committed Nov 12, 2024
1 parent 4ee1554 commit 76f3d85
Showing 1 changed file with 16 additions and 2 deletions.
18 changes: 16 additions & 2 deletions src/arch/ofi/machine.C
@@ -696,6 +696,7 @@ void LrtsInit(int *argc, char ***argv, int *numNodes, int *myNodeID)
* should not be considered predictive of proximity. That
* relationship has to be detected by other means.
* 2. HWLOC doesn't have a hwloc_get_closest_nic because... NIC
* doesn't even rate an object type in their ontology, let
* alone get first class treatment. Given that PCI devices
@@ -714,7 +715,7 @@ void LrtsInit(int *argc, char ***argv, int *numNodes, int *myNodeID)
* do *not* have such convenient labeling as something special
* needs to happen to get their linuxfs utilities to inject
* that derived information into your topology object. As an
- * interim solution we allow the user to map their cxi[0..3]
+ * interim solution we allow the user to map their cxi[0..7]
* selection using command line arguments.
* 2b. Likewise the 1:1 relationship we assume here between
@@ -741,6 +742,8 @@ void LrtsInit(int *argc, char ***argv, int *numNodes, int *myNodeID)
* CPU nodes. The user could easily be confused, so we can't
* rely on them telling us. This has to be determined at
* run time.
* 6. Aurora can apparently go up to cxi7.
*/

char *cximap=NULL;
@@ -814,7 +817,18 @@ void LrtsInit(int *argc, char ***argv, int *numNodes, int *myNodeID)
/// short hsnOrder[numcxi]={2,1,3,0};
if(numcxi==4)
{
- short hsnOrder[4]= {1,3,0,2};
+ short hsnOrder[8]= {1,1,3,3,0,0,2,2};
if(myRank%quad>numcxi)
{
CmiPrintf("Error: myrank %d quad %d myrank/quad %d\n",myRank,quad, myRank/quad);
CmiAbort("cxi mapping failure");
}
myNet=hsnOrder[myRank%quad];
}
else if(numcxi==8)
{
// no idea if this is a good ordering
short hsnOrder[8]= {0,1,2,3,4,5,6,7};
if(myRank%quad>numcxi)
{
CmiPrintf("Error: myrank %d quad %d myrank/quad %d\n",myRank,quad, myRank/quad);
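The hunk above picks a cxi device by indexing an ordering table with myRank%quad. The standalone sketch below illustrates that lookup with an explicit bounds check; it is not the code in machine.C. The helper name pickCxi and its -1 failure convention are hypothetical, and the table contents are simply the ones visible in the diff.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper, not part of machine.C: choose a cxi device index for a
// rank from an ordering table. Returns -1 when the inputs are invalid or the
// computed slot would fall outside the table.
template <std::size_t N>
int pickCxi(int myRank, int quad, const std::array<short, N>& hsnOrder)
{
    if (myRank < 0 || quad <= 0) {
        return -1;  // reject inputs for which myRank % quad is meaningless
    }
    const std::size_t slot = static_cast<std::size_t>(myRank % quad);
    if (slot >= N) {
        return -1;  // slot would index past the end of the table
    }
    return hsnOrder[slot];
}
```

A caller could treat a negative return as the condition for reporting a mapping failure and aborting. Making the table size a template parameter keeps the bounds check tied to the table that is actually indexed rather than to a separately tracked device count.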
