DAOS-17271 pool: Fix handle_event -DER_NONEXISTs #16081

Open: wants to merge 1 commit into base: master

@liw (Contributor) commented Mar 12, 2025

When handling the exclusion of multiple ranks, pool_svc_update_map_internal aborts the whole request and returns -DER_NONEXIST if any of the ranks is absent from the pool map. This is correct for the dmg case, but problematic for the handle_event case, where ranks not in the pool map should simply be ignored. (See the Jira ticket for more.)
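
The intended behavior can be illustrated with a toy model. This is only a sketch under stated assumptions, not the DAOS implementation: `exclude_ranks`, `rank_in_map`, `enum update_source`, and `SRC_EVENT` are hypothetical names introduced for illustration, and the pool map is reduced to a plain array of ranks. Only `MUS_DMG` and the `-DER_NONEXIST` (-1005) error code come from this PR and its ticket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Value taken from the ticket title: DER_NONEXIST(-1005). */
#define DER_NONEXIST 1005

/* Hypothetical stand-in for the update source; only MUS_DMG appears in the diff. */
enum update_source { SRC_EVENT, MUS_DMG };

/* Hypothetical helper: is the rank present in the (toy) pool map? */
static bool
rank_in_map(const unsigned *map, size_t map_len, unsigned rank)
{
	for (size_t i = 0; i < map_len; i++)
		if (map[i] == rank)
			return true;
	return false;
}

/*
 * Toy model of the exclusion path. Returns the number of ranks that would be
 * excluded, or -DER_NONEXIST if a dmg request names a rank absent from the map.
 */
static int
exclude_ranks(enum update_source src, const unsigned *map, size_t map_len,
	      const unsigned *ranks, size_t n_ranks)
{
	size_t n_absent = 0;

	for (size_t i = 0; i < n_ranks; i++)
		if (!rank_in_map(map, map_len, ranks[i]))
			n_absent++;

	/* Mirrors the patched condition: only dmg requests fail on absent ranks. */
	if (src == MUS_DMG && n_absent > 0)
		return -DER_NONEXIST;

	/* Event-driven requests ignore the absent ranks and exclude the rest. */
	return (int)(n_ranks - n_absent);
}
```

In this model, with a map holding ranks 0 through 3 and a request to exclude ranks 1 and 7, a call with `MUS_DMG` fails with `-DER_NONEXIST`, while a call with `SRC_EVENT` ignores rank 7 and excludes rank 1.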

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

When handling the exclusion of multiple ranks,
pool_svc_update_map_internal aborts the whole request and returns
-DER_NONEXIST if any of the ranks is absent in the pool map. This is
correct for the dmg case, but problematic for the handle_event case,
where ranks not in the pool map should simply be ignored.

Signed-off-by: Li Wei <liwei@hpe.com>

Ticket title is 'Aurora test cluster can't exclude a faulty engine from a pool: handle_event(): failed to exclude ranks: DER_NONEXIST(-1005): 'The specified entity does not exist''
Status is 'Open'
https://daosio.atlassian.net/browse/DAOS-17271

@liw liw marked this pull request as ready for review March 12, 2025 06:03
@liw liw requested review from a team as code owners March 12, 2025 06:03
@liw liw requested review from liuxuezhao and kccain March 12, 2025 06:04
@@ -7033,7 +7033,7 @@ pool_svc_update_map_internal(struct pool_svc *svc, unsigned int opc, bool exclud
 				      inval_tgt_addrs);
 	if (rc != 0)
 		goto out_map;
-	if (inval_tgt_addrs->pta_number > 0) {
+	if (src == MUS_DMG && inval_tgt_addrs->pta_number > 0) {
Contributor

If the patch needs to be revised for other reasons anyway: the src and skip_rf_check arguments are not documented in the comment preceding the function. Incidentally, this may be the first and only use of the src argument in this function.

Contributor Author

OK.

@daosbuild1
Collaborator

Test stage Functional Hardware Medium Verbs Provider completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-16081/1/execution/node/1557/log

4 participants