Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX deadlock in test_shutdown_with_sys_exit_at_pickle #438

Merged
merged 13 commits into from
Mar 4, 2025

Conversation

ogrisel
Copy link
Contributor

@ogrisel ogrisel commented Mar 4, 2025

Trying to reproduce #437 in the current master branch.

EDIT: reproduced at d833e8f and fixed by c4217e4.

Fixes #437.

@ogrisel ogrisel changed the title DEBUG DEBUG deadlock in test_shutdown_with_sys_exit_at_pickle Mar 4, 2025
@ogrisel
Copy link
Contributor Author

ogrisel commented Mar 4, 2025

I cannot reproduce the race condition on main.

@ogrisel
Copy link
Contributor Author

ogrisel commented Mar 4, 2025

So 17ebf79 managed to reproduce the problem (by deadlocking at a different place but the warning is still there) with a high probability.

The probability of triggering the deadlock goes down if we remove the loop in the test. We need to find a balance between catching race conditions with a high probability and test time.

@ogrisel
Copy link
Contributor Author

ogrisel commented Mar 4, 2025

5 iterations in both test_shutdown_with_pickle_error and test_shutdown_with_sys_exit_at_pickle is definitely enough to trigger it with high probability, either under Linux or Windows. macOS seems to be unaffected for some reason (and I could never reproduce locally on macOS either).

So let's try to see if the proposed change of #437 (comment) can help resolve this problem.

@ogrisel
Copy link
Contributor Author

ogrisel commented Mar 4, 2025

Victory! c4217e4 fixed the problem! 🥳

@ogrisel ogrisel changed the title DEBUG deadlock in test_shutdown_with_sys_exit_at_pickle FIX deadlock in test_shutdown_with_sys_exit_at_pickle Mar 4, 2025
@ogrisel ogrisel marked this pull request as ready for review March 4, 2025 15:26
@ogrisel
Copy link
Contributor Author

ogrisel commented Mar 4, 2025

Note: the tests are still running in less than 8 minutes despite the changes in this PR.

Copy link
Contributor

@tomMoral tomMoral left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, weird that we never got this one before!
Thanks for the reproducer and the fix, it makes a lot of sense.
LGTM

@tomMoral tomMoral merged commit 38340f6 into joblib:master Mar 4, 2025
12 checks passed
@ogrisel ogrisel deleted the debug-437 branch March 6, 2025 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random deadlock when running test_shutdown_with_sys_exit_at_pick on CI
2 participants