ENH enable faulthandler traceback reporting on worker crash by SIGSEGV #419
The PyPy build deadlocked on commit 9d868e7 before reaching the new […]
The PyPy deadlock is probably unrelated to this PR, but let's push one more time to make sure.
I updated this PR and against the current […]
I like this! Just a few comments on form, but otherwise LGTM.
A potential concern is the file descriptor, but I guess this will not add extra complexity and will work in most cases, so I would say let's merge this one.
Indeed, let's keep that in mind if the expected output does not show up for a reported crash.
The b9f7f25 run failed with a timeout (deadlock?) in https://github.com/joblib/loky/actions/runs/13635452625/job/38112993126?pr=419#step:5:200. This in turn caused the […] I have the feeling that this problem is unrelated to this PR, but let me push a few empty commits to see if this is more likely to happen here than on […]
This happened again on Linux with Python 3.11 in the CI run for commit 925e522, so there might be something fishy with this PR. Let's not merge it until we understand the source of the deadlock.
… variability (PyPy)
Co-authored-by: Thomas Moreau <thomas.moreau.2010@gmail.com>
…ocks the test under Windows
f6d4a47 to bc43cdd
Merging, thanks a lot for the PR, this will help debug the next deadlocks! :)
Automatically call faulthandler.enable() by default to dump extra debug info to crashed workers' stderr, typically on SIGSEGV, SIGBUS and co.
I also expanded the error message of TerminatedWorkerError to point users to stderr for more details. This should help us debug stability problems on our CI, and also help our users and let us interact more productively with them when they report problems.
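
To illustrate the kind of output this gives, here is a minimal standalone sketch using only the standard library. It is not loky's actual implementation: the child process, the `CHILD_CODE` string and the `run_crashing_child` helper are hypothetical names made up for this example. The child enables `faulthandler` and then deliberately crashes by reading from address 0 with `ctypes.string_at(0)`, the same trick the `faulthandler` documentation uses.

```python
import subprocess
import sys

# Hypothetical child program: enable faulthandler, then crash on purpose.
CHILD_CODE = """
import ctypes
import faulthandler

faulthandler.enable()  # dump Python tracebacks to stderr on fatal signals
ctypes.string_at(0)    # read from address 0 to provoke a crash
"""


def run_crashing_child():
    """Run the crashing child and capture what it wrote to stderr."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )


result = run_crashing_child()
print("return code:", result.returncode)
print(result.stderr)
```

On POSIX systems the captured stderr should start with a "Fatal Python error" line followed by the Python traceback of the crashing thread, which is the extra information a user would be asked to look for after a TerminatedWorkerError. The exact wording differs on Windows.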