debug-images deadlock #358

Closed
onelson opened this issue Aug 20, 2021 · 9 comments · Fixed by #773
Labels
Bug Something isn't working

Comments

@onelson

onelson commented Aug 20, 2021

RE: the deadlock - I built the repro as a Docker container and found that the deadlock happens when I run the container on my CentOS box, and seemingly on all of our CentOS 7 machines, but not on another Fedora machine, so I'm not sure whether it points to the specific kernel or perhaps some part of the network configuration. At any rate, I'm glad this isn't widespread, and I'll follow up with our IT dept to try and figure this out.

Originally posted by @onelson in #180 (comment)


My IT dept was able to reproduce the issue on our CentOS 7 hosts running the following kernel:

  • 3.10.0-1062.1.2.el7.x86_64

The issue was not present on:

  • 5.4.142-1.el7.elrepo.x86_64 which is the current latest LTS, I'm told.

It's not clear which hosts we'll be able to freely update, so I'm hoping there might be something to look at on your end to sidestep whatever the issue might be.

@Swatinem
Member

If this might be related to some kind of network config, what OpenSSL version do these systems have? And can you reproduce the deadlock when compiling with another TLS provider?

@onelson
Author

onelson commented Aug 20, 2021

I built the repro program as a Docker container with openssl 1.1.0l so it would have been the same in each case.

We use native-tls for our reqwest work extensively (and sentry works fine as of v0.23.0) so long as we don't try to use tracing.

Still, I can cook up another docker image for them to test with, selecting a different tls impl.

@onelson
Author

onelson commented Aug 20, 2021

I've added a Dockerfile to the repo (flattening our base layers and stripping out our internal stuff).

Switching to rustls continues to exhibit the deadlock:
LaikaStudios/actix-tracing-sentry-repro@826829e

@jrray

jrray commented Mar 1, 2024

Hi @onelson! I notice that your example repro case has the debug-images feature enabled. We're also using CentOS 7 and have been experiencing deadlocks since updating to the latest sentry-rust release. Any release since #545, where this feature became enabled by default, is affected.

The hang is caused by sending any event to sentry, via tracing or otherwise.

In our environment, running sentry::integrations::debug_images::debug_images(); (before initializing sentry) panics here:

thread 'main' panicked at /net/homedirs/jrray/.cargo/registry/src/gitlab.spimageworks.com-9db14e3de8474184/findshlibs-0.10.2/src/lib.rs:261:14:
attempt to add with overflow

When the debug-images feature is turned on, the first time an event is generated, the SDK attempts to initialize this lazily initialized static:

static DEBUG_META: Lazy<DebugMeta> = Lazy::new(|| DebugMeta {
    images: crate::debug_images(),
    ..Default::default()
});

Inside the initialization code, the aforementioned function is run and panics. Then, sentry_panic::panic_handler kicks in and wants to send an event to sentry about the panic. This causes a recursive attempt to lazy initialize DEBUG_META again, resulting in a deadlock.
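The re-entrancy described above can be modeled without actually hanging. The following is a minimal sketch, not the SDK's code: `init_once_guarded`, `InitOutcome`, and `simulate_event_during_init` are hypothetical names, and a thread-local flag stands in for the lock that a real `Lazy` holds while its initializer runs. Where this sketch returns `ReentrantCall`, a real lazy static would block forever waiting on itself.

```rust
use std::cell::Cell;

thread_local! {
    // True while this thread is inside the (hypothetical) initializer.
    static INITIALIZING: Cell<bool> = Cell::new(false);
}

#[derive(Debug, PartialEq)]
enum InitOutcome {
    Initialized,
    // The initializer was entered again on the same thread before it finished.
    ReentrantCall,
}

/// Hypothetical stand-in for the one-time initialization of `DEBUG_META`.
/// `body` models the work done during initialization, including anything a
/// panic handler might trigger while that work is still running.
fn init_once_guarded<F: FnOnce() -> InitOutcome>(body: F) -> InitOutcome {
    // `replace` returns the previous value, so `true` means a nested call.
    if INITIALIZING.with(|flag| flag.replace(true)) {
        // A real lazy static would wait here for the outer call to finish,
        // which never happens because the outer call is waiting on us.
        return InitOutcome::ReentrantCall;
    }
    let outcome = body();
    INITIALIZING.with(|flag| flag.set(false));
    outcome
}

/// Models the sequence from the comment: initialization fails, the panic
/// handler captures an event, and the event needs the same static again.
fn simulate_event_during_init() -> InitOutcome {
    init_once_guarded(|| init_once_guarded(|| InitOutcome::Initialized))
}
```

Calling `simulate_event_during_init()` yields `InitOutcome::ReentrantCall`, which marks the point where the real program stops making progress.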

In our environment at least, it is not safe to enable the debug-images feature. But since this is now a default feature, it is difficult to avoid it getting inadvertently enabled.

@jrray

jrray commented Mar 1, 2024

Someone has attempted to contribute a fix for this here, though whether the addition should wrap or saturate is debatable. Unfortunately, the PR has gone stale.
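For reference, here is a small sketch of how the candidate behaviors differ when an addition overflows. The operands used at findshlibs `lib.rs:261` are not shown in this thread, so the `segment_end_*` helpers and their `base`/`len` parameters are purely illustrative assumptions.

```rust
/// Hypothetical helper: reports overflow explicitly instead of panicking.
fn segment_end_checked(base: usize, len: usize) -> Option<usize> {
    base.checked_add(len)
}

/// Hypothetical helper: wraps around modulo 2^N on overflow.
/// This is what a plain `+` does in a default release build.
fn segment_end_wrapping(base: usize, len: usize) -> usize {
    base.wrapping_add(len)
}

/// Hypothetical helper: clamps to `usize::MAX` on overflow.
fn segment_end_saturating(base: usize, len: usize) -> usize {
    base.saturating_add(len)
}
```

With `base = usize::MAX` and `len = 1`, the checked variant returns `None`, the wrapping variant returns `0`, and the saturating variant returns `usize::MAX`. A plain `+` with overflow checks enabled (the default for debug builds) panics with "attempt to add with overflow", which matches the message reported above.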

@lcian
Member

lcian commented Mar 27, 2025

Hey @onelson @jrray thanks for the report.
I have noticed that CentOS7 has reached end of life, and 3.x of the kernel has also surpassed the LTS window.
Do you have any insights if this happens on more recent versions of CentOS/Linux?
If it only affects older unsupported versions I would just proceed with documenting this.

@bddap

bddap commented Mar 27, 2025

I encountered similar behavior a while back. Things would hang. Had to remove this crate in the end.

Was running on debian 10 at the time.

@bddap

bddap commented Mar 27, 2025

Could have sworn I was running a release build though. (release mode uses wrapping add instead of panic)

@lcian changed the title from "Tracing support deadlock" to "debug-images deadlock" on Mar 28, 2025
@linear bot added the Bug (Something isn't working) label on Apr 9, 2025
@lcian
Member

lcian commented Apr 28, 2025

The SDK will now panic upon initialization of the debug images integration, which at least avoids the deadlock.
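One way such behavior could be arranged is to compute the image list eagerly during setup instead of on the first event. The sketch below only illustrates that idea and is not the actual SDK change: `IMAGES`, `load_images`, `setup_integration`, and `images_for_event` are hypothetical names, and `load_images` returns a fixed placeholder list rather than calling findshlibs.

```rust
use std::sync::OnceLock;

// Hypothetical cache for the image list, filled once during setup.
static IMAGES: OnceLock<Vec<String>> = OnceLock::new();

/// Placeholder loader; a real one could panic, as described in this thread.
fn load_images() -> Vec<String> {
    vec!["libexample.so".to_string()]
}

/// Runs the loader at setup time, so a failure surfaces here rather than
/// inside event capture. Returns the number of images found.
fn setup_integration() -> usize {
    IMAGES.get_or_init(load_images).len()
}

/// Event-time accessor: only reads the cache and never runs the loader,
/// so it cannot re-enter initialization.
fn images_for_event() -> &'static [String] {
    IMAGES.get().map(Vec::as_slice).unwrap_or(&[])
}
```

In this arrangement, a panic in the loader happens inside `setup_integration`, and any event captured afterwards sees either the cached list or an empty slice.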
Still, please report it here or in a new issue if you're still affected by this issue with findshlibs.

6 participants