-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Retry Attempts are not available in client-stats
#8299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
We don't have retry metrics yet. There is a proposal in progress https://github.com/grpc/proposal/pull/488/files. A metric label In tracing though we have a way to identify if an rpc attempt was a transparent retry or not. When span is created, an attribute with key |
Seems reasonable, have to check our implementation of tracing to see if we include this or not but thanks for the pointer.
Cool proposal, you guys are one step ahead :D. Everything in the proposal makes sense. Havent thought this fully so take this with a bit of a grain of salt but given the proposal above and the fact that the
That way consumers who have an implementation of the stats handler can implement similar metrics as well. |
@davinci26 The OpenTelemetry (OTel) plugin provides trace attributes for RPC attempts that help in tracking retries. As you can see in https://github.com/grpc/grpc-go/blob/master/stats/opentelemetry/trace.go#L50-L51 (in the populateSpan function for *stats.Begin), these include:
And yes, for retry-specific "metrics", there's a proposal in progress.
|
Does this include
is added to the grpc-go/stats/opentelemetry/trace.go Line 54 in 709023d
RPC for non-transparent retries.
100% agree, |
@davinci26 you are right about the problem with attemptInfo. Each attempt of rpc gets a new attempt info so its wrong to maintain the counter of previous rpc attempts within that. This is a bug. I think the right way to do this is to funnel the Once we do that, we don't have to make it separately available in stats.Begin. Although i guess its fine even if we add |
Retry attempts can be counted by storing data in the context in a per-call interceptor, and incrementing it in |
What is the issue with funneling cs.numRetries? We need to also make sure to exclude transparent retries https://github.com/grpc/proposal/blob/master/A72-open-telemetry-tracing.md#tracing-information. clientStream.numRetries already have what we want. |
Sorry, I missed the bit about excluding transparent retries. You can wait to increment when
Generally, adding things requires proof that it is either necessary or sufficiently convenient. So the question should be the other way around: what is the issue with doing it the way that is already possible? Also...we don't support hedging yet, but when/if we do, those likely wouldn't be included in the |
Let me code up the suggestion above so take this statement with 75% confidence but where my head is at is that this fairly complicated/non-ergonomic. I appreciate the aversion towards extending the Edit: I think where I am coming from is that I already got this wrong once and I couldnt think of a way to implement the retries hence my question and then we realized that even the |
The Our goal is that 99% of users should be able to use the OpenCensus or OpenTelemetry plugins, instead, and that those should be easy to configure and use. |
Make sense @dfawley, thanks for the feedback and the pointer! I still have my reservations:
I think the main advantage on my side is that this method is available right now with the current release, so I ended up implementing it. It works well so far. (issue is closed on my side, but I think it is used to track the bug on the |
@purnesh42H I'm looking into this. Please assign the issue to me. |
Clients can configure non transparent retries either via
XDS
or viaServiceConfig
however thestats
handler have no good way to retrieve if a particularRPC
method is a retry or not.This type of observability is quite useful to determine if
retry-storms
(due to poor configurations) cause incidents and/or to see ifretries
are actually effective.I thought of looking at the
retry metadata
header"grpc-previous-rpc-attempts"
but this one is added by thehttp2_client
as a header after theHandleRPC
method is called so that didnt yield any results.My first question is: Is this actually true or is there a way to get
o11y
on thenon-transparent
retries from the client?My second question is: If it is not available, the
numRetries
is available when we construct thestats.Begin
struct so if you are open to extending thestats.Begin
struct, we can add it there.Looking forward to your thoughts and also happy to implement the feature if it is not available
The text was updated successfully, but these errors were encountered: