Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: timeout issues for list.peerings #2042

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

EtienneBruines
Copy link

The previously used DefaultContextTimeout does not work well with the user-definable WaitTime, which can easily exceed this global timeout.

Fixes #2041

Comment on lines 102 to 110
// list peering is a blocking API, so making sure the ctx passed while calling it
// times out after the default wait time.
ctx, cancel := context.WithTimeout(context.Background(), DefaultContextTimeout)
defer cancel()
// The first result needs to be "immediate", in which case we use the default timeout settings.
// For all other requests, we use a 'blocking' query, therefore setting the appropriate timeout.
// We add one second to allow for the Consul client to finish sending the response before
// declaring it an exceeded deadline.
var ctx context.Context
if opts.WaitTime != 0 {
ctx, cancel := context.WithTimeout(context.Background(), opts.WaitTime + time.Second)
defer cancel()
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure either the original or this patch work the way I'd expect to see with blocking queries, with the caveat that I work on Nomad and not Consul. But their blocking queries behavior should be similar.

When we make a blocking query, we wait for an API response to come back. If there is no change before the wait time expires, we get a non-error response back with the current value of the query, whether or not it has changed. If we look at how the Peerings.List API works, the context on this request does not change the wait parameter! It sets the request timeout from the client. So that means when we hit this context deadline, the server is probably still waiting to send any update and will be sad because we're exiting early.

So if you're making a blocking query, the only reason you want to set a context other than context.Background() here (as we did for consul_partitions.go), is if you want the client side to bail out early. As we just saw, waiting until the deadline isn't an error in our use case here. But if you use a context, we'll get a context deadline error and return the error to the caller, instead of saying "ok, no change, nothing to do". Which is the behavior we expect of consul-template.

Copy link
Author

@EtienneBruines EtienneBruines Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand your comment and the TODO-comments to the Peerings APIs correctly, it seems like we do not need any context at all, other than perhaps context.Background(). Not in this case at least. I like that - it simplifies the code.

Thank you for taking the time!

tgross
tgross previously approved these changes Mar 25, 2025
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

The previously used `DefaultContextTimeout` does not work well with the user-definable `WaitTime`, which can easily exceed this global timeout. By removing the use of a context deadline, we follow the pattern other dependencies (like consul_peerings) use.

This made `DefaultContextTimeout` unused and it was therefore removed.

Fixes hashicorp#2041
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API peerings timeout for blocking queries
3 participants