-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds consumer metadata checking to next requests #5141
Conversation
When processing the request the server will compare the request's metadata (if any) against the consumer's metadata (in the config) and reject the request with a status 409 if the two do not match (either a value is different or a key in the request is missing in the consumer). Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to explain why you would like this PR in. What is the use case?
Sending metadata to the server in Pull Request as a validation if we sent it to proper consumer is really counter intuitive. Why would sending a metadata in Pull Request check if it's the same on the server?
My guess is it's (a bit hacky :)) way to make sure if the consumer is the same, meaning - it was not recreated under the same name?
This goes back to the fact that we did not separate ID of the consumer from the Name. It would be useful to have such a feature, as it would allow more flexibility with renaming streams and consumers.
Let's discuss how to best solve the use case you have :).
@@ -3332,6 +3332,26 @@ func (o *consumer) processNextMsgRequest(reply string, msg []byte) { | |||
return | |||
} | |||
|
|||
if len(consumerMetaData) > 0 && len(o.cfg.Metadata) > 0 { | |||
var matching = true | |||
for k, v := range consumerMetaData { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a one way check.
If consumer metadata on the server has key-pairs that are not present on the metadata passed in request, the check will still return true, a false positive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a false positive, that's the behavior I want:
I pass you a collection of metadata key/values and I want to make sure those are present in the consumer's metadata and they match the value. It's not a 'deep equal' but a 'check that those keys and values are there and the same'.
} | ||
} | ||
if !matching { | ||
sendErr(409, "Request's medata does not match the consumer's") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should to be the same error code.
I would really like us to have separate code's at least for errors that have a very distinct way to react to them in the client.
Also this should be a const, not a literal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just did the same as what the existing other 409 errors do (e.g. line 3308) 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typos and incorrect '
s here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not be 409.
Client libraries should not compare strings to determine what is happening and what to do.
EDIT: I know we did it all over the place already, but it would be really nice to improve the situation, at least for the new errors...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use http status codes here, and 4xx are client errors, with 409 being conflict which seems correct.
My particular use case is checking that some part of the consumer's metadata (that I expect to be there and with the values that I expect) did not get changed between two pull requests. In my case I use this to get an 'exclusive puller from a consumer' behavior. Meaning that each client instance picks a unique id and sets a key in the consumer config's metadata with that id and uses consumer create to make sure it is the only one creating the consumer (which has a durable name shared by all the instances). If you can create the consumer with your id in the metadata you want to make sure that the same id is there each time you do a consumer next because in some failure scenarios (like a network failure) the consumer could be deleted because of timeout and another instance (therefore different id in the metadata) may have been able to create the consumer for itself. I would suspect that there may be other use cases where you want to check that something in the configuration (i.e. part of the metadata) did not get changed between 2 pull requests. I otherwise would have to do a get consumer info each time before I do a fetch (and wouldn't be able to use 'consume()' or 'messages()', I'd have to call fetch() myself. That said, ultimately my desired functionality is exclusive puller from a durable consumer, and there may be reasons to implement it another way which I would gladly use, this just seemed like a simple change that could maybe have other use cases of wanting to know that 'something has changed in the consumer metadata' |
I am ok with this actually now that I understand what is being accomplished and that it will be used in only very specific use cases. Interested in what @ripienaar thinks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally ok with it too, I am just thinking how it can be abused and impact on payloads etc if this becomes heavily used.
And this solution will work only for pull consumers? Do we have a matching plan for push?
} | ||
} | ||
if !matching { | ||
sendErr(409, "Request's medata does not match the consumer's") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typos and incorrect '
s here
I have a different idea @derekcollison @ripienaar We add a field "unique: bool" to Pull Request, false by default. I think that way would be cleaner, and it would actually provide a check for exclusivity, rather than a convention that different payload means something changed. This could be also done in reverse, so exclusiveness set on consumer, and flag in pull request to potentially take over the consumer, draining others, which could simplify the orchestration. |
@Jarema Hmm, we do not really have guarantees of inboxes being unique and it changes per pull doesnt it? Well it can it does not need to but maybe it does, so perhaps that not always legit? I agree we need a less hacky approach though so lets think of alternatives of do a look around at how we use inboxes for pulls |
@ripienaar I was wondering about it too. We do reuse the same inbox while processing the messages in the same Consumer instance in the clients, with optional one wildcard token at the and, but the subject still would match. |
For sure we can’t have 2 clients with same inbox that would be bad (though I can tell you why I do that in places). I am more concerned that using an inbox as a signal of uniqueness within one client might not be valid. Like is it that the last token changes between pulls and all the rest is same (new inbox style) or old style where it’s always difference etc. how’s the consumer to know. |
We do not use muxing and request/reply for this, as it is multiple resonses for single request.
So we would have to do subject match, not equal match. The biggest problem with this approach is that it would not work with old Fetch, that creates new inbox per Pull Request. Or at least might now work as expected. Would need to revisit the code. Maybe a variation would be to add to the Pull Request a UID instead of bool flag @ripienaar ? |
Yes I like the UID approach, you can calculate it once and just reuse it in that client, good idea. |
Created the draft PR with the owner ID approach. |
I think this is a different problem set vs what you are suggesting which is an appId that could allow for exclusive consumers. |
Hm, how is this different? I think this is exactly what @jnmoyne tries to achieve here. quoting him:
use case It also ensures exclusiveness purely by convention: different clients can set different metadata and each will "think" it's exclusive. performance 5157 PR formalizes that unique ID into clean and explicit way, such that can be also easily incorporated into clients to allow exclusive consumers for any other needs without to align metadata across clients. |
shouldn't this be somehow tied to the connection? - if we have the server ID and the CID couldn't that be the identifier? - What I saying is the server knows the client making the request (server ID and CID) - I think this is available outside of operator mode, so if the consumer was of type |
Basically boils down to whether there are other use cases that could benefit from this "check some part of the metadata with each pull request" functionality besides the exclusive puller use case (I will admit, I can't think of a specific example one so far). Because otherwise @Jarema's other PR is basically the same thing except using a new string in consumer config and a new optional string field in the pull request, is more 'locked down' in that you can only use it for exclusivity and there's no option for people to 'abuse' it or getting it wrong if they don't all agree on the metadata key names. In that more locked down specific to just exclusivity version, then you could indeed use the server name + cid as your unique id which means then it would not need to be added as a field to the pull request message (if the server can tell which server name + cid that request came from). |
Forgot to answer: it not meant to be for push consumers, but I don't need push and as I'm using the new API anyways that's a non issue. |
Another comment upon more thinking about it for my use case and I think I would rather be able to store an id that I generate myself as part of the consumer config (either in metadata or a new field) rather than have the exclusivity just be a boolean flag and using server+cid to generate the id automatically. For example because then anyone can tell using consumer info which id is the currently active one and each can tell whether it's its own id (though you could make the current id be part of the consumer state and expose some convenience function to compose the server+cid correctly for oneself. And in my use-case I'd rather identify at the process rather than the current connection level (i.e. same id even after reconnecting). |
I like the flexibility that the metadata approach bring: its very undefined, open ended and one can maybe solve other problems with the it in the future, a nice little hacky tool. But is that what we really need at this stage? We can do it but should we? I am not sure we are at a place where we should be addeding open ended, undefined and unrestricted things like this - especially not that might be used on every pull request. It begs so many questions, how many fields are allowed? How big are they? What limits are there? What schema do they follow? I think its dangerous assigning core behaviours to entirely open ended undefined things like metadata. So my vote is a ID based approach and if we feel visibility of the ID is needed then we add it to consumer state. But we need to be quite careful to not open the can of worms of Raft state again. |
@Jarema What I meant by different is that @jnmoyne is using this so that he can make sure the consumer has not been changed out from underneath of the application. What you and I are suggesting is more for exclusive consumers, in my version tied to an appID which allows it to be horizontally scalable. The consumer would be configured to do exclusive consumers, meaning once it bound to an AppID in a pull request, it would only deliver messages to that AppID until they were all gone then it would "switch" to a new AppID. |
I'm totally fine with going with @Jarema 's version of just having a new exclusive id in the config instead of using metadata. I think we all agree that using the metadata for that is interesting and maybe a bit hacky and could be useful in some other use cases but could also open the door for "abuse" and since for my needs adding a new exclusive id to the config works just as well and is less open to abuse than this version I propose that we close this PR and move to @Jarema 's PR instead. In any case the two are actually not incompatible: one if we do come up with or get a request for a use case were this way of monitoring changes to the config would be good to have then we can always resuscitate this. |
Closing this as replaced by #5157 |
Adds a metadata field to consumer next requests
When processing the request the server will compare the request's metadata (if any) against the consumer's metadata (in the config) and reject the request with a status 409 if the two do not match (either a value is different or a key in the request is missing in the consumer).