Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds consumer metadata checking to next requests #5141

Closed
wants to merge 1 commit into from

Conversation

jnmoyne
Copy link
Contributor

@jnmoyne jnmoyne commented Feb 28, 2024

Adds a metadata field to consumer next requests
When processing the request the server will compare the request's metadata (if any) against the consumer's metadata (in the config) and reject the request with a status 409 if the two do not match (either a value is different or a key in the request is missing in the consumer).

When processing the request the server will compare the request's metadata (if any) against the consumer's metadata (in the config) and reject the request with a status 409 if the two do not match (either a value is different or a key in the request is missing in the consumer).

Signed-off-by: Jean-Noël Moyne <jnmoyne@gmail.com>
@jnmoyne jnmoyne requested a review from a team as a code owner February 28, 2024 03:22
Copy link
Member

@Jarema Jarema left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to explain why you would like this PR in. What is the use case?

Sending metadata to the server in Pull Request as a validation if we sent it to proper consumer is really counter intuitive. Why would sending a metadata in Pull Request check if it's the same on the server?

My guess is it's (a bit hacky :)) way to make sure if the consumer is the same, meaning - it was not recreated under the same name?
This goes back to the fact that we did not separate ID of the consumer from the Name. It would be useful to have such a feature, as it would allow more flexibility with renaming streams and consumers.

Let's discuss how to best solve the use case you have :).

@@ -3332,6 +3332,26 @@ func (o *consumer) processNextMsgRequest(reply string, msg []byte) {
return
}

if len(consumerMetaData) > 0 && len(o.cfg.Metadata) > 0 {
var matching = true
for k, v := range consumerMetaData {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a one way check.
If consumer metadata on the server has key-pairs that are not present on the metadata passed in request, the check will still return true, a false positive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a false positive, that's the behavior I want:
I pass you a collection of metadata key/values and I want to make sure those are present in the consumer's metadata and they match the value. It's not a 'deep equal' but a 'check that those keys and values are there and the same'.

}
}
if !matching {
sendErr(409, "Request's medata does not match the consumer's")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should to be the same error code.

I would really like us to have separate code's at least for errors that have a very distinct way to react to them in the client.

Also this should be a const, not a literal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did the same as what the existing other 409 errors do (e.g. line 3308) 🤷‍♂️

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typos and incorrect 's here

Copy link
Member

@Jarema Jarema Mar 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should not be 409.
Client libraries should not compare strings to determine what is happening and what to do.

EDIT: I know we did it all over the place already, but it would be really nice to improve the situation, at least for the new errors...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use http status codes here, and 4xx are client errors, with 409 being conflict which seems correct.

@jnmoyne
Copy link
Contributor Author

jnmoyne commented Feb 28, 2024

My particular use case is checking that some part of the consumer's metadata (that I expect to be there and with the values that I expect) did not get changed between two pull requests. In my case I use this to get an 'exclusive puller from a consumer' behavior. Meaning that each client instance picks a unique id and sets a key in the consumer config's metadata with that id and uses consumer create to make sure it is the only one creating the consumer (which has a durable name shared by all the instances). If you can create the consumer with your id in the metadata you want to make sure that the same id is there each time you do a consumer next because in some failure scenarios (like a network failure) the consumer could be deleted because of timeout and another instance (therefore different id in the metadata) may have been able to create the consumer for itself.

I would suspect that there may be other use cases where you want to check that something in the configuration (i.e. part of the metadata) did not get changed between 2 pull requests.

I otherwise would have to do a get consumer info each time before I do a fetch (and wouldn't be able to use 'consume()' or 'messages()', I'd have to call fetch() myself.

That said, ultimately my desired functionality is exclusive puller from a durable consumer, and there may be reasons to implement it another way which I would gladly use, this just seemed like a simple change that could maybe have other use cases of wanting to know that 'something has changed in the consumer metadata'

@derekcollison
Copy link
Member

I am ok with this actually now that I understand what is being accomplished and that it will be used in only very specific use cases.

Interested in what @ripienaar thinks.

Copy link
Contributor

@ripienaar ripienaar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally ok with it too, I am just thinking how it can be abused and impact on payloads etc if this becomes heavily used.

And this solution will work only for pull consumers? Do we have a matching plan for push?

}
}
if !matching {
sendErr(409, "Request's medata does not match the consumer's")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typos and incorrect 's here

@Jarema
Copy link
Member

Jarema commented Feb 29, 2024

I have a different idea @derekcollison @ripienaar

We add a field "unique: bool" to Pull Request, false by default.
When server see it, it will check if there are Pull Requests for other inboxes. If they are, it will act exactly as here - return an error.

I think that way would be cleaner, and it would actually provide a check for exclusivity, rather than a convention that different payload means something changed.

This could be also done in reverse, so exclusiveness set on consumer, and flag in pull request to potentially take over the consumer, draining others, which could simplify the orchestration.

@ripienaar
Copy link
Contributor

@Jarema Hmm, we do not really have guarantees of inboxes being unique and it changes per pull doesnt it? Well it can it does not need to but maybe it does, so perhaps that not always legit?

I agree we need a less hacky approach though so lets think of alternatives of do a look around at how we use inboxes for pulls

@Jarema
Copy link
Member

Jarema commented Mar 1, 2024

@ripienaar I was wondering about it too.

We do reuse the same inbox while processing the messages in the same Consumer instance in the clients, with optional one wildcard token at the and, but the subject still would match.
And yes, it is not guaranteed to have unique inboxes except by convention.
Though to be fair, if we reuse inbox across clients, that would mess up messages delivery anyway (for pull consumers), so we assume it is unique (and we use random inbox() for all pull consumers).

@ripienaar
Copy link
Contributor

For sure we can’t have 2 clients with same inbox that would be bad (though I can tell you why I do that in places).

I am more concerned that using an inbox as a signal of uniqueness within one client might not be valid. Like is it that the last token changes between pulls and all the rest is same (new inbox style) or old style where it’s always difference etc. how’s the consumer to know.

@Jarema
Copy link
Member

Jarema commented Mar 1, 2024

We do not use muxing and request/reply for this, as it is multiple resonses for single request.
All clients rely on subscribing to inbox, but some add (for easier counting of messages per pull request) a token at the end so the sub looks like:

_INBOX.NUID.*

So we would have to do subject match, not equal match.

The biggest problem with this approach is that it would not work with old Fetch, that creates new inbox per Pull Request. Or at least might now work as expected. Would need to revisit the code.

Maybe a variation would be to add to the Pull Request a UID instead of bool flag @ripienaar ?
That way we would not rely on subject uniquness.
Would work almost exactly the same as @jnmoyne PR, but without abusing metadata for that purpose.

@ripienaar
Copy link
Contributor

Yes I like the UID approach, you can calculate it once and just reuse it in that client, good idea.

@Jarema Jarema mentioned this pull request Mar 1, 2024
@Jarema
Copy link
Member

Jarema commented Mar 1, 2024

Created the draft PR with the owner ID approach.

@derekcollison
Copy link
Member

derekcollison commented Mar 1, 2024

I think this is a different problem set vs what you are suggesting which is an appId that could allow for exclusive consumers.

@Jarema
Copy link
Member

Jarema commented Mar 1, 2024

Hm, how is this different?

I think this is exactly what @jnmoyne tries to achieve here. quoting him:

In my case I use this to get an 'exclusive puller from a consumer' behavior. Meaning that each client instance picks a unique id and sets a key in the consumer config's metadata with that id and uses consumer create to make sure it is the only one creating the consumer

use case
I actually see disadvage in using metadata in so opinionated way - it means we cannot use it for anything else in pull consumers.

It also ensures exclusiveness purely by convention: different clients can set different metadata and each will "think" it's exclusive.

performance
This solution allocates (and compares) a whole map for every pull request sent.

5157 PR formalizes that unique ID into clean and explicit way, such that can be also easily incorporated into clients to allow exclusive consumers for any other needs without to align metadata across clients.

@aricart
Copy link
Member

aricart commented Mar 1, 2024

shouldn't this be somehow tied to the connection? - if we have the server ID and the CID couldn't that be the identifier? - What I saying is the server knows the client making the request (server ID and CID) - I think this is available outside of operator mode, so if the consumer was of type exclusive - the server could aggregate the name provided for the consumer with this other info and effectively get an unique guy

@jnmoyne
Copy link
Contributor Author

jnmoyne commented Mar 1, 2024

Basically boils down to whether there are other use cases that could benefit from this "check some part of the metadata with each pull request" functionality besides the exclusive puller use case (I will admit, I can't think of a specific example one so far). Because otherwise @Jarema's other PR is basically the same thing except using a new string in consumer config and a new optional string field in the pull request, is more 'locked down' in that you can only use it for exclusivity and there's no option for people to 'abuse' it or getting it wrong if they don't all agree on the metadata key names.

In that more locked down specific to just exclusivity version, then you could indeed use the server name + cid as your unique id which means then it would not need to be added as a field to the pull request message (if the server can tell which server name + cid that request came from).

@jnmoyne
Copy link
Contributor Author

jnmoyne commented Mar 1, 2024

Forgot to answer: it not meant to be for push consumers, but I don't need push and as I'm using the new API anyways that's a non issue.

@jnmoyne
Copy link
Contributor Author

jnmoyne commented Mar 1, 2024

Another comment upon more thinking about it for my use case and I think I would rather be able to store an id that I generate myself as part of the consumer config (either in metadata or a new field) rather than have the exclusivity just be a boolean flag and using server+cid to generate the id automatically. For example because then anyone can tell using consumer info which id is the currently active one and each can tell whether it's its own id (though you could make the current id be part of the consumer state and expose some convenience function to compose the server+cid correctly for oneself. And in my use-case I'd rather identify at the process rather than the current connection level (i.e. same id even after reconnecting).

@ripienaar
Copy link
Contributor

I like the flexibility that the metadata approach bring: its very undefined, open ended and one can maybe solve other problems with the it in the future, a nice little hacky tool.

But is that what we really need at this stage? We can do it but should we? I am not sure we are at a place where we should be addeding open ended, undefined and unrestricted things like this - especially not that might be used on every pull request.

It begs so many questions, how many fields are allowed? How big are they? What limits are there? What schema do they follow?

I think its dangerous assigning core behaviours to entirely open ended undefined things like metadata.

So my vote is a ID based approach and if we feel visibility of the ID is needed then we add it to consumer state. But we need to be quite careful to not open the can of worms of Raft state again.

@derekcollison
Copy link
Member

@Jarema What I meant by different is that @jnmoyne is using this so that he can make sure the consumer has not been changed out from underneath of the application.

What you and I are suggesting is more for exclusive consumers, in my version tied to an appID which allows it to be horizontally scalable. The consumer would be configured to do exclusive consumers, meaning once it bound to an AppID in a pull request, it would only deliver messages to that AppID until they were all gone then it would "switch" to a new AppID.

@jnmoyne
Copy link
Contributor Author

jnmoyne commented Mar 3, 2024

I'm totally fine with going with @Jarema 's version of just having a new exclusive id in the config instead of using metadata. I think we all agree that using the metadata for that is interesting and maybe a bit hacky and could be useful in some other use cases but could also open the door for "abuse" and since for my needs adding a new exclusive id to the config works just as well and is less open to abuse than this version I propose that we close this PR and move to @Jarema 's PR instead.

In any case the two are actually not incompatible: one if we do come up with or get a request for a use case were this way of monitoring changes to the config would be good to have then we can always resuscitate this.

@jnmoyne
Copy link
Contributor Author

jnmoyne commented Mar 25, 2024

Closing this as replaced by #5157

@jnmoyne jnmoyne closed this Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants