Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vector sources acknowledge discarded events #12217

Open
jszwedko opened this issue Apr 14, 2022 · 17 comments
Open

Vector sources acknowledge discarded events #12217

jszwedko opened this issue Apr 14, 2022 · 17 comments
Labels
domain: delivery Anything related to delivering events within Vector such as end-to-end acknowledgements type: bug A code related bug.

Comments

@jszwedko
Copy link
Member

When end-to-end acknowledgments are enabled, Vector sources (where possible) are configured to wait until events are successful processed by associated sinks before acknowledging to the client connected to the source.

However, if an event is discarded by the configured topology, the source acknowledges this event as successfully processed. This was the intended implementation, but it is becoming clearer that this is unexpected behavior. As a Vector user, I would not expect these events to be marked as acknowledged.

Vector can discard events early when:

  • A transform errors and drops the event (like remap when drop_on_err = true)
  • A component output is not used as an input to another component (e.g. an unconsumed route)

Instead, these should result in negative acknowledgements to the source.

Prompted by https://discord.com/channels/742820443487993987/746070591097798688/964167175923433522

Related: #12109

@jszwedko jszwedko added type: bug A code related bug. domain: delivery Anything related to delivering events within Vector such as end-to-end acknowledgements labels Apr 14, 2022
@jszwedko
Copy link
Member Author

We could fail to boot a Vector config that has an unconsumed output for a source with acks enabled.

@jszwedko
Copy link
Member Author

Another idea courtesy of @binarylogic: we force all possible component errors to be handled explicitly when ack's are enabled.

Expanding that some, I think it could look like: all components that can drop events would have a on_error: (reject|discard|route) config option that would let users configure the acking behavior when an error occurs. If:

  • reject it would reject the event and cause a negative ack in the source
  • discard it would be discarded and ack'd
  • route it would be routed to a error output

@jszwedko
Copy link
Member Author

cc/ @hhromic for any thoughts since your discord conversation prompted this discussion 🙂

@spencergilbert
Copy link
Contributor

reject it would reject the event and cause a negative ack in the source

What's a negative ack?

@jszwedko
Copy link
Member Author

reject it would reject the event and cause a negative ack in the source

What's a negative ack?

Ah, for HTTP, this would be something like returning a 400.

For sources like kafka there is no "negative ack" so it just wouldn't ack.

@jszwedko
Copy link
Member Author

An alternative UX could be to have the blackhole sink have an option for rejecting data (or a new sink) that would allow us just to always have an error named output that the user would be responsible for routing to a blackhole sink to discard or reject when end-to-end acks are enabled.

@hhromic
Copy link
Contributor

hhromic commented Apr 18, 2022

cc/ @hhromic for any thoughts since your discord conversation prompted this discussion 🙂

Hello all, first of all, thanks for considering the opinion of users. It is highly appreciated :)
Not every software package we use can be praised with the same :P

While checking the documentation again, I stumbled upon the actual place where acknowledgement is properly defined:

For a source that supports end-to-end acknowledgements and is connected to a sink that has the acknowledgements option enabled, this will cause it to wait for all connected sinks to either mark the event as delivered, or to persist the events to a disk buffer, if configured on the sink, before acknowledging receipt of the event. If a sink signals the event was rejected and the source can provide an error response (i.e. through an HTTP error code), then the source will provide an appropriate error to the sending agent.

More importantly, it also clearly states what happens with dropped events:

Some transforms will drop events as part of their normal operation. For example, the dedupe transform will drop events that duplicate another recent event in order to reduce data volume. When this happens, the event will be considered as having been delivered with no error.

Therefore, in terms of documentation/behaviour, Vector seems to just be operating as designed :). I wouldn't classify this issue as a bug but an enhancement or feature request instead.

I was previously reading this other documentation page, which seems not so complete as above, and thus raising my questions in Discord: https://vector.dev/docs/about/under-the-hood/architecture/end-to-end-acknowledgements/

That being said, I think good enhancements as proposed are:

  • Vector sources acknowledge discarded events #12217 (comment) : This makes a lot of sense because enabling acks should also imply the user wanting more strict/controlled topologies. On the other hand, it might become annoying for users that know what they are doing. Especially when debugging.
  • Vector sources acknowledge discarded events #12217 (comment) : I feel this suggestion can devolve into "configuration options creep". For the remap transform we already have reroute_dropped, drop_on_abort and drop_on_error. So I would instead simplify on managing the behaviour using these three. For the route transform we also now have the _unmatched output that can be routed. See next.
  • Vector sources acknowledge discarded events #12217 (comment) : I also think this is a good idea. The user would then need to explicitly define a sink that "rejects messages for ack", i.e. a "rejector" sink where the user can redirect all messages that should be rejected upon desired conditions. This can couple very well with the existing remap and route options (see above). More importantly, this would force the user to make more explicit/clear topologies.

So in summary, I think the way to go would be to make sure that transforms that can drop messages to implement a dropped output like the remap transform and then provide a sink that can be used to wire up these outputs. The user then can have full control of what to reject and what to discard depending on how the topology is constructed.

@bruceg
Copy link
Member

bruceg commented Apr 18, 2022

Thank you for your valuable input.

Therefore, in terms of documentation/behaviour, Vector seems to just be operating as designed :). I wouldn't classify this issue as a bug but an enhancement or feature request instead.

For clarification, there are two classes of discarded events: those that are discarded as a normal part of the transform's behavior, like described for dedupe or filter or the like, and those that are discarded as a result of an error. It is the latter that are primarily at issue here while the documentation describes the former.

@hhromic
Copy link
Contributor

hhromic commented Apr 18, 2022

@bruceg ah that's a good clarification indeed.

Then it looks like we should better define dropped as normal transform operation.
For example, in the route transform, events sent to the _unmatched output are dropped as normal?
Same in the remap transform, events sent to the *.dropped output (for example via abort) are dropped as normal?
In the case of remap is tricky because the programmer currently has no way to express a dropped as normal intention.

@bruceg
Copy link
Member

bruceg commented Apr 18, 2022

in the route transform, events sent to the _unmatched output are dropped as normal? Same in the remap transform, events sent to the *.dropped output (for example via abort) are dropped as normal?

I think these are where one of @jszwedko's suggestions would come into play, where we would require all outputs to be consumed by another component at a configuration level, in which case they are not dropped at all.

@hhromic
Copy link
Contributor

hhromic commented Apr 18, 2022

Yes, indeed. Especially the suggestion about adding configurable ack behaviour to the blackhole sink.
In this way the user can explicitly configure how the acks should be done via topology construction.

@JeanMertz JeanMertz added type: enhancement A value-adding code change that enhances its existing functionality. and removed type: bug A code related bug. labels Apr 19, 2022
@jszwedko jszwedko added type: bug A code related bug. and removed type: enhancement A value-adding code change that enhances its existing functionality. labels Apr 19, 2022
@jszwedko
Copy link
Member Author

Just updating the status. I realize the lines are blurry here, but I think this is a bug given normal user expectations for this feature.

@honganan
Copy link
Contributor

honganan commented Aug 26, 2022

I caught this error and the consuming from Kafka totally stopped:

2022-08-26T03:18:28.849503Z ERROR source{component_kind="source" component_id=kafka_in_be component_type=kafka component_name=kafka_in_be}: vector::internal_events::kafka: Event received a negative acknowledgment, topic has been stopped. error_code="negative_acknowledgement" error_type="acknowledgment_failed" stage="sending" topic="log.java_standard_log" partition=1 offset=12277332826

I build from newest master version these days, does it related to this bug?

@jszwedko
Copy link
Member Author

Hi @honganan ! What you see there is a change to have Vector avoid acknowledging the event in Kafka if it failed to successfully process it. This change was made to avoid data loss by letting operators intervene to fix the issue that caused the failure.

@jszwedko
Copy link
Member Author

Hi @honganan ! What you see there is a change to have Vector avoid acknowledging the event in Kafka if it failed to successfully process it. This change was made to avoid data loss by letting operators intervene to fix the issue that caused the failure.

After reconsidering this behavior in light of a newly codified UX principle some planned work on error handling, we've decided to revert it for v0.24.0.

bruceg added a commit that referenced this issue Aug 29, 2022
…nts (#14135)

We are moving forward on handling processing and delivery errors by alternate
means that will no longer require sources to reject events. This halting
behavior is surprising to users and causes situations that are difficult to
recover from without complicated manual adjustments. As such, we are going to
revert this series in order to prevent regressions.

This partially reverts the following commits:

fix(kafka source): Fix handling of negative acknowledgements (#12901)

2dfa85b

fix(journald source): Fix handling of negative acknowledgements (#12913)

da11de5

fix(file source): Fix handling of negative acknowledgements (#12936)

4c58422

Ref: #12217 (comment)
@honganan
Copy link
Contributor

Hi @honganan ! What you see there is a change to have Vector avoid acknowledging the event in Kafka if it failed to successfully process it. This change was made to avoid data loss by letting operators intervene to fix the issue that caused the failure.

After reconsidering this behavior in light of a newly codified UX principle some planned work on error handling, we've decided to revert it for v0.24.0.

That would be good for UX.

jszwedko pushed a commit that referenced this issue Aug 30, 2022
…nts (#14135)

We are moving forward on handling processing and delivery errors by alternate
means that will no longer require sources to reject events. This halting
behavior is surprising to users and causes situations that are difficult to
recover from without complicated manual adjustments. As such, we are going to
revert this series in order to prevent regressions.

This partially reverts the following commits:

fix(kafka source): Fix handling of negative acknowledgements (#12901)

2dfa85b

fix(journald source): Fix handling of negative acknowledgements (#12913)

da11de5

fix(file source): Fix handling of negative acknowledgements (#12936)

4c58422

Ref: #12217 (comment)
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
@frankh
Copy link
Contributor

frankh commented Aug 2, 2024

weird workaround i found for this, use reroute_dropped: true on the remap that drops events, then create an invalid sink that rejects everything, such as:

error:
  type: file
  path: ""  # intentionally invalid
  encoding:
    codec: json
  acknowledgements:
    enabled: true
  inputs:
    - events_parsed.dropped

this will reject all acks (with 500 internal server error for http source)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: delivery Anything related to delivering events within Vector such as end-to-end acknowledgements type: bug A code related bug.
Projects
None yet
Development

No branches or pull requests

7 participants