Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix!(codecs): Use '\0' delimiter as default stream decode framer #20778

Merged

Conversation

jorgehermo9
Copy link
Contributor

@jorgehermo9 jorgehermo9 commented Jul 2, 2024

Closes #20768.

I think that in my local environment some test fail but has nothing to do with the changes I did. Could you trigger them here and if its fails too in the CI I review what could happen?

Thanks

@bits-bot
Copy link

bits-bot commented Jul 2, 2024

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the domain: codecs Anything related to Vector's codecs (encoding/decoding) label Jul 2, 2024
@@ -53,7 +59,7 @@ pub struct CharacterDelimitedDecoderOptions {

impl CharacterDelimitedDecoderOptions {
/// Create a `CharacterDelimitedDecoderOptions` with a delimiter and optional max_length.
pub fn new(delimiter: u8, max_length: Option<usize>) -> Self {
pub const fn new(delimiter: u8, max_length: Option<usize>) -> Self {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noticed this and didn't hurt

| SerializerConfig::Json(_)
| SerializerConfig::Logfmt
| SerializerConfig::NativeJson
| SerializerConfig::RawMessage
| SerializerConfig::Text(_) => FramingConfig::NewlineDelimited,
SerializerConfig::Gelf => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to be consistent with the default_stream_framing of the DeserializerConfig. Noticed that this change was missing in #18008. Changed it here, but tell me if you want this PR to not do anything out of scope from #20768.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, yeah, this does seem like an oversight. I'm trying to understand the impact of it though since the original PR did seem to resolve the issue for people 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the changed struct was EncodingConfigWithFraming (#18419).

It seems that a few places relies on EncodingConfigWithFraming to get this default behaviour, and other places uses this method SerializerConfig::default_stream_framing. For example, see this line

encoder_framing_to_decoding_framer(config.config().default_stream_framing()),

Where we can see its relying on SerializerConfig::default_stream_framing in order to get the default framing for a given encoder.

There are a lot of sinks that relies on EncodingConfigWithFraming having a Option in order to calculate that default (https://github.com/search?q=repo%3Avectordotdev%2Fvector%20EncodingConfigWithFraming&type=code)

(None, Serializer::Gelf(_)) => {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, gotcha, thanks for looking into that.

@@ -138,4 +138,33 @@ mod tests {
let event = next.unwrap().0.pop().unwrap().into_log();
assert_eq!(event.get("bar").unwrap(), &Value::from(2));
}

#[tokio::test]
async fn gelf_streaming_decoding() {
Copy link
Contributor Author

@jorgehermo9 jorgehermo9 Jul 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this test, I don't know if its correct to place it here. Didn't find anywhere else to place it.

I'm new in this project and can't find some modules easily.
Maybe this test is more of an integration test/e2e test, but couldn't find any related in the root /tests directory. I'm missing something?

It would be nice to have something like a TCP socket integration test where we test this same behavior...As the default_stream_framing is also called in

.unwrap_or_else(|| decoding.default_stream_framing()),
.

I thought of adding a similar test like this in

//////// TCP TESTS ////////
, what do you think?
In that case, I would create a TCP config setting only decoding = DeserializerConfig::from(GelfDeserializerConfig)(and leaving the framing as None in that config) and see how the TCP initialization takes the default framer.

What are your thoughts of all of this?

@jorgehermo9 jorgehermo9 marked this pull request as draft July 2, 2024 21:32
@jorgehermo9
Copy link
Contributor Author

Please, note that this is a breaking change as we are changing the default framing.

Maybe someone configured a source to send '\n' delimited gelf messages and left the vector config as the default, asumming that it would be correct.

In my opinion, we should adhere to the stardard and go ahead with this "breaking change"

@jorgehermo9 jorgehermo9 marked this pull request as ready for review July 2, 2024 21:39
@jorgehermo9 jorgehermo9 changed the title fix(gelf): Use '\0' delimiter as default streaming framer fix!(gelf): Use '\0' delimiter as default streaming framer Jul 3, 2024
@jorgehermo9 jorgehermo9 changed the title fix!(gelf): Use '\0' delimiter as default streaming framer fix!(gelf): Use '\0' delimiter as default stream decode framer Jul 3, 2024
@jszwedko jszwedko changed the title fix!(gelf): Use '\0' delimiter as default stream decode framer fix(codecs)!: Use '\0' delimiter as default stream decode framer Jul 3, 2024
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix @jorgehermo9!

Do you mind adding a changelog fragment? See https://github.com/vectordotdev/vector/blob/master/changelog.d/README.md

I think the test you added is ok. I'd actually be content with just testing the the default is set correctly without testing that it actually decode with the default config and \0 delimited input since the character delimited codec is tested separately. If anything, you could add a test to lib/codecs/src/decoding/framing/character_delimited.rs that tests \0 works as a delimiter, but I don't think it is strictly necessary.

| SerializerConfig::Json(_)
| SerializerConfig::Logfmt
| SerializerConfig::NativeJson
| SerializerConfig::RawMessage
| SerializerConfig::Text(_) => FramingConfig::NewlineDelimited,
SerializerConfig::Gelf => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, yeah, this does seem like an oversight. I'm trying to understand the impact of it though since the original PR did seem to resolve the issue for people 🤔

@jorgehermo9
Copy link
Contributor Author

Thanks for the review! I'll submit the changes asap.

I'd actually be content with just testing the the default is set correctly without testing that it actually decode with the default config and \0 delimited input since the character delimited codec is tested separately

Right, I will narrow the scope of that test, no problem. It seemed too simple for me to just test that, but it doesn't have sense to test the \0 framing there...

@jorgehermo9 jorgehermo9 changed the title fix(codecs)!: Use '\0' delimiter as default stream decode framer fix!(codecs): Use '\0' delimiter as default stream decode framer Jul 5, 2024
| DeserializerConfig::NativeJson(_) => {
FramingConfig::NewlineDelimited(Default::default())
}
DeserializerConfig::Protobuf(_) => FramingConfig::Bytes,
#[cfg(feature = "syslog")]
DeserializerConfig::Syslog(_) => FramingConfig::NewlineDelimited(Default::default()),
DeserializerConfig::Vrl(_) => FramingConfig::Bytes,
DeserializerConfig::Gelf(_) => {
FramingConfig::CharacterDelimited(CharacterDelimitedDecoderConfig::new(0))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about this these days... I think that GELF Http maybe uses newline delimiter? I can't find anything related to streaming gelf in Http here https://go2docs.graylog.org/5-0/getting_in_log_data/gelf.html, but maybe someone will rely on that for http sources also and this will be a breaking change in those cases too, not only with TCP.

If you are ok with that, I will include that breaking changes and the "fix" (manually setting the newline delimiter in framing.method) in the changelog fragment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, I haven't heard of anyone using GELF over HTTP, but it seems plausible. TCP seems to be the dominant transport so I think it makes sense to configure the defaults to work well with that.

@github-actions github-actions bot removed the domain: codecs Anything related to Vector's codecs (encoding/decoding) label Jul 22, 2024
@jorgehermo9
Copy link
Contributor Author

jorgehermo9 commented Jul 22, 2024

Hi @jszwedko, I'm sorry for addressing the changes that late. I simplified that test and moved it to another file. I also added a changelog fragment, I hope the explanation is clear... but I'd like to know what are your thoughts about it.

Could you trigger the CI and see if everything works ok?

Thank you very much!

Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks again for this change @jorgehermo9 !

changelog.d/20768-default_gelf_stream_framing.breaking.md Outdated Show resolved Hide resolved

In order to maintain the previous behavior, you must set the `framing.method` option to the `character_delimited` method and the `framing.character_delimited.delimiter` option to `\n` when using GELF codec with stream-based sources.

### Example configuration change for socket source
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks for including the example for using the previous default.

Small nit that we try to use YAML rather than TOML as the default these days.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

| SerializerConfig::Json(_)
| SerializerConfig::Logfmt
| SerializerConfig::NativeJson
| SerializerConfig::RawMessage
| SerializerConfig::Text(_) => FramingConfig::NewlineDelimited,
SerializerConfig::Gelf => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, gotcha, thanks for looking into that.

jorgehermo9 and others added 2 commits July 23, 2024 08:34
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for this!

@jszwedko jszwedko enabled auto-merge July 23, 2024 17:00
@jorgehermo9
Copy link
Contributor Author

Thank you very much for your prompt review!

@jszwedko jszwedko added this pull request to the merge queue Jul 23, 2024
Copy link

Regression Detector Results

Run ID: aaa2e237-6b13-469a-9485-3c81936a2d7c Metrics dashboard

Baseline: 2c75df4
Comparison: 21548fc

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

No significant changes in experiment optimization goals

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

There were no significant changes in experiment optimization goals at this confidence level and effect size tolerance.

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI links
file_to_blackhole egress throughput -9.98 [-16.26, -3.69]

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI links
syslog_splunk_hec_logs ingress throughput +3.48 [+3.36, +3.61]
http_text_to_http_json ingress throughput +3.18 [+3.05, +3.31]
datadog_agent_remap_datadog_logs_acks ingress throughput +1.99 [+1.79, +2.20]
syslog_log2metric_splunk_hec_metrics ingress throughput +1.15 [+1.05, +1.25]
syslog_regex_logs2metric_ddmetrics ingress throughput +1.15 [+1.02, +1.28]
datadog_agent_remap_blackhole ingress throughput +0.75 [+0.63, +0.88]
splunk_hec_route_s3 ingress throughput +0.60 [+0.27, +0.94]
syslog_log2metric_humio_metrics ingress throughput +0.59 [+0.47, +0.72]
syslog_loki ingress throughput +0.30 [+0.22, +0.38]
splunk_hec_indexer_ack_blackhole ingress throughput +0.01 [-0.08, +0.09]
splunk_hec_to_splunk_hec_logs_acks ingress throughput +0.01 [-0.11, +0.13]
splunk_hec_to_splunk_hec_logs_noack ingress throughput -0.02 [-0.13, +0.09]
http_to_http_acks ingress throughput -0.06 [-1.37, +1.26]
http_to_http_json ingress throughput -0.08 [-0.16, +0.01]
datadog_agent_remap_blackhole_acks ingress throughput -0.13 [-0.23, -0.03]
http_to_http_noack ingress throughput -0.19 [-0.30, -0.09]
otlp_grpc_to_blackhole ingress throughput -0.60 [-0.72, -0.48]
http_to_s3 ingress throughput -0.65 [-0.92, -0.38]
syslog_log2metric_tag_cardinality_limit_blackhole ingress throughput -0.85 [-0.92, -0.78]
fluent_elasticsearch ingress throughput -0.91 [-1.42, -0.40]
socket_to_socket_blackhole ingress throughput -1.31 [-1.39, -1.22]
datadog_agent_remap_datadog_logs ingress throughput -2.50 [-2.68, -2.31]
otlp_http_to_blackhole ingress throughput -2.76 [-2.94, -2.57]
http_elasticsearch ingress throughput -2.80 [-2.99, -2.61]
syslog_humio_logs ingress throughput -4.00 [-4.16, -3.84]
file_to_blackhole egress throughput -9.98 [-16.26, -3.69]

Explanation

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

Merged via the queue into vectordotdev:master with commit 21548fc Jul 23, 2024
49 checks passed
@jorgehermo9 jorgehermo9 deleted the fix/default-gelf-stream-framing branch July 23, 2024 19:04
ym pushed a commit to ym/vector that referenced this pull request Aug 18, 2024
…tordotdev#20778)

* fix(gelf): Use null delimiter as default streaming framer

* fix(gelf): compilation issues

* test(gelf): Improve gelf decoder test

* test(decoder): Fix gelf stream decoding test

* test(gelf): Improve gelf decoder test

* test(gelf): Fix test fail

* doc: add changelog

* test: simplify gelf stream default framing test

* doc: rewrite changelog

* style: cargo fmt

* docs: update changelog with PR suggestion

Co-authored-by: Jesse Szwedko <jesse@szwedko.me>

* docs: change changelog example to yaml

* Update 20768-default_gelf_stream_framing.breaking.md

---------

Co-authored-by: Jesse Szwedko <jesse@szwedko.me>
AndrooTheChen pushed a commit to discord/vector that referenced this pull request Sep 23, 2024
…tordotdev#20778)

* fix(gelf): Use null delimiter as default streaming framer

* fix(gelf): compilation issues

* test(gelf): Improve gelf decoder test

* test(decoder): Fix gelf stream decoding test

* test(gelf): Improve gelf decoder test

* test(gelf): Fix test fail

* doc: add changelog

* test: simplify gelf stream default framing test

* doc: rewrite changelog

* style: cargo fmt

* docs: update changelog with PR suggestion

Co-authored-by: Jesse Szwedko <jesse@szwedko.me>

* docs: change changelog example to yaml

* Update 20768-default_gelf_stream_framing.breaking.md

---------

Co-authored-by: Jesse Szwedko <jesse@szwedko.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GELF decoder's default framer with TCP input does not match the encoder's default
3 participants