Processor trace sampling rebased #10116
Closed
This patch introduces a new trace sampling processor designed with a pluggable architecture, so it can be extended to support multiple sampling strategies and backends. The initial implementation includes basic probabilistic sampling; future patches are planned to add further sampling methods such as rate-limiting, latency-based, and tail-based sampling.

The probabilistic sampler can be configured as follows:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: probabilistic
                debug: true
                rules:
                  sampling_percentage: 40

      outputs:
        - name: stdout
          match: '*'

In this configuration:

- debug mode (debug: true) is enabled, allowing detailed logging of sampling decisions.
- sampling_percentage: 40 ensures that 40% of traces are retained, while the rest are discarded.
- traces that pass sampling will be forwarded to the stdout output for visibility.

    Fluent Bit v4.0.0
    * Copyright (C) 2015-2024 The Fluent Bit Authors
    * Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
    * https://fluentbit.io

    ______ _                  _    ______ _ _             ___  _____
    |  ___| |                | |   | ___ (_) |           /   ||  _  |
    | |_  | |_   _  ___ _ __ | |_  | |_/ /_| |_  __   __/ /| || |/' |
    |  _| | | | | |/ _ \ '_ \| __| | ___ \ | __| \ \ / / /_| ||  /| |
    | |   | | |_| |  __/ | | | |_  | |_/ / | |_   \ V /\___  |\ |_/ /
    \_|   |_|\__,_|\___|_| |_|\__| \____/|_|\__|   \_/     |_(_)___/

    [2025/02/28 16:46:00] [ info] [fluent bit] version=4.0.0, commit=0e885e2d60, pid=778903
    [2025/02/28 16:46:00] [ info] [storage] ver=1.5.2, type=memory, sync=normal, checksum=off, max_chunks_up=128
    [2025/02/28 16:46:00] [ info] [simd    ] disabled
    [2025/02/28 16:46:00] [ info] [cmetrics] version=0.9.9
    [2025/02/28 16:46:00] [ info] [ctraces ] version=0.6.0
    [2025/02/28 16:46:00] [ info] [input:opentelemetry:opentelemetry.0] initializing
    [2025/02/28 16:46:00] [ info] [input:opentelemetry:opentelemetry.0] storage_strategy='memory' (memory only)
    [2025/02/28 16:46:00] [ info] [input:opentelemetry:opentelemetry.0] listening on 0.0.0.0:4318
    [2025/02/28 16:46:00] [ info] [processor:sampling:sampling.0] initializing probabilistic sampling processor
    [2025/02/28 16:46:00] [ info] [sp] stream processor started
    [2025/02/28 16:46:00] [ info] [output:stdout:stdout.0] worker #0 started

    🔍 Debug sampling 'probabilistic' (0x779068027940): before
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=5b8efff798038103d269b633813fc60c                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=eee19b7ec3c1b174 name=I'm a server span                  │
    │ ├── id=eee19b7ec3c1b175 name=Child span of server span          │
    │ ├── id=eee19b7ec3c1b176 name=Database query                     │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=6a9dfff798038103d269b633813fc60d                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=fff19b7ec3c1b174 name=A span in another trace            │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=7c8efff798038103d269b633813fc60e                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=0000000000000000 name=Slow request                       │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=8d9efff798038103d269b633813fc60f                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=0000000000000000 name=High traffic span                  │
    │ ├── id=0000000000000000 name=Load testing event                 │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=9a1bfff798038103d269b633813fc610                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=0000000000000000 name=Faulty transaction                 │
    │ ├── id=0000000000000000 name=Database rollback                  │
    └─────────────────────────────────────────────────────────────────┘

    🔍 Debug sampling 'probabilistic' (0x779068027940): after
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=6a9dfff798038103d269b633813fc60d                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=fff19b7ec3c1b174 name=A span in another trace            │
    └─────────────────────────────────────────────────────────────────┘
    ┌─────────────────────────────────────────────────────────────────┐
    │ trace_id=7c8efff798038103d269b633813fc60e                       │
    ├─────────────────────────────────────────────────────────────────┤
    │ spans:                                                          │
    │ ├── id=0000000000000000 name=Slow request                       │
    └─────────────────────────────────────────────────────────────────┘

    |-------------------- RESOURCE SPAN --------------------|
      resource:
         - attributes:
                - service.name: 'other.service'
         - dropped_attributes_count: 0
         - schema_url: ""
      [scope_span]
        instrumentation scope:
            - name                    : other.library
            - version                 : 2.0.0
            - dropped_attributes_count: 0
            - attributes: undefined
        schema_url: ""
        [spans]
             [span #0 'A span in another trace']
                 - trace_id                : 6a9dfff798038103d269b633813fc60d
                 - span_id                 : fff19b7ec3c1b174
                 - parent_span_id          : undefined
                 - kind                    : 2 (server)
                 - start_time              : 1544712660000000000
                 - end_time                : 1544712662000000000
                 - dropped_attributes_count: 0
                 - dropped_events_count    : 0
                 - dropped_links_count     : 0
                 - trace_state             : (null)
                 - status:
                     - code                : 0
                 - attributes: none
                 - events: none
                 - [links]

    |-------------------- RESOURCE SPAN --------------------|
      resource:
         - attributes:
                - service.name: 'latency.test.service'
         - dropped_attributes_count: 0
         - schema_url: ""
      [scope_span]
        instrumentation scope:
            - name                    : latency.test.library
            - version                 : 3.0.0
            - dropped_attributes_count: 0
            - attributes: undefined
        schema_url: ""
        [spans]
             [span #0 'Slow request']
                 - trace_id                : 7c8efff798038103d269b633813fc60e
                 - span_id                 : 0000000000000000
                 - parent_span_id          : undefined
                 - kind                    : 2 (server)
                 - start_time              : 1544712660000000000
                 - end_time                : 1544712675000000000
                 - dropped_attributes_count: 0
                 - dropped_events_count    : 0
                 - dropped_links_count     : 0
                 - trace_state             : (null)
                 - status:
                     - code                : 0
                 - attributes: none
                 - events: none
                 - [links]

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
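To make the effect of sampling_percentage concrete, here is a minimal Python sketch of a probabilistic keep/drop decision. It is not the processor's actual code: deriving the decision from a SHA-256 hash of the trace ID (so that every span of a trace gets the same verdict) is an assumption made for illustration, and keep_trace is a hypothetical helper name.

```python
import hashlib


def keep_trace(trace_id: str, sampling_percentage: float) -> bool:
    """Decide whether to keep a trace (hypothetical helper, not Fluent Bit code).

    Assumption: the decision is a deterministic function of the trace ID,
    so repeated calls for the same trace always agree.
    """
    digest = hashlib.sha256(trace_id.encode("ascii")).digest()
    # Map the first 8 bytes of the hash to a bucket in the range 0..9999.
    bucket = int.from_bytes(digest[:8], "big") % 10000
    # Keep the trace when its bucket falls below the configured percentage.
    return bucket < sampling_percentage * 100


if __name__ == "__main__":
    for tid in ("5b8efff798038103d269b633813fc60c",
                "6a9dfff798038103d269b633813fc60d"):
        print(tid, keep_trace(tid, 40))
```

With sampling_percentage set to 40, roughly 40% of a large set of distinct trace IDs would be kept; for a handful of traces the kept fraction can differ noticeably.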
The processor callback for traces supported only the incoming CTraces context, which was meant to be modified in place by the processors. This patch changes the function prototype by adding a new optional argument to set a new output CTraces context.

Behavior on return:

- If the CTraces output context is NULL, the processor units stop right away. The assumption is that the processor plugin did some buffering or simply discarded the context, so no extra processing is needed.
- If the CTraces output context is different from the incoming CTraces context, it overrides the original context (the original context is destroyed).

Signed-off-by: Eduardo Silva <eduardo@calyptia.com>
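The return rules above can be modelled with a short Python sketch. The real callback is C code operating on CTraces contexts; TraceContext and run_trace_processors are hypothetical names used only for illustration, and treating "same context returned" as "continue with the next processor" is an assumption.

```python
class TraceContext:
    """Hypothetical stand-in for a CTraces context."""

    def __init__(self, name):
        self.name = name
        self.destroyed = False

    def destroy(self):
        self.destroyed = True


def run_trace_processors(processors, ctx):
    """Run processors in order, applying the output-context rules.

    Each processor is a callable that receives the incoming context and
    returns the output context: None, the same object, or a new one.
    """
    for process in processors:
        out = process(ctx)
        if out is None:
            # The plugin buffered or discarded the context: stop right away.
            # Ownership of the incoming context is left to the plugin here.
            return None
        if out is not ctx:
            # A different output context overrides the original one,
            # and the original is destroyed.
            ctx.destroy()
            ctx = out
        # Assumption: an unchanged context simply flows to the next processor.
    return ctx
```

A sampling plugin that buffers spans for a later decision would return None from its callback in this model, which is why the chain has to stop without treating that as an error.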
…output Signed-off-by: Eduardo Silva <eduardo@calyptia.com>
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
For the tail sampling type, this commit adds a new 'latency' conditional that samples traces based on span duration (end time - start time) by matching specific thresholds:

- threshold_ms_low: specifies the lower latency threshold. Traces with a duration <= this value will be sampled.
- threshold_ms_high: specifies the upper latency threshold. Traces with a duration >= this value will be sampled.

Note that the thresholds are set in milliseconds.

Usage:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 5s
                conditions:
                  - type: latency
                    threshold_ms_low: 200
                    threshold_ms_high: 3000

This tail-based sampling configuration waits 5 seconds before making a decision. It samples traces based on latency, capturing short traces of 200ms or less and long traces of 3000ms or more. Traces between 200ms and 3000ms are not sampled unless another condition applies.

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
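The threshold logic can be sketched in a few lines of Python. This is an illustration under assumptions, not the processor's code: latency_matches is a hypothetical name, span timestamps are taken to be nanoseconds as in the span dumps in this pull request, and a trace is assumed to match when any one of its spans crosses a threshold.

```python
NS_PER_MS = 1_000_000


def span_duration_ms(start_time_ns, end_time_ns):
    """Duration of a span in whole milliseconds (end time - start time)."""
    return (end_time_ns - start_time_ns) // NS_PER_MS


def latency_matches(spans, threshold_ms_low=None, threshold_ms_high=None):
    """Return True if any (start_ns, end_ns) span crosses a threshold.

    Assumption: one matching span is enough to sample the whole trace.
    """
    for start_ns, end_ns in spans:
        duration = span_duration_ms(start_ns, end_ns)
        if threshold_ms_low is not None and duration <= threshold_ms_low:
            return True
        if threshold_ms_high is not None and duration >= threshold_ms_high:
            return True
    return False


if __name__ == "__main__":
    # 15 s span: at or above the 3000 ms upper threshold.
    slow = [(1544712660000000000, 1544712675000000000)]
    # 2 s span: between 200 ms and 3000 ms.
    medium = [(1544712660000000000, 1544712662000000000)]
    print(latency_matches(slow, 200, 3000), latency_matches(medium, 200, 3000))
```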
This commit introduces the string_attribute conditional to the sampling processor, allowing traces to be sampled based on specific span or resource attributes. Users can define key-value filters like http.method=POST to selectively capture relevant traces:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 5s
                conditions:
                  - type: string_attribute
                    key: "http.method"
                    values: ["GET"]
                  - type: string_attribute
                    key: "service.name"
                    values: ["payment-processing"]

      outputs:
        - name: stdout
          match: '*'

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
This patch introduces the match_type property for the string_attribute conditional. It accepts the values 'strict' (default) and 'exists'.

Usage:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 5s
                conditions:
                  - type: string_attribute
                    match_type: strict
                    key: "http.method"
                    values: ["GET"]
                  - type: string_attribute
                    match_type: exists
                    key: "service.name"

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
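As a rough illustration of the two match types, the following Python sketch evaluates a string_attribute condition against a flat attribute dictionary. The function name is hypothetical, and two details are assumptions rather than documented behavior: 'strict' is read as exact string equality, and a key is considered matched when its value equals any entry in values.

```python
def string_attribute_matches(attributes, key, values=(), match_type="strict"):
    """Evaluate a string_attribute condition (hypothetical helper).

    attributes: dict of span or resource attributes.
    match_type: 'strict' compares the value, 'exists' only checks the key.
    """
    if match_type == "exists":
        # The attribute only has to be present; its value is ignored.
        return key in attributes
    if match_type == "strict":
        # Assumption: exact equality against any one of the listed values.
        return key in attributes and attributes[key] in values
    raise ValueError(f"unsupported match_type: {match_type!r}")


if __name__ == "__main__":
    attrs = {"http.method": "GET", "service.name": "payment-processing"}
    print(string_attribute_matches(attrs, "http.method", ["GET"]))
    print(string_attribute_matches(attrs, "service.name", match_type="exists"))
```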
…onal

This commit introduces support for the numeric_attribute conditional in the sampling processor, allowing traces to be sampled based on numeric attribute values. Users can define min and max thresholds.

Usage:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 2s
                conditions:
                  - type: numeric_attribute
                    key: "http.status_code"
                    min_value: 400
                    max_value: 504

      outputs:
        - name: stdout
          match: '*'

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
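A minimal Python sketch of the range check follows. The helper name is hypothetical, and treating both min_value and max_value as inclusive bounds is an assumption; the commit message does not say whether the endpoints are included.

```python
def numeric_attribute_matches(attributes, key, min_value, max_value):
    """Return True when attributes[key] is a number within [min_value, max_value].

    Assumption: both bounds are inclusive; non-numeric values never match.
    """
    value = attributes.get(key)
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return min_value <= value <= max_value


if __name__ == "__main__":
    print(numeric_attribute_matches({"http.status_code": 404},
                                    "http.status_code", 400, 504))
```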
Adds a new conditional that samples only traces whose number of spans falls within a specific range. The following configuration options are available:

- min_spans: minimum number of spans expected in the trace
- max_spans: maximum number of spans in the trace

Usage:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 2s
                conditions:
                  - type: span_count
                    min_spans: 3
                    max_spans: 5

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
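The span_count check reduces to a bounds test on the number of spans collected for a trace. The Python sketch below uses a hypothetical helper name and assumes both bounds are inclusive and that an omitted bound means "no limit on that side"; neither detail is stated in the commit message.

```python
def span_count_matches(spans, min_spans=None, max_spans=None):
    """Return True when len(spans) lies within the configured range.

    Assumptions: bounds are inclusive, and None disables that bound.
    """
    count = len(spans)
    if min_spans is not None and count < min_spans:
        return False
    if max_spans is not None and count > max_spans:
        return False
    return True


if __name__ == "__main__":
    three_spans = ["I'm a server span", "Child span of server span", "Database query"]
    one_span = ["Slow request"]
    print(span_count_matches(three_spans, 3, 5), span_count_matches(one_span, 3, 5))
```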
This commit introduces support for the trace_state conditional in the sampling processor, allowing traces to be sampled based on metadata stored in the W3C trace_state field.

Configuration:

- values: Defines a list of key-value pairs to match against the trace_state. A trace is sampled if any of the specified values exist in the trace_state. Matching follows OR logic, meaning at least one value must be present for sampling to occur.

Example:

    pipeline:
      inputs:
        - name: opentelemetry
          port: 4318

          processors:
            traces:
              - name: sampling
                type: tail
                sampling_settings:
                  decision_wait: 2s
                conditions:
                  - type: trace_state
                    values: [debug=false, priority=high]

      outputs:
        - name: stdout
          match: '*'

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
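The OR matching described above can be sketched in Python as follows. The helper names are hypothetical, and the parsing is deliberately simplified: it splits a comma-separated list of key=value members, as in the W3C tracestate header, without validating key or value syntax.

```python
def parse_trace_state(trace_state):
    """Split 'k1=v1,k2=v2' into a dict (simplified, no syntax validation)."""
    members = {}
    for member in trace_state.split(","):
        member = member.strip()
        if not member:
            continue
        key, sep, value = member.partition("=")
        if sep:
            members[key] = value
    return members


def trace_state_matches(trace_state, values):
    """Return True if any 'key=value' entry in values is present (OR logic)."""
    if not trace_state:
        # A span without trace_state, shown as (null) in the dumps, never matches.
        return False
    members = parse_trace_state(trace_state)
    for wanted in values:
        key, sep, value = wanted.partition("=")
        if sep and members.get(key) == value:
            return True
    return False


if __name__ == "__main__":
    print(trace_state_matches("vendor=abc,priority=high",
                              ["debug=false", "priority=high"]))
```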
The new max_traces option controls the maximum number of traces held in memory. When the value is exceeded, the oldest trace (by arrival time) is deleted.

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
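One way to picture the max_traces eviction is an insertion-ordered map keyed by trace ID. The Python sketch below illustrates the idea only and is not the processor's data structure: TraceBuffer and its methods are hypothetical names, and "arrival time" is assumed to mean the arrival of a trace's first span.

```python
from collections import OrderedDict


class TraceBuffer:
    """Hypothetical in-memory buffer that holds at most max_traces traces."""

    def __init__(self, max_traces):
        self.max_traces = max_traces
        # Insertion order doubles as arrival order of each trace's first span.
        self._traces = OrderedDict()

    def add_span(self, trace_id, span):
        """Store a span and return the IDs of any traces evicted as a result."""
        evicted = []
        if trace_id not in self._traces:
            self._traces[trace_id] = []
        self._traces[trace_id].append(span)
        while len(self._traces) > self.max_traces:
            # Drop the trace that arrived first.
            oldest_id, _ = self._traces.popitem(last=False)
            evicted.append(oldest_id)
        return evicted

    def trace_ids(self):
        return list(self._traces)
```

Appending a span to an already buffered trace does not change that trace's position, so a long-running trace can still be evicted first once the limit is exceeded.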