-
Notifications
You must be signed in to change notification settings - Fork 1.8k
tests: fix pthread_join on stopped threads in tests. #10787
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ctly. Signed-off-by: Phillip Whelan <phillip.whelan@chronosphere.io>
…art correctly. Signed-off-by: Phillip Whelan <phillip.whelan@chronosphere.io>
WalkthroughTest teardown logic changed to stop the Fluent Bit engine only when start succeeds. Functions now accept an int ret parameter from flb_start and conditionally call flb_stop based on ret == 0; destruction/freeing always occurs. Call sites in affected tests were updated accordingly. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant T as Test
participant FB as Fluent Bit Context
T->>FB: flb_start(ctx)
FB-->>T: int ret
rect rgba(230, 245, 255, 0.5)
note over T: Teardown
alt ret == 0 (start succeeded)
T->>FB: flb_stop(ctx)
else ret != 0 (start failed)
note over T,FB: Skip stop
end
T->>FB: flb_destroy(ctx)
end
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (2)
tests/runtime/in_netif.c (1)
76-80
: Avoid unnecessary delay on failure path; move sleep(1) under the success guardWhen start fails, the extra one-second sleep isn’t needed. Move it inside the ret == 0 block so only successful-start paths wait before stopping.
Apply this diff inside test_ctx_destroy:
- sleep(1); - if (ret == 0) { - flb_stop(ctx->flb); - } + if (ret == 0) { + /* allow pending flush before stopping; only relevant if engine started */ + sleep(1); + flb_stop(ctx->flb); + }tests/runtime/in_podman_metrics.c (1)
86-91
: Document the guard to aid future maintenanceA short comment here will save future churn when someone wonders why stop is conditional.
Apply this doc-only tweak:
void do_destroy(flb_ctx_t *ctx, int ret) { - if (ret == 0) { + /* Guard flb_stop(): on start failure, do_start() may have already joined the worker. */ + if (ret == 0) { flb_stop(ctx); } flb_destroy(ctx); }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
tests/runtime/in_netif.c
(5 hunks)tests/runtime/in_podman_metrics.c
(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/runtime/in_netif.c (1)
src/flb_lib.c (1)
flb_stop
(942-985)
tests/runtime/in_podman_metrics.c (1)
src/flb_lib.c (3)
flb_stop
(942-985)flb_destroy
(223-258)flb_start
(914-925)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
- GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
- GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
- GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
- GitHub Check: PR - fuzzing test
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-22.04, clang-12)
- GitHub Check: pr-compile-centos-7
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
- GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, gcc, g++, ubuntu-24.04, clang-14)
🔇 Additional comments (3)
tests/runtime/in_netif.c (1)
72-80
: Guarding flb_stop() by start result prevents double pthread_join — good fixThis directly addresses the ASAN double-join by skipping flb_stop() when flb_start() returned non-zero. flb_stop() joins the worker even for FLB_LIB_NONE/ERROR, so calling it after a failed start that already joined the thread would indeed double-join. Solid change.
tests/runtime/in_podman_metrics.c (2)
86-91
: Stop only when start succeeded — fixes duplicate pthread_join in failing testsConditionally calling flb_stop(ctx) on ret == 0 aligns teardown with flb_start()’s outcome and prevents the ASAN-reported double join. Order (stop → destroy) is correct.
93-109
: Incomplete teardown propagation across test suiteThe verification scan shows that call-site updates for the new teardown contract were not applied consistently:
- Hundreds of tests still invoke
test_ctx_destroy(ctx)
with only the context, without propagating theret
value fromflb_start()
(e.g. in_kubernetes_events.c, filter_expect.c, many others).- Numerous tests continue to call
flb_stop(ctx)
(unconditionally) instead of relying on the conditional stop logic inside the updateddo_destroy(ctx, ret)
ortest_ctx_destroy(ctx, ret)
helpers.Because of this inconsistency, the original approval of “uniform teardown enforcement” does not hold. Please propagate the new teardown pattern across all tests:
- Update every
test_ctx_destroy
to accept and use theret
parameter, replacing directflb_stop(ctx)
calls.- Ensure all
do_destroy
helpers are called with bothctx
andret
.- Remove any remaining unconditional
flb_stop()
from the test suite so that teardown behavior is centralized and ASAN duplicate-join reports are eliminated.Likely an incorrect or invalid review comment.
@@ -226,7 +228,7 @@ static void flb_test_normal() | |||
TEST_MSG("expect: >=1 got: %d", got); | |||
} | |||
|
|||
test_ctx_destroy(ctx); | |||
test_ctx_destroy(ctx, ret); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
All call sites now propagate flb_start()’s return into teardown — consistent and correct
Passing ret into test_ctx_destroy(ctx, ret) ensures we only stop when the engine actually started. Call-site updates look complete for this file.
To ensure there are no remaining tests with the old pattern, run this repo-wide sweep:
Also applies to: 256-256, 290-290, 343-343
🏁 Script executed:
#!/usr/bin/env bash
set -euo pipefail
echo "1) Any direct flb_stop() in tests without an obvious guard?"
rg -n -C2 -g 'tests/runtime/**' --type=c '\bflb_stop\s*\('
echo
echo "2) Common teardown helpers calling flb_stop(); check they accept start ret and guard the stop"
rg -nU -g 'tests/runtime/**' --type=c '(?s)void\s+\w*(do_destroy|_destroy|_ctx_destroy)\s*\([^)]*\)\s*\{.*?\bflb_stop\s*\(' -C0
echo
echo "3) Call sites of teardown helpers; verify they pass the flb_start() result"
rg -n -g 'tests/runtime/**' --type=c '\b(do_destroy|test_ctx_destroy)\s*\(' -C2
echo
echo "4) Places that start but might still do unconditional flb_stop()"
rg -n -C3 -g 'tests/runtime/**' --type=c '\bret\s*=\s*flb_start\s*\('
Length of output: 263775
Update Remaining Tests to Propagate flb_start()
Return Into Teardown
While tests/runtime/in_netif.c
has been updated correctly—its test_ctx_destroy(ctx, ret)
guard ensures we only call flb_stop()
when ret == 0
—the repo-wide sweep still shows numerous test files using the old pattern. These need to be updated to maintain consistency and avoid unguarded calls to flb_stop()
.
Key areas requiring changes:
-
Any test in
tests/runtime/**
with an unconditionalflb_stop(ctx); // or flb_stop(ctx->flb); flb_destroy(ctx); // (or flb_destroy(ctx->flb);)
should instead
- Use a teardown helper that accepts the
ret
value (e.g.,test_ctx_destroy(ctx, ret)
ordo_destroy(ctx, ret)
), and - Guard the call to
flb_stop()
inside that helper withif (ret == 0)
.
- Use a teardown helper that accepts the
-
Teardown helper definitions (e.g.,
static void test_ctx_destroy(struct test_ctx *ctx)
) must be refactored to accept anint ret
parameter and wrapflb_stop()
accordingly.
Examples of files to update (non-exhaustive):
tests/runtime/filter_parser.c
(multipleflb_stop(ctx);
at lines 162, 249, 326, …)tests/runtime/out_flowcounter.c
(lines 64, 100, 135, …)tests/runtime/out_loki.c
(lines 106, 165, 225, …)tests/runtime/in_tail.c
(lines 241, 551, 671, …)tests/runtime/processor_metrics_selector.c
(lines 144, 220, 302, …)- And many more under
tests/runtime/**
Please extend the pattern used in in_netif.c
across the suite:
- flb_stop(ctx);
- flb_destroy(ctx);
+ test_ctx_destroy(ctx, ret); // or do_destroy(ctx, ret)
And update the helper:
-static void test_ctx_destroy(struct test_ctx *ctx)
+static void test_ctx_destroy(struct test_ctx *ctx, int ret)
{
if (ret == 0) {
flb_stop(ctx->flb);
}
flb_destroy(ctx->flb);
}
With these changes, all tests will uniformly propagate and respect the return value of flb_start()
.
Summary
The ASAN (enabled with
-DSANITIZE_ADDRESS=On
) reports errors whenpthread_join
is called on the same thread twice. This happens during two different tests:This is because both tests have configurations that fail on purpose, causing
do_start
to callpthread_join
, which is called again whenflb_stop
is called on the same context.This patch simply makes the call to
flb_stop
conditional on the return value offlb_start
during the test.Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-test
label to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Tests
Chores
Bug Fixes