Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Release-1.32] Improve readiness polling on node startup #12038

Merged
merged 6 commits into from
Apr 1, 2025

Conversation

brandond
Copy link
Member

Proposed Changes

This PR eliminates the scattered creation of ready channels that are passed around via struct fields. Ready channels for all components started by the Executor are now exposed by the Executor, providing a consistent interface that different portions of the codebase can use to wait for components to be available.

  • Add context to agent token validation error
    Makes the error message nice when agents are retrying retrieval of cacerts from cluster
  • Increase log output while waiting for apiserver ready.
    We were never actually logging anything here, as we would always take the result.Error() path whenever the apiserver isn't ready - and log nothing.
  • Move apiserver ready wait into common channel
    Splits server startup into prepare/start phases. Server's agent is now started after server is prepared, but before the datastore and control-plane components are started. This allows us to properly bootstrap the executor before starting server components, and use the executor to provide a shared channel to wait on apiserver readiness.
    This allows us to replace four separate callers of WaitForAPIServerReady with reads from a common ready channel.
  • Move container runtime ready channel into executor
    Move the container runtime ready channel into the executor interface, instead of passing it awkwardly between server and agent config structs. This is for parity with the apiserver ready channel, and also adds validation that user-provided CRI services are up, something that was previously only done for containerd and cri-dockerd.
  • Move container etcd ready channel into executor
    This eliminates the final channel that was being passed around in an internal struct. The ETCD management code passes in a func that can be polled until etcd is ready; the executor is responsible for polling this after etcd is started and closing the etcd ready channel at the correct time.

Types of Changes

tech debt
bugfix

Verification

Check logs

Testing

RKE2 validation:

Linked Issues

User-Facing Change


Further Comments

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit c11c06c)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Increases log verbosity but decreases polling frequency to avoid
spamming the console. It usually takes a couple seconds for the
apiserver to come up anyway.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 2c13369)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Splits server startup into prepare/start phases. Server's agent is now
started after server is prepared, but before it is started. This allows
us to properly bootstrap the executor before starting server components,
and use the executor to provide a shared channel to wait on apiserver
readiness.

This allows us to replace four separate callers of WaitForAPIServerReady
with reads from a common ready channel.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 529e748)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Move the container runtime ready channel into the executor interface, instead of passing it awkwardly between server and agent config structs

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit a8bc412)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit 72bbd67)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This eliminates the final channel that was being passed around in an internal struct. The ETCD management code passes in a func that can be polled until etcd is ready; the executor is responsible for polling this after etcd is started and closing the etcd ready channel at the correct time.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
(cherry picked from commit d45006b)
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
@brandond brandond requested a review from a team as a code owner March 31, 2025 22:44
Copy link

codecov bot commented Mar 31, 2025

Codecov Report

Attention: Patch coverage is 74.00000% with 65 lines in your changes missing coverage. Please review.

Project coverage is 45.03%. Comparing base (0c7563b) to head (d1960b0).
Report is 6 commits behind head on release-1.32.

Files with missing lines Patch % Lines
pkg/agent/run.go 48.14% 12 Missing and 2 partials ⚠️
pkg/daemons/executor/embed.go 72.72% 5 Missing and 4 partials ⚠️
pkg/cli/server/server.go 71.42% 5 Missing and 3 partials ⚠️
pkg/agent/cri/cri.go 66.66% 4 Missing and 2 partials ⚠️
pkg/etcd/etcd.go 45.45% 6 Missing ⚠️
pkg/daemons/control/server.go 70.58% 3 Missing and 2 partials ⚠️
pkg/agent/tunnel/tunnel.go 84.61% 4 Missing ⚠️
pkg/cluster/cluster.go 71.42% 4 Missing ⚠️
pkg/cli/agent/agent.go 40.00% 2 Missing and 1 partial ⚠️
pkg/util/api.go 85.71% 2 Missing and 1 partial ⚠️
... and 2 more
Additional details and impacted files
@@               Coverage Diff                @@
##           release-1.32   #12038      +/-   ##
================================================
+ Coverage         44.93%   45.03%   +0.09%     
================================================
  Files               190      188       -2     
  Lines             19102    19065      -37     
================================================
+ Hits               8583     8585       +2     
+ Misses             9278     9234      -44     
- Partials           1241     1246       +5     
Flag Coverage Δ
e2etests 35.87% <67.20%> (+<0.01%) ⬆️
inttests 35.38% <67.60%> (+0.06%) ⬆️
unittests 16.85% <4.80%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@brandond brandond merged commit a524556 into k3s-io:release-1.32 Apr 1, 2025
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants