K3s startup sequencing is inefficient #11896

Open
brandond opened this issue Mar 6, 2025 · 0 comments
brandond commented Mar 6, 2025

Some K3s components prefer to wait until other components are up and ready before running. For example:

  • The kubelet does not start until the container runtime is up.
    When using the K3s embedded executor, the kubelet also does not start until the apiserver is up; RKE2 sequences this differently in order to host the control plane as static pods.
  • The cloud-controller and controller-manager do not start until the apiserver is up.
  • The scheduler does not start until the apiserver is up and there is a schedulable node.
  • The agent doesn't start polling for endpoint updates until the apiserver is up.

In cases where there is a hard dependency, this is required to keep things from crashing; in other cases it is done simply to avoid excessive log spew as components retry.

The current way these waits are done is inefficient and could be improved:

  • We have a helper to wait on apiserver readiness, but we call it from four separate places instead of having a single readiness channel that things can block on (a sketch of the channel-based approach follows this list):
    if err := util.WaitForAPIServerReady(ctx, e.nodeConfig.AgentConfig.KubeConfigKubelet, util.DefaultAPIServerReadyTimeout); err != nil {
        logrus.Fatalf("Kubelet failed to wait for apiserver ready: %v", err)
    }

    if err := util.WaitForAPIServerReady(ctx, config.AgentConfig.KubeConfigKubelet, util.DefaultAPIServerReadyTimeout); err != nil {
        logrus.Fatalf("Tunnel watches failed to wait for apiserver ready: %v", err)
    }

    k3s/pkg/agent/run.go

    Lines 177 to 179 in d694dd1

    if err := util.WaitForAPIServerReady(ctx, nodeConfig.AgentConfig.KubeConfigKubelet, util.DefaultAPIServerReadyTimeout); err != nil {
        return pkgerrors.WithMessage(err, "failed to wait for apiserver ready")
    }

    case err := <-promise(func() error { return util.WaitForAPIServerReady(ctx, runtime.KubeConfigSupervisor, 30*time.Second) }):
        if err != nil {
            logrus.Infof("Waiting for API server to become available")
            continue
        }
  • The apiserver readiness helper doesn't currently provide any feedback on why the apiserver is not ready yet, due to a misunderstanding about how error handling works within the RestClient library (see the /readyz sketch after this list):

    k3s/pkg/util/api.go

    Lines 86 to 89 in d694dd1

    if rerr := result.Error(); rerr != nil {
        lastErr = pkgerrors.WithMessage(rerr, "failed to get apiserver /readyz status")
        return false, nil
    }
  • The server tries to wait on container runtime readiness before starting etcd (for RKE2 reasons), but this is currently done by passing a channel from the Server config struct to the Agent config struct; the agent closes the channel, if it is non-nil, to signal that the runtime is ready (the Executor sketch at the end of this issue would replace this):

    agentConfig.ContainerRuntimeReady = containerRuntimeReady

    k3s/pkg/agent/run.go

    Lines 162 to 168 in fb870ad

    // the container runtime is ready to host workloads when containerd is up and the airgap
    // images have finished loading, as that portion of startup may block for an arbitrary
    // amount of time depending on how long it takes to import whatever the user has placed
    // in the images directory.
    if cfg.ContainerRuntimeReady != nil {
        close(cfg.ContainerRuntimeReady)
    }

    k3s/pkg/etcd/etcd.go

    Lines 499 to 500 in fb870ad

    logrus.Infof("Waiting for container runtime to become ready before joining etcd cluster")
    case <-e.config.Runtime.ContainerRuntimeReady:
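One way to address the first point is to run the existing wait exactly once and hand every consumer the same channel to block on. This is only a minimal sketch, assuming the existing util.WaitForAPIServerReady helper; the APIServerReadyChan function and the package-level variables here are hypothetical, not current K3s code:

    import (
        "context"
        "sync"

        "github.com/k3s-io/k3s/pkg/util"
        "github.com/sirupsen/logrus"
    )

    var (
        apiserverReady     = make(chan struct{})
        apiserverReadyOnce sync.Once
    )

    // APIServerReadyChan starts the readiness wait on first use and returns a
    // channel that is closed once the apiserver is ready. Callers block on the
    // channel instead of each invoking util.WaitForAPIServerReady themselves.
    func APIServerReadyChan(ctx context.Context, kubeConfig string) <-chan struct{} {
        apiserverReadyOnce.Do(func() {
            go func() {
                if err := util.WaitForAPIServerReady(ctx, kubeConfig, util.DefaultAPIServerReadyTimeout); err != nil {
                    logrus.Fatalf("Failed to wait for apiserver ready: %v", err)
                }
                close(apiserverReady)
            }()
        })
        return apiserverReady
    }

With something like this in place, each of the four call sites above collapses to a single <-APIServerReadyChan(ctx, kubeConfig) receive, or a case in an existing select.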
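For the second point, /readyz returns a plain-text body listing the individual failing checks when queried with the verbose parameter, but that body is discarded if only result.Error() is inspected. A rough sketch of surfacing it, assuming a client-go rest.Interface; the checkReadyz name is hypothetical:

    import (
        "context"

        pkgerrors "github.com/pkg/errors"
        "k8s.io/client-go/rest"
    )

    // checkReadyz includes the verbose /readyz body in the returned error, so
    // the log shows which readiness checks are failing rather than just a
    // generic status error.
    func checkReadyz(ctx context.Context, restClient rest.Interface) error {
        // Result.Raw() returns the response body even when the request failed
        // with a non-2xx status; Result.Error() alone loses that detail.
        body, err := restClient.Get().AbsPath("/readyz").Param("verbose", "true").Do(ctx).Raw()
        if err != nil {
            return pkgerrors.WithMessagef(err, "apiserver /readyz reported not ready: %s", string(body))
        }
        return nil
    }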

All of this readiness signaling should be moved into the Executor interface. The Executor is responsible for running things; it should also provide functions that return channels which can be read to determine when the various components it starts are up.

With this done, several places that currently block startup of other functions, due to the difficulty of passing wait channels around, can be made async and simply block on readiness where necessary.
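As a rough illustration of the shape this could take (the method names are assumptions, not a finalized interface), the Executor would grow channel-returning accessors next to its existing run methods:

    // Hypothetical additions to the Executor interface.
    type Executor interface {
        // ...existing methods that start the kubelet, apiserver, etc...

        // APIServerReadyChan returns a channel that is closed once the
        // apiserver started by this Executor is passing /readyz.
        APIServerReadyChan() <-chan struct{}

        // CRIReadyChan returns a channel that is closed once the container
        // runtime is up and any airgap images have finished importing.
        CRIReadyChan() <-chan struct{}
    }

A consumer such as etcd could then block on runtime readiness without a channel being threaded through the Server and Agent config structs:

    select {
    case <-executor.CRIReadyChan():
    case <-ctx.Done():
        return ctx.Err()
    }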
