etcd sometimes (very rarely) gets broken after a cluster-reset action #11992

Open · 1 of 2 tasks
aganesh-suse opened this issue Mar 20, 2025 · 2 comments

aganesh-suse commented Mar 20, 2025

Environment Details

Infrastructure

  • Cloud
  • Hosted

Node(s) CPU architecture, OS, and Version:

$ cat /etc/os-release
PRETTY_NAME="SUSE Linux Enterprise Server 15 SP5"

$ uname -m
x86_64

Cluster Configuration:

HA: 3 server/ 1 agent

k3s version:

$ k3s -v
k3s version v1.29.15-rc1+k3s1 (5bc2f0ce)
go version go1.23.6

Describe the bug:

Config.yaml:

token: xxxx
cluster-init: true
write-kubeconfig-mode: "0644"
node-external-ip: 1.1.1.1
node-label:
- k3s-upgrade=server

Testing Steps to Reproduce:

  1. Copy config.yaml:
$ sudo mkdir -p /etc/rancher/k3s && sudo cp config.yaml /etc/rancher/k3s
  2. Install k3s:
curl -sfL https://get.k3s.io | sudo INSTALL_K3S_VERSION='v1.29.15-rc1+k3s1' sh -s - server
  3. Verify cluster status:
kubectl get nodes -o wide
kubectl get pods -A
  4. Using the killall script, stop two server nodes (server 2 and server 3):
sudo /usr/local/bin/k3s-killall.sh
  5. Shut down the server on the remaining node (server 1):
$ sudo systemctl stop k3s
  6. Run cluster-reset:
$ sudo /usr/local/bin/k3s server --cluster-reset
  7. Restart the server process:
$ sudo systemctl start k3s
  8. Move/delete the db directories on the other servers (2 and 3):
sudo mv /var/lib/rancher/k3s/server/db /var/lib/rancher/k3s/server/db-backup
  9. Restart the server process on the other servers.
  10. Re-verify cluster status:
kubectl get nodes -o wide
kubectl get pods -A
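
For reference, the sequence above can be rolled into a single script run from server 1. This is only a sketch of the manual steps, assuming passwordless SSH to the other two servers and the hypothetical hostnames server2/server3; the real test harness may do this differently.

#!/bin/sh
# Sketch of the reset sequence from the steps above, run on server 1.
# Assumes k3s v1.29.15-rc1+k3s1 is already installed on all nodes (steps 1-3)
# and that server2/server3 are placeholder hostnames reachable over SSH.
set -e

OTHER_SERVERS="server2 server3"

# Stop k3s on the other two servers, then on this node.
for host in $OTHER_SERVERS; do
  ssh "$host" 'sudo /usr/local/bin/k3s-killall.sh'
done
sudo systemctl stop k3s

# Reset the cluster to a single-member etcd on this node, then restart k3s.
sudo /usr/local/bin/k3s server --cluster-reset
sudo systemctl start k3s

# Move the old db directories aside on the other servers and rejoin them.
for host in $OTHER_SERVERS; do
  ssh "$host" 'sudo mv /var/lib/rancher/k3s/server/db /var/lib/rancher/k3s/server/db-backup && sudo systemctl start k3s'
done

# Re-verify.
kubectl get nodes -o wide
kubectl get pods -A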

Expected behavior:

Nodes should be in Ready state
Pods should be in Running state

Actual behavior:

Each server thinks the nodes are in different states:

ip-172-31-28-209:~ # kubectl get node
NAME                                          STATUS     ROLES                       AGE     VERSION
ip-172-31-24-161.us-east-2.compute.internal   NotReady   control-plane,etcd,master   5h9m    v1.29.15-rc1+k3s1
ip-172-31-24-34.us-east-2.compute.internal    NotReady   control-plane,etcd,master   5h8m    v1.29.15-rc1+k3s1
ip-172-31-26-218.us-east-2.compute.internal   NotReady   <none>                      5h6m    v1.29.15-rc1+k3s1
ip-172-31-28-209.us-east-2.compute.internal   Ready      control-plane,etcd,master   5h12m   v1.29.15-rc1+k3s1

ip-172-31-24-34:~ # kubectl get node
NAME                                          STATUS     ROLES                       AGE     VERSION
ip-172-31-24-161.us-east-2.compute.internal   Ready      control-plane,etcd,master   5h9m    v1.29.15-rc1+k3s1
ip-172-31-24-34.us-east-2.compute.internal    Ready      control-plane,etcd,master   5h9m    v1.29.15-rc1+k3s1
ip-172-31-26-218.us-east-2.compute.internal   Ready      <none>                      5h6m    v1.29.15-rc1+k3s1
ip-172-31-28-209.us-east-2.compute.internal   NotReady   control-plane,etcd,master   5h12m   v1.29.15-rc1+k3s1

ip-172-31-24-161:~ # kubectl get node
NAME                                          STATUS     ROLES                       AGE     VERSION
ip-172-31-24-161.us-east-2.compute.internal   Ready      control-plane,etcd,master   5h10m   v1.29.15-rc1+k3s1
ip-172-31-24-34.us-east-2.compute.internal    Ready      control-plane,etcd,master   5h10m   v1.29.15-rc1+k3s1
ip-172-31-26-218.us-east-2.compute.internal   Ready      <none>                      5h7m    v1.29.15-rc1+k3s1
ip-172-31-28-209.us-east-2.compute.internal   NotReady   control-plane,etcd,master   5h13m   v1.29.15-rc1+k3s1
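
To see the disagreement side by side, one option (a sketch, assuming SSH access and the hypothetical hostnames server1/server2/server3) is to query each server's local apiserver in a loop and compare the output:

# Compare each server's view of node status (hostnames are placeholders).
for host in server1 server2 server3; do
  echo "=== view from $host ==="
  ssh "$host" 'sudo k3s kubectl get nodes'
done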

Some journal logs on the main server:

Mar 19 19:37:06 ip-172-31-28-209 k3s[2729]: time="2025-03-19T19:37:06Z" level=error msg="Sending HTTP/1.1 503 response to 127.0.0.1:52240: runtime core not ready"
.
Mar 19 19:37:10 ip-172-31-28-209 k3s[2729]: time="2025-03-19T19:37:10Z" level=info msg="Failed to get existing traefik HelmChart" error="helmcharts.helm.cattle.io \"traefik\" not found"
.
Mar 19 19:51:26 ip-172-31-28-209 k3s[8493]: {"level":"error","ts":"2025-03-19T19:51:26.637477Z","caller":"etcdserver/server.go:2381","msg":"Validation on configuration change failed","shouldApplyV3":false,"error":"membership: too many learner members in cluster","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyConfChange\n\t/go/pkg/mod/github.com/k3s-io/etcd/server/v3@v3.5.19-k3s1.30/etcdserver/server.go:2381\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply\n\t/go/pkg/mod/github.com/k3s-io/etcd/server/v3@v3.5.19-k3s1.30/etcdserver/server.go:2250\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries\n\t/go/pkg/mod/github.com/k3s-io/etcd/server/v3@v3.5.19-k3s1.30/etcdserver/server.go:1462\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll\n\t/go/pkg/mod/github.com/k3s-io/etcd/server/v3@v3.5.19-k3s1.30/etcdserver/server.go:1277\ngo.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8\n\t/go/pkg/mod/github.com/k3s-io/etcd/server/v3@v3.5.19-k3s1.30/etcdserver/server.go:1149\ngo.etcd.io/etcd/pkg/v3/schedule.(*fifo).run\n\t/go/pkg/mod/github.com/k3s-io/etcd/pkg/v3@v3.5.19-k3s1.30/schedule/schedule.go:157"}
.
.
Mar 19 19:40:56 ip-172-31-28-209 k3s[2729]: I0319 19:40:56.786805    2729 node_controller.go:431] Initializing node ip-172-31-24-34.us-east-2.compute.internal with cloud provider
Mar 19 19:40:56 ip-172-31-28-209 k3s[2729]: E0319 19:40:56.787012    2729 node_controller.go:240] error syncing 'ip-172-31-24-34.us-east-2.compute.internal': failed to get instance metadata for node ip-172-31-24-34.us-east-2.compute.internal: address annotations not yet set, requeuing
.
.
Mar 20 00:19:24 ip-172-31-28-209 k3s[22284]: E0320 00:19:24.674052   22284 server.go:310] "Unable to authenticate the request due to an error" err="[invalid bearer token, service account token has been invalidated]"
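
The "too many learner members in cluster" error suggests the embedded etcd's member list is worth inspecting when the cluster ends up in this state. A sketch of how to do that with a separately installed etcdctl, assuming the default TLS paths used by the k3s embedded etcd (adjust if your layout differs):

# etcdctl is not bundled with k3s; install it separately, then point it at the
# embedded etcd using k3s's etcd client certificates (default paths shown).
sudo etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt \
  --cert=/var/lib/rancher/k3s/server/tls/etcd/server-client.crt \
  --key=/var/lib/rancher/k3s/server/tls/etcd/server-client.key \
  member list -w table
# Look for members stuck with IS LEARNER = true, or members that should have been removed.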

Please note that this is not easily reproduced; it doesn't happen every time.


aganesh-suse commented Mar 20, 2025


brandond commented Mar 20, 2025

@dereknola and I have run into this independently in the past. I suspect that the experimental force-cluster-reset flag that we are using to remove other cluster members has some edge conditions that can cause a split brain. Unfortunately we haven't been able to reproduce it with standalone etcd, nor have we identified anything we can do differently in k3s to avoid it.

The workaround is to restore a snapshot and then rejoin the other nodes. Restoring a snapshot when resetting cluster membership has not been observed to have the same issue with inconsistent cluster state across nodes.

It doesn't even have to be an OLD snapshot - when things are broken you can just take a snapshot on any of the affected nodes, then restore it, and things will work again.
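
For anyone hitting this, the workaround described above roughly corresponds to the following commands. This is a sketch based on the standard k3s etcd-snapshot tooling; the snapshot name is illustrative and the default snapshot directory is assumed.

# On one of the affected servers: take a fresh on-demand snapshot.
sudo k3s etcd-snapshot save --name recover

# Stop k3s on all servers, then on that same server reset the cluster from the snapshot.
# Snapshot files land under /var/lib/rancher/k3s/server/db/snapshots/ by default;
# the actual file name includes the node name and a timestamp (placeholders below).
sudo systemctl stop k3s
sudo k3s server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/recover-<node>-<timestamp>
sudo systemctl start k3s

# On the other servers: move the old db aside and rejoin, as in the original steps.
sudo mv /var/lib/rancher/k3s/server/db /var/lib/rancher/k3s/server/db-backup
sudo systemctl start k3s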

@brandond brandond moved this from New to Stalled in K3s Development Mar 21, 2025
@brandond brandond added this to the Backlog milestone Mar 21, 2025
@brandond brandond moved this to Bugs in K3s Backlog Mar 21, 2025