-
Notifications
You must be signed in to change notification settings - Fork 6.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add skip_etcd_even_nodes_check var to allow even number of etcd nodes #12014
base: master
Are you sure you want to change the base?
add skip_etcd_even_nodes_check var to allow even number of etcd nodes #12014
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: pedro-peter The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hi @pedro-peter. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
In which situation would you want even number of etcd? |
when replacing a control-plane/etcd node, you might want to add the new node before removing the old node: there's a short duration where you'll have an even number of etcd nodes. |
Actually, I think you need to remove the old before adding the new. |
yes, it's remove then add Etcd documentation on live reconfigurations:
and
Also I don't think kubespray etcd role is tested to do such operations. Overall I don't think we should add variable to allow an even number of etcd. It's never a situation you want to be in and could mislead users new to kubernetes. |
I still think it's useful to allow creating an uneven number of etcd nodes (as long as it's disabled by default) currently you can only do this by disabling all assert checks... cluster operators can make the decision if they want to risk running a cluster for a duration with an uneven number of etcd nodes. |
When you remove a node you also end up with an even amount of nodes, and with 2 nodes the quorum is 2, so until you deploy the new node, you have zero failure tolerance But I think the safer way to avoid this is to add two nodes, e.g. etcd4 and etcd5, then remove etcd3 and etcd5 at the same time. Using etcd learners would also help |
I think people are missing the point of this PR... You can already add an uneven number of control-plane/etcd nodes in a kubespray cluster. You currently need to disable all assert checks (https://github.com/kubernetes-sigs/kubespray/blob/master/roles/kubernetes/preinstall/tasks/0040-verify-settings.yml) using var If folks want to add an even number of control-plane/etcd nodes, it's better just to allow disabling the single asert check (https://github.com/kubernetes-sigs/kubespray/pull/12014/files#diff-f817513f21cf14346b8672affc950686b1c4e3b9521671e0bbb76aac63fa5f6bR67) and allow the remaining checks to run. |
I don't think anyone missed that, adding an even amount of etcd nodes is a large footgun, so i understand not wanting to add any form of support for it
It probably shouldn't be, but it's at least obvious that it's a dangerous operation |
this is definitely a safer operation to replace a control-plane/etcd node should probably be added to the documentation: there are cases though where a cluster operator might decide to take the risk of running an even number of etcd nodes (for a short duration). folks may not have the hardware available to spin up new control-plane nodes. |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Currently the only way to add an even number of control-plane/etcd nodes to a cluster is to disable all assert checks by passing
ignore_assert_errors=yes
It would be better just to skip the etcd even nodes assert check and leave all other asserts as they can catch useful problems.
Which issue(s) this PR fixes:
Fixes #12013
Special notes for your reviewer:
I've updated the doc to use the new
skip_etcd_even_nodes_check
flag. This isn't really a breaking change asignore_assert_errors=yes
will still work.Does this PR introduce a user-facing change?: