
Commit 7445369

Extend documentation for nodes
1 parent 5d3831a commit 7445369

3 files changed: +103098 -7557 lines changed

docs/xks/operator-guide/kubernetes/aks.md

+100 -128 lines changed
@@ -5,193 +5,165 @@ title: AKS
import useBaseUrl from '@docusaurus/useBaseUrl';

## Node Pools
### VM Disk Type
XKF makes an opinionated choice with regard to the OS disk type. AKS offers the option of either managed disks or ephemeral storage. Managed disks are the simplest solution: they can be sized according to requirements and are persisted across the whole life cycle of the node. The downside of managed disks is that performance is limited, as the disks are not located on the hardware. Disk performance is instead based on the size of the disk. The standard size used by AKS for the managed OS disk is 128 GB, which makes it a [P10](https://azure.microsoft.com/en-us/pricing/details/managed-disks/) disk that maxes out at 500 IOPS. It is important to remember that the OS disk is shared by all processes on the node: pulled OCI images, container logs, and ephemeral Kubernetes volumes all compete for the same disk performance. An application that, for example, logs every HTTP request it receives can consume large amounts of IOPS, as logs written to STDOUT end up on disk. Another smaller downside with managed disks is that they are billed per GB on top of the VM cost, although this represents a very small percentage of the total AKS cost.

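Which disk type and size an existing cluster runs on can be verified with the Azure CLI. A minimal sketch, using the same placeholder values as the commands further down in this document:

```shell
# Show the OS disk type and size currently used by each node pool.
az aks nodepool list --cluster-name <cluster_name> --resource-group <resource_group> \
  --query "[].{name:name, osDiskType:osDiskType, osDiskSizeGb:osDiskSizeGb}" -o table
```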
Ephemeral storage, on the other hand, offers higher IOPS out of the box at the cost of not persisting data and an increased dependency on the VM type. This storage type uses the cache disk on the VM as storage for the OS and other kubelet-related resources. The size of the cache varies with the VM type and size, meaning that different node pools may have different amounts of storage available for, for example, ephemeral volumes. A general rule is however that the [cache disk has to be at least 30 GB](https://docs.microsoft.com/en-us/azure/aks/cluster-configuration#use-ephemeral-os-on-existing-clusters), which removes some of the smallest VM sizes from the pool of possibilities. Remember that a cache disk of 30 GB does not mean 30 GB of free space, as the OS will consume some of that space. It may be wise to lean towards fewer, larger VMs instead of many smaller VMs to increase the amount of disk available.

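Because the usable space differs between VM types and sizes, it can be worth checking how much ephemeral storage the nodes actually expose to Kubernetes, for example:

```shell
# Show the ephemeral storage capacity reported by each node.
kubectl get nodes -o custom-columns=NAME:.metadata.name,EPHEMERAL-STORAGE:.status.capacity.ephemeral-storage
```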
Instance type availability is currently not properly documented, partly because the feature is relatively new. Regional differences have been observed where ephemeral VMs may be available in one region but not another for the same VM type and size. There is currently no proper way to determine in which regions a VM is available, so this has to be determined through trial and error. The same can be said about the cache disk size. Some instance types have the cache size documented, others do not but will still work. Check the [VM sizes](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes) documentation for availability information first. The cache size is given as the value in parentheses in the "Max cached and temp storage throughput" column. XKF does not allow configuration of the OS disk size because the configuration is so error prone. Instead it has a list of [known cache sizes](https://github.com/XenitAB/terraform-modules/blob/main/modules/azure/aks/aks.tf#L100-L150) for each VM type and size. The correct OS disk size will be set based on the VM selected.

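One way to check the cache size without trial and error is to query the SKU capabilities with the Azure CLI. A sketch that assumes the SKU reports the `CachedDiskBytes` capability, which not every size does:

```shell
# Print the cache disk size in bytes reported for a VM size in a region.
# An empty result means the capability is not reported for that SKU.
az vm list-skus --location <location> --size Standard_D2ds_v5 --resource-type virtualMachines \
  --query "[0].capabilities[?name=='CachedDiskBytes'].value | [0]" -o tsv
```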
### System Pool
AKS requires the configuration of a system node pool when creating a cluster. The system node pool is not like the other, additional node pools; it is tightly coupled to the AKS cluster. Without manual intervention it is not possible to change the instance type or taints of this node pool without recreating the cluster. Additionally the system node pool cannot scale down to zero: for AKS to work there has to be at least one instance present. This is because critical system pods like Tunnelfront or Konnectivity and CoreDNS will by default run on the system node pool. For more information about the AKS system node pool refer to the [official documentation](https://docs.microsoft.com/en-us/azure/aks/use-system-pools#system-and-user-node-pools). XKF follows the Azure recommendation and runs only system critical applications on the system node pool. Doing this protects services like CoreDNS from starvation or memory issues caused by user applications running on the same nodes. This is achieved by adding the taint `CriticalAddonsOnly` to all of the system nodes.

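Whether the taint is in place can be verified by listing the taints on the nodes, for example:

```shell
# List every node together with its taint keys; the system nodes should show CriticalAddonsOnly.
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key
```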
The VM size and family of the default node pool is [hard coded](https://github.com/XenitAB/terraform-modules/blob/main/modules/azure/aks/aks.tf#L48-L63) in XKF. This is to ensure a standard configuration that follows best practices across all clusters. The default node pool VM will be a `Standard_D2ds_v5` instance configured to be ephemeral. There is currently only one configuration parameter for the default node pool: the [production grade](https://github.com/XenitAB/terraform-modules/blob/56180e65d303469ca973d882760adacc82fdb740/modules/azure/aks/variables.tf#L37) parameter determines whether the default node pool should have one or two instances.

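What this results in can be inspected on an existing cluster with the Azure CLI. A minimal sketch with placeholder values:

```shell
# Show the system mode node pools together with their VM size and instance count.
az aks nodepool list --cluster-name <cluster_name> --resource-group <resource_group> \
  --query "[?mode=='System'].{name:name, vmSize:vmSize, count:count}" -o table
```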
#### Updating Configuration
There may come times when Terraform wants to recreate the AKS cluster because the system node pool has been updated. This happens when certain properties of the system node pool are changed. It is still possible to do these updates without recreating the cluster, but it requires some manual intervention. AKS requires at least one system node pool but does not have an upper limit, which makes it possible to manually add a new temporary system node pool, remove the existing default node pool created by Terraform, create a new system node pool with the same name but with the updated parameters, and finally remove the temporary node pool. Terraform will assume that the changes have already been applied and import the new state without any other complaints.

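Before starting, it can be helpful to note which nodes belong to which pool so that the swap can be followed. The `agentpool` label used below is the label AKS sets on its nodes:

```shell
# Show every node together with the node pool it belongs to.
kubectl get nodes -L agentpool
```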
Start off by creating a temporary system pool. Make sure to replace the cluster name and resource group with the correct values.
```shell
az aks nodepool add --cluster-name <cluster_name> --resource-group <resource_group> --name temp --mode "System" --node-count 1
```
> It may not be possible to create a new node pool with the current Kubernetes version if the cluster has not been updated in a while. Azure will remove minor versions as new versions are released. In that case you will need to upgrade the cluster to the latest minor version before making changes to the system pool, as AKS will not allow a node with a newer version than the control plane.

Delete the system node pool created by Terraform.
```shell
az aks nodepool delete --cluster-name <cluster_name> --resource-group <resource_group> --name default
```
Create a new node pool with the new configuration.
```shell
az aks nodepool add --cluster-name <cluster_name> --resource-group <resource_group> --name default --mode "System" --node-count 1 --node-osdisk-type Ephemeral --node-osdisk-size 75 --node-vm-size Standard_D2ds_v5 --node-taints "CriticalAddonsOnly=true:NoSchedule" --zones 1 2 3
```
Delete the temporary pool.
```shell
az aks nodepool delete --cluster-name <cluster_name> --resource-group <resource_group> --name temp
```

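Once the temporary pool is gone it is worth confirming that the system pods, CoreDNS in particular, have been rescheduled onto the new default nodes, for example:

```shell
# Check that the system pods are running and which nodes they ended up on.
kubectl get pods -n kube-system -o wide
```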
For additional information about updating the system nodes refer to [this blog post](https://pumpingco.de/blog/modify-aks-default-node-pool-in-terraform-without-redeploying-the-cluster/).
### Worker Pool
Worker node pools are all other node pools in the cluster. The main purpose of the worker node pools is to run application workloads. They do not run any system critical Pods. They will however run system Pods that are deployed from a DaemonSet, which includes applications like kube-proxy and CSI drivers.

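Which system Pods end up on the worker nodes can be seen by listing the DaemonSets in the cluster, for example:

```shell
# List the DaemonSets that place a Pod on every node, including the worker nodes.
kubectl get daemonsets -n kube-system
```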
All node pools created within XKF will have autoscaling enabled and will scale across all availability zones in the region. These settings cannot be changed; it is however possible to set a static number of instances by specifying the same min and max count. XKF exposes few settings to configure the node instances, the main ones being the instance type, the min and max count, and the Kubernetes version. Other node pool settings will not be exposed, as XKF is an opinionated solution. This means that default settings may at times change in the future.

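The effective settings for a pool can be verified with the Azure CLI. A minimal sketch, where the pool name `standard1` is only an example:

```shell
# Show the autoscaler settings and zones configured for a worker pool.
az aks nodepool show --cluster-name <cluster_name> --resource-group <resource_group> --name standard1 \
  --query "{autoscaling:enableAutoScaling, min:minCount, max:maxCount, zones:availabilityZones}"
```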
#### Updating Configuration
Updating the configuration of a worker pool may result in three different outcomes: the change can cause a simple update, force replacement of all of the nodes, or require a full re-creation of the node pool resource. It is fine to make a parameter change in place if it results in a quick update. In the latter two cases however it is better to replace the node pool altogether, as that is a safer option which allows for rollbacks.

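Which of the three outcomes a change triggers can be previewed before anything is applied. A minimal sketch, assuming Terraform is run from the directory that holds the cluster configuration:

```shell
# Preview the change; attributes that cannot be updated in place are marked
# "forces replacement" in the plan output.
terraform plan
```

The examples below start from a configuration with a single worker pool named `standard1`.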
```hcl
aks_config = {
  node_pools = [
    {
      name           = "standard1"
      version        = "1.21.9"
      vm_size        = "Standard_D2ds_v5"
      min_count      = 1
      max_count      = 3
      node_labels    = {}
      node_taints    = []
      spot_enabled   = false
      spot_max_price = null
    },
  ]
}
```
Add a new node pool to the cluster which is identical to the existing node pool except for the configuration changes that are to be made. Notice that the existing node pool is called `standard1` and the new node pool is called `standard2`. This is a naming standard in XKF to make it possible to replace node pools; if the existing node pool had been named `standard2` the new one would be called `standard1`. Apply the Terraform to create the new node pool.
```hcl
aks_config = {
  node_pools = [
    {
      name           = "standard1"
      version        = "1.21.9"
      vm_size        = "Standard_D2ds_v5"
      min_count      = 1
      max_count      = 3
      node_labels    = {}
      node_taints    = []
      spot_enabled   = false
      spot_max_price = null
    },
    {
      name           = "standard2"
      version        = "1.22.6"
      vm_size        = "Standard_D2ds_v5"
      min_count      = 1
      max_count      = 3
      node_labels    = {}
      node_taints    = []
      spot_enabled   = false
      spot_max_price = null
    },
  ]
}
```
Remove the existing node pool `standard1` from the configuration and apply the Terraform. This will safely cordon and drain all the nodes in the node pool, and the VMs are removed once all Pods have been moved off of them.
```hcl
aks_config = {
  node_pools = [
    {
      name           = "standard2"
      version        = "1.22.6"
      vm_size        = "Standard_D2ds_v5"
      min_count      = 1
      max_count      = 3
      node_labels    = {}
      node_taints    = []
      spot_enabled   = false
      spot_max_price = null
    },
  ]
}
```
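While Terraform removes the old pool, the cordon and drain can be followed from the cluster side, for example:

```shell
# Watch the old nodes being cordoned, drained and removed.
kubectl get nodes -w
```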
## FAQ
### Are there useful commands when upgrading clusters or node pools?
Show node version.
```shell
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.labels.kubernetes\.azure\.com\/node-image-version}{"\n"}{end}'
```
Watch all nodes.
```shell
watch kubectl get nodes
```
Check the status of all pods in the cluster.
```shell
kubectl get pods -A
```
### Which Kubernetes versions are available in my region?
View all Kubernetes versions in a region.
```shell
az aks get-versions --location <location> -o table
```
Get Kubernetes version upgrade paths for a specific cluster.
```shell
az aks get-upgrades --resource-group <resource_group> --name <cluster_name> -o table
```
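To compare the available upgrade paths with what the cluster is currently running, the control plane version can be read from the cluster resource, for example:

```shell
# Show the Kubernetes version of the control plane.
az aks show --resource-group <resource_group> --name <cluster_name> --query kubernetesVersion -o tsv
```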
