I recently used release v2.27.0 of kubespray to upgrade my Kubernetes cluster and ran into an interesting issue.

Control Plane Upgrade Failure

I just kicked off the upgrade as I usually do:

> ansible-playbook -i inventory/home/hosts.yaml -b upgrade-cluster.yml

It failed on the control plane node: during the drain, the node hung and became unreachable:

TASK [upgrade/pre-upgrade : Set if node needs cordoning] *******************************************************************************************************************
ok: [ma]
Monday 06 January 2025  11:15:09 -0700 (0:00:00.055)       0:08:03.828 ********

TASK [upgrade/pre-upgrade : Cordon node] ***********************************************************************************************************************************
changed: [ma]
Monday 06 January 2025  11:15:10 -0700 (0:00:00.393)       0:08:04.221 ********
FAILED - RETRYING: [ma]: Drain node (3 retries left).

TASK [upgrade/pre-upgrade : Drain node] ************************************************************************************************************************************
fatal: [ma]: UNREACHABLE! => {"changed": false, "msg": "Data could not be sent to remote host \"192.168.1.51\". Make sure this host can be reached over ssh: ", "unreachable": true}
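
The Drain node task is essentially a kubectl drain of the node being upgraded. A rough approximation of what kubespray runs under the hood (the exact flags and timeouts here are my assumption, not copied from the role) would be:

> kubectl drain ma --ignore-daemonsets --delete-emptydir-data --grace-period=300 --timeout=360s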

But the playbook actually kept going and tried to upgrade the worker nodes:

TASK [upgrade/post-upgrade : Wait for cilium] ******************************************************************************************************************************
fatal: [nd -> ma(192.168.1.51)]: FAILED! => {"changed": true, "cmd": ["/usr/local/bin/kubectl", "--kubeconfig", "/etc/kubernetes/admin.conf", "wait", "pod", "-n", "kube-system", "-l", "k8s-app=cilium", "--field-selector", "spec.nodeName==nd", "--for=condition=Ready", "--timeout=120s"], "delta": "0:02:00.049765", "end": "2025-01-06 11:50:09.172304", "msg": "non-zero return code", "rc": 1, "start": "2025-01-06 11:48:09.122539", "stderr": "error: timed out waiting for the condition on pods/cilium-h2frs", "stderr_lines": ["error: timed out waiting for the condition on pods/cilium-h2frs"], "stdout": "", "stdout_lines": []}

PLAY RECAP *****************************************************************************************************************************************************************
ma                         : ok=471  changed=50   unreachable=1    failed=0    skipped=526  rescued=0    ignored=0
nc                         : ok=434  changed=45   unreachable=0    failed=0    skipped=632  rescued=0    ignored=1
nd                         : ok=435  changed=47   unreachable=0    failed=1    skipped=626  rescued=0    ignored=1

After I recovered the control plane VM, I tried to upgrade it again, but it failed with the following error:

> ansible-playbook -i inventory/home/hosts.yaml -b upgrade-cluster.yml --limit "ma"
...
ok: [ma]
Monday 06 January 2025  12:43:58 -0700 (0:00:00.618)       0:03:49.198 ********
FAILED - RETRYING: [ma]: Kubeadm | Upgrade first control plane node (3 retries left).
FAILED - RETRYING: [ma]: Kubeadm | Upgrade first control plane node (2 retries left).
FAILED - RETRYING: [ma]: Kubeadm | Upgrade first control plane node (1 retries left).

TASK [kubernetes/control-plane : Kubeadm | Upgrade first control plane node] ***********************************************************************************************
fatal: [ma]: FAILED! => {"attempts": 3, "changed": true, "cmd": ["timeout", "-k", "600s", "600s", "/usr/local/bin/kubeadm", "upgrade", "apply", "-y", "v1.31.4", "--certificate-renewal=True", "--ignore-preflight-errors=", "--allow-experimental-upgrades", "--etcd-upgrade=false", "--force"], "delta": "0:00:15.070363", "end": "2025-01-06 12:45:14.104662", "failed_when_result": true, "msg": "non-zero return code", "rc": 1, "start": "2025-01-06 12:44:59.034299", "stderr": "W0106 12:44:59.088587   28100 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [10.233.0.10]; the provided value is: [10.233.0.3]\n[upgrade/health] FATAL: [preflight] Some fatal errors occurred:\n\t[ERROR CreateJob]: Job \"upgrade-health-check-49pw9\" in the namespace \"kube-system\" did not complete in 15s: no condition of type Complete\n[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["W0106 12:44:59.088587   28100 utils.go:69] The recommended value for \"clusterDNS\" in \"KubeletConfiguration\" is: [10.233.0.10]; the provided value is: [10.233.0.3]", "[upgrade/health] FATAL: [preflight] Some fatal errors occurred:", "\t[ERROR CreateJob]: Job \"upgrade-health-check-49pw9\" in the namespace \"kube-system\" did not complete in 15s: no condition of type Complete", "[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[preflight] Running pre-flight checks.\n[upgrade/config] Reading configuration from the cluster...\n[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'\n[upgrade] Running cluster health checks", "stdout_lines": ["[preflight] Running pre-flight checks.", "[upgrade/config] Reading configuration from the cluster...", "[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'", "[upgrade] Running cluster health checks"]}

PLAY RECAP *****************************************************************************************************************************************************************
ma                         : ok=552  changed=13   unreachable=0    failed=1    skipped=791  rescued=0    ignored=1

When I logged into the worker nodes, most of the pods were having trouble coming up:

> k get po -A | grep cili
kube-system      cilium-4lktc                                                   0/1     Completed                         40             210d
kube-system      cilium-h2frs                                                   0/1     Completed                         45             210d
kube-system      cilium-operator-5c844c548c-5j4kd                               0/1     CreateContainerConfigError        8 (51m ago)    118d
kube-system      cilium-operator-5c844c548c-6b4fk                               0/1     CreateContainerConfigError        0              4m54s
kube-system      cilium-vqgpg                                                   0/1     Init:CreateContainerConfigError   0              25m

Manually Fixing the Control Plane Node

I checked out one of the pods:

$ k -n kube-system describe pod cilium-operator-5c844c548c-5j4kd
...
...
Normal   Killing         51m                    kubelet  Container cilium-operator definition changed, will be restarted
Normal   Pulled          50m (x5 over 51m)      kubelet  Container image "quay.io/cilium/operator:v1.15.4" already present on machine
Warning  Failed          50m (x5 over 51m)      kubelet  Error: services have not yet been read at least once, cannot construct envvars
Warning  BackOff         49m (x7 over 51m)      kubelet  Back-off restarting failed container cilium-operator in pod cilium-operator-5c844c548c-5j4kd_kube-system(ce37b8b5-084f-41a4-94f6-12df2b00eb68)
Normal   SandboxChanged  48m                    kubelet  Pod sandbox changed, it will be killed and re-created.                                                                                                          
Warning  BackOff         46m (x4 over 47m)      kubelet  Back-off restarting failed container cilium-operator in pod cilium-operator-5c844c548c-5j4kd_kube-system(ce37b8b5-084f-41a4-94f6-12df2b00eb68)
Warning  Failed          45m (x10 over 48m)     kubelet  Error: services have not yet been read at least once, cannot construct envvars
Normal   Pulled          3m12s (x208 over 48m)  kubelet  Container image "quay.io/cilium/operator:v1.15.4" already present on machine

I found a recent issue that matched that error: kubelet fails to start pods after upgrade to 1.31.X. It looks like a newer version of the kubelet cannot work against an older version of the kube-apiserver. There is a fix in that same issue, but it only works if the kube-apiserver has already been upgraded in an HA configuration and the kubelet wasn't connecting to its local kube-apiserver (and in my situation that was not the case). So I decided to manually upgrade the control plane to make sure the upgraded kubelets on the worker nodes could start their pods. I ran the upgrade again and saw that the failure was the health check. As the upgrade was happening in the background, I watched the health-check pods and they never succeeded:

$ k get po -n kube-system -o wide| grep upgrade
upgrade-health-check-nfl6g-gcwfg   0/1          ContainerCreating    0 n/a   nc          14s
upgrade-health-check-9jl8j-4gmfx   0/1          Terminating          0 n/a   nc          34s
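
At this point the worker kubelets had already been upgraded to v1.31.4 by the first run while the control plane was still on v1.30.4, so the version skew is easy to confirm with a quick check (not commands from my notes, just the obvious way to see it):

> k get nodes     # the VERSION column reports each node's kubelet version
> k version       # the "Server Version" line reports the kube-apiserver version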

I logged into the node where the pod was scheduled and went through the kubelet logs.
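
Something along these lines pulls them up (I'm approximating the invocation; I was just tailing the node's journal and picking out the interesting entries):

> sudo journalctl -f

And in there I saw the following: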

Jan 06 12:44:36 nc kubelet[49129]: E0106 12:44:36.140454   49129 kuberuntime_manager.go:1274] "Unhandled Error" err=<
Jan 06 12:44:36 nc kubelet[49129]:         init container &Container{Name:mount-cgroup,Image:quay.io/cilium/cilium:v1.15.4,Command:[sh -ec cp /usr/bin/cilium-mount /hostbin/cilium-mount;
Jan 06 12:44:36 nc kubelet[49129]:         nsenter --cgroup=/hostproc/1/ns/cgroup --mount=/hostproc/1/ns/mnt "${BIN_PATH}/cilium-mount" $CGROUP_ROOT;
Jan 06 12:44:36 nc kubelet[49129]:         rm /hostbin/cilium-mount
Jan 06 12:44:36 nc kubelet[49129]:         ],Args:[],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{EnvVar{Name:CGROUP_ROOT,Value:/run/cilium/cgroupv2,ValueFrom:nil,},EnvVar{Name:BIN_PATH,Value:/opt/cni/bin,ValueFrom:nil,},},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:hostproc,ReadOnly:false,MountPath:/hostproc,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:cni-path,ReadOnly:false,MountPath:/hostbin,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},VolumeMount{Name:kube-api-access-8d9hd,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:*true,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,AppArmorProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod cilium-4lktc_kube-system(c96e887f-3eb9-4971-a8a7-9f8c70cfc6a5): CreateContainerConfigError: services have not yet been read at least once, cannot construct envvars
Jan 06 12:44:36 nc kubelet[49129]:  > logger="UnhandledError"
...
...
Jan 06 12:44:38 nc kubelet[49129]: I0106 12:44:38.561193   49129 kubelet.go:2407] "SyncLoop ADD" source="api" pods=["kube-system/upgrade-health-check-j4b6t-75v88"]
Jan 06 12:44:38 nc kubelet[49129]: I0106 12:44:38.561580   49129 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/upgrade-health-check-j4b6t-75v88"
Jan 06 12:44:38 nc systemd[1]: Created slice kubepods-besteffort-podd9f63bcf_f3fc_4078_a421_14784b44de0e.slice - libcontainer container kubepods-besteffort-podd9f63bcf_f3fc_4078_a421_14784b44de0e.slice.
Jan 06 12:44:38 nc kubelet[49129]: I0106 12:44:38.610736   49129 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kube-api-access-fqqhb\" (UniqueName: \"kubernetes.io/projected/d9f63bcf-f3fc-4078-a421-14784b44de0e-kube-api-access-fqqhb\") pod \"upgrade-health-check-j4b6t-75v88\" (UID: \"d9f63bcf-f3fc-4078-a421-14784b44de0e\") " pod="kube-system/upgrade-health-check-j4b6t-75v88"
Jan 06 12:44:38 nc kubelet[49129]: I0106 12:44:38.711136   49129 reconciler_common.go:218] "operationExecutor.MountVolume started for volume \"kube-api-access-fqqhb\" (UniqueName: \"kubernetes.io/projected/d9f63bcf-f3fc-4078-a421-14784b44de0e-kube-api-access-fqqhb\") pod \"upgrade-health-check-j4b6t-75v88\" (UID: \"d9f63bcf-f3fc-4078-a421-14784b44de0e\") " pod="kube-system/upgrade-health-check-j4b6t-75v88"
Jan 06 12:44:38 nc kubelet[49129]: I0106 12:44:38.718329   49129 operation_generator.go:637] "MountVolume.SetUp succeeded for volume \"kube-api-access-fqqhb\" (UniqueName: \"kubernetes.io/projected/d9f63bcf-f3fc-4078-a421-14784b44de0e-kube-api-access-fqqhb\") pod \"upgrade-health-check-j4b6t-75v88\" (UID: \"d9f63bcf-f3fc-4078-a421-14784b44de0e\") " pod="kube-system/upgrade-health-check-j4b6t-75v88"
Jan 06 12:44:38 nc kubelet[49129]: I0106 12:44:38.869662   49129 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/upgrade-health-check-j4b6t-75v88"
Jan 06 12:44:38 nc containerd[582]: time="2025-01-06T12:44:38.870549425-07:00" level=info msg="RunPodSandbox for &PodSandboxMetadata{Name:upgrade-health-check-j4b6t-75v88,Uid:d9f63bcf-f3fc-4078-a421-14784b44de0e,Namespace:kube-system,Attempt:0,}"
Jan 06 12:44:40 nc kubelet[49129]: I0106 12:44:40.139213   49129 scope.go:117] "RemoveContainer" containerID="2a937da97320703bb33dd8437d6330d5c04e715dc3089f1a9551f491575f749d"
Jan 06 12:44:40 nc kubelet[49129]: I0106 12:44:40.139290   49129 scope.go:117] "RemoveContainer" containerID="221a0e7e99e6b475919424cc7e4fc67c535ee8a5f10a592cdad9997671f56c04"
Jan 06 12:44:40 nc kubelet[49129]: I0106 12:44:40.139300   49129 scope.go:117] "RemoveContainer" containerID="ff8ff142016bad157ead03427ecacc24c2d43b64baf7fcad0c2c430b68c9dc85"

And this was the same issue: since the worker nodes were upgraded before the control plane node, cilium couldn't start and the upgrade-health-check pod was failing to complete. So I ran the kubeadm command manually and saw the same thing:

> sudo /usr/local/bin/kubeadm upgrade apply -y v1.31.4 --certificate-renewal=True --ignore-preflight-errors= --allow-experimental-upgrades --etcd-upgrade=false --force
[preflight] Running pre-flight checks.
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0106 12:46:43.115467   28126 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [10.233.0.3]
[upgrade] Running cluster health checks
[upgrade/health] FATAL: [preflight] Some fatal errors occurred:
	[ERROR CreateJob]: Job "upgrade-health-check-7th59" in the namespace "kube-system" did not complete in 15s: no condition of type Complete
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

There was an old kubeadm bug with single control plane node clusters which I thought had been fixed, but it suggested skipping the preflight check. So I decided to skip the preflight check and just perform the upgrade:

> sudo /usr/local/bin/kubeadm upgrade apply -y v1.31.4 --certificate-renewal=True --ignore-preflight-errors="CreateJob" --allow-experimental-upgrades --etcd-upgrade=false --force
[preflight] Running pre-flight checks.
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0106 12:48:58.980866   28178 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [10.233.0.3]
[upgrade] Running cluster health checks
	[WARNING CreateJob]: Job "upgrade-health-check-2txgq" in the namespace "kube-system" did not complete in 15s: no condition of type Complete
[upgrade/version] You have chosen to change the cluster version to "v1.31.4"
[upgrade/versions] Cluster version: v1.30.4
[upgrade/versions] kubeadm version: v1.31.4
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action beforehand using 'kubeadm config images pull'
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.31.4" (timeout: 5m0s)...
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests545818905"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Moving new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backing up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-06-12-49-14/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This can take up to 5m0s
[apiclient] Found 1 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moving new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backing up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-06-12-49-14/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This can take up to 5m0s
[apiclient] Found 1 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moving new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backing up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-06-12-49-14/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This can take up to 5m0s
[apiclient] Found 1 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upgrade] Backing up kubelet config file to /etc/kubernetes/tmp/kubeadm-kubelet-config2888765671/config.yaml
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.31.4". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

And then all the pods came up without issues:

> k get po -n kube-system -o wide
NAME                               READY   STATUS    RESTARTS      AGE   IP              NODE   NOMINATED NODE   READINESS GATES
cilium-4px22                       1/1     Running   0             45h   192.168.1.52    nc     <none>           <none>
cilium-nggt9                       1/1     Running   0             45h   192.168.1.51    ma     <none>           <none>
cilium-operator-84579b79bd-8mz8l   1/1     Running   0             44h   192.168.1.52    nc     <none>           <none>
cilium-operator-84579b79bd-qrzc5   1/1     Running   0             45h   192.168.1.53    nd     <none>           <none>
cilium-vnz7v                       1/1     Running   0             45h   192.168.1.53    nd     <none>           <none>
coredns-69df789bc-2d5sp            1/1     Running   0             45h   10.233.65.181   nc     <none>           <none>
coredns-69df789bc-6tcgr            1/1     Running   0             44h   10.233.66.70    nd     <none>           <none>
dns-autoscaler-8576bb9f5b-fpg7q    1/1     Running   0             44h   10.233.66.39    nd     <none>           <none>
kube-apiserver-ma                  1/1     Running   0             45h   192.168.1.51    ma     <none>           <none>
kube-controller-manager-ma         1/1     Running   0             45h   192.168.1.51    ma     <none>           <none>
kube-proxy-c7wkg                   1/1     Running   0             45h   192.168.1.51    ma     <none>           <none>
kube-proxy-p9kd6                   1/1     Running   0             45h   192.168.1.52    nc     <none>           <none>
kube-proxy-xfhkb                   1/1     Running   0             45h   192.168.1.53    nd     <none>           <none>
kube-scheduler-ma                  1/1     Running   0             45h   192.168.1.51    ma     <none>           <none>
metrics-server-6c8bff4c-fq2f5      1/1     Running   0             44h   10.233.65.1     nc     <none>           <none>
nginx-proxy-nc                     1/1     Running   1 (46h ago)   46h   192.168.1.52    nc     <none>           <none>
nginx-proxy-nd                     1/1     Running   1 (46h ago)   46h   192.168.1.53    nd     <none>           <none>
state-metrics-8c55c666b-bjqxg      1/1     Running   0             44h   10.233.65.66    nc     <none>           <none>

I re-ran the upgrade using kubespray one more time and all was good.
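
For completeness, that final pass was just the full upgrade playbook again, essentially the same invocation as at the start:

> ansible-playbook -i inventory/home/hosts.yaml -b upgrade-cluster.yml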