Upgrade OpenEBS cStor
Upgrading OpenEBS
The upgrade process is split into two steps: Control Plane upgrade and Data Plane upgrade.
Upgrade the Operator/Control Plane
First we need to upgrade the cstor operator which is basically the control plane component. You can grab the yaml
manifest from the charts repo. Or you can download the release from cstor operator repo and use helm
to perform the upgrade. I tend to use kustomize
, so I just grabbed the cstor-operator.yaml and I applied it:
> k apply -f https://raw.githubusercontent.com/openebs/charts/gh-pages/cstor-operator.yaml --dry-run=client
namespace/openebs configured (dry run)
serviceaccount/openebs-cstor-operator configured (dry run)
clusterrole.rbac.authorization.k8s.io/openebs-cstor-operator configured (dry run)
...
...
deployment.apps/cspc-operator configured (dry run)
deployment.apps/cvc-operator configured (dry run)
service/cvc-operator-service configured (dry run)
deployment.apps/openebs-cstor-admission-server configured (dry run)
customresourcedefinition.apiextensions.k8s.io/blockdevices.openebs.io configured (dry run)
customresourcedefinition.apiextensions.k8s.io/blockdeviceclaims.openebs.io configured (dry run)
configmap/openebs-ndm-config configured (dry run)
daemonset.apps/openebs-ndm configured (dry run)
deployment.apps/openebs-ndm-operator configured (dry run)
deployment.apps/openebs-ndm-cluster-exporter configured (dry run)
service/openebs-ndm-cluster-exporter-service configured (dry run)
daemonset.apps/openebs-ndm-node-exporter configured (dry run)
service/openebs-ndm-node-exporter-service configured (dry run)
After the cstor-operator
has been upgraded you should see all the pods updated with new time stamps.
> k get po -n openebs
NAME READY STATUS RESTARTS AGE
cspc-operator-786cf9c94b-pd5qf 1/1 Running 0 2m36s
cstor-disk-pool-2qqg-847bcd77b4-h27b6 3/3 Running 3 (96d ago) 101d
cstor-disk-pool-5jtv-69d5566f75-zhd24 3/3 Running 0 101d
cstor-disk-pool-jfz2-d65fb5bcd-wfb4g 3/3 Running 0 11d
cvc-operator-69559dd754-j8qnv 1/1 Running 0 2m34s
openebs-cstor-admission-server-68d8f4ff44-985wh 1/1 Running 0 2m34s
openebs-cstor-csi-controller-0 6/6 Running 0 2m33s
openebs-cstor-csi-node-8zxfv 2/2 Running 0 2m7s
openebs-cstor-csi-node-r5klb 2/2 Running 0 2m38s
openebs-cstor-csi-node-vgnsx 2/2 Running 0 52s
openebs-ndm-cluster-exporter-69ff474f86-8klxk 1/1 Running 0 2m37s
openebs-ndm-fbnf9 1/1 Running 0 2m12s
openebs-ndm-g4cnh 1/1 Running 0 2m38s
openebs-ndm-node-exporter-88qq5 1/1 Running 0 2m17s
openebs-ndm-node-exporter-9zhj8 1/1 Running 0 119s
openebs-ndm-node-exporter-x6ctr 1/1 Running 0 2m38s
openebs-ndm-operator-68c8b6d56c-rvpgd 1/1 Running 0 2m37s
openebs-ndm-zghft 1/1 Running 0 2m28s
pvc-087960e4-c718-4227-84f8-d3a6ca423295-target-9dc8b9c69-4g2tr 3/3 Running 0 35d
pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc-target-5f6dd8b5fc5nqb9 3/3 Running 0 35d
pvc-82a81b9a-d585-48b9-9b58-dc3f5b3c2461-target-64dc4f69ffvdtv8 3/3 Running 0 34d
pvc-9670d96c-f1e2-4e6a-a223-25960cf87b44-target-55bfb649cfk5gqf 3/3 Running 0 35d
pvc-dc70575e-499f-4737-a148-e511453df233-target-5558c748fdpnjkq 3/3 Running 0 35d
pvc-eacb5ff7-6c28-4ad5-88eb-e877eaa51b56-target-65674498c-vjrjl 3/3 Running 0 35d
pvc-fc6ec633-d0e1-420c-b64b-cfefe3a425bd-target-6565bf547btjbpr 3/3 Running 0 35d
But from above we can see that the pvc-
and cstor-disk-pool-
pods still haven’t been upgraded. Let’s upgrade them next.
Upgrade the OpenEBS Resouces/Data plane
The steps to upgrade the rest of the resouces are covered in: Upgrade OpenEBS. We have to create two more jobs one for CSPC pools
and one for cStor CSI volumes
. So list your pools you can run the following:
> k get cspc -n openebs
NAME HEALTHYINSTANCES PROVISIONEDINSTANCES DESIREDINSTANCES AGE
cstor-disk-pool 3 3 3 425d
Initially when I ran the cspc
upgrade, it failed with:
> k logs -n openebs cstor-cspc-upgrade-kfwnd -f
I1223 18:46:14.053050 1 cstor_cspc.go:66] Upgrading cstor-disk-pool to 3.6.0
I1223 18:46:14.190624 1 deployment.go:78] patching deployment cstor-disk-pool-2qqg
I1223 18:46:14.190707 1 deployment.go:81] deployment already in 3.6.0 version
I1223 18:46:14.190744 1 cspi.go:76] patching cspi cstor-disk-pool-2qqg
I1223 18:46:14.190802 1 cspi.go:79] cspi already in 3.6.0 version
I1223 18:46:15.304463 1 deployment.go:78] patching deployment cstor-disk-pool-5jtv
I1223 18:46:15.304574 1 deployment.go:81] deployment already in 3.6.0 version
I1223 18:46:15.304601 1 cspi.go:76] patching cspi cstor-disk-pool-5jtv
I1223 18:46:15.304667 1 cspi.go:79] cspi already in 3.6.0 version
I1223 18:46:17.504776 1 deployment.go:78] patching deployment cstor-disk-pool-jfz2
I1223 18:46:17.504809 1 deployment.go:81] deployment already in 3.6.0 version
I1223 18:46:17.504820 1 cspi.go:76] patching cspi cstor-disk-pool-jfz2
I1223 18:46:17.505320 1 cspi.go:79] cspi already in 3.6.0 version
I1223 18:46:18.709034 1 cspc.go:76] patching cspc cstor-disk-pool
E1223 18:46:18.724837 1 cstor_cspc.go:74] failed to patch cspc cstor-disk-pool: Internal error occurred: failed calling webhook "admission-webhook.cstor.openebs.io": failed to call webhook: Post "https://openebs-cstor-admission-server.openebs.svc:443/validate?timeout=5s": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2023-12-23T18:47:40Z is after 2023-10-24T04:03:31Z
F1223 18:46:18.724899 1 cstor_cspc.go:54] Failed to upgrade cStor CSPC cstor-disk-pool
I ran into this github issue. The fix in there helped:
> k delete validatingwebhookconfiguration openebs-cstor-validation-webhook
validatingwebhookconfiguration.admissionregistration.k8s.io "openebs-cstor-validation-webhook" deleted
> k delete -n openebs secret openebs-cstor-admission-secret
secret "openebs-cstor-admission-secret" deleted
> k rollout -n openebs restart deployment openebs-cstor-admission-server
deployment.apps/openebs-cstor-admission-server restarted
Then after fixing that and rerunning the job it worked:
I1223 19:01:16.957926 1 cstor_cspc.go:66] Upgrading cstor-disk-pool to 3.6.0
I1223 19:01:17.071859 1 deployment.go:78] patching deployment cstor-disk-pool-2qqg
I1223 19:01:17.071890 1 deployment.go:81] deployment already in 3.6.0 version
I1223 19:01:17.071902 1 cspi.go:76] patching cspi cstor-disk-pool-2qqg
I1223 19:01:17.071914 1 cspi.go:79] cspi already in 3.6.0 version
I1223 19:01:18.209498 1 deployment.go:78] patching deployment cstor-disk-pool-5jtv
I1223 19:01:18.209525 1 deployment.go:81] deployment already in 3.6.0 version
I1223 19:01:18.209762 1 cspi.go:76] patching cspi cstor-disk-pool-5jtv
I1223 19:01:18.209884 1 cspi.go:79] cspi already in 3.6.0 version
I1223 19:01:20.412175 1 deployment.go:78] patching deployment cstor-disk-pool-jfz2
I1223 19:01:20.412239 1 deployment.go:81] deployment already in 3.6.0 version
I1223 19:01:20.412258 1 cspi.go:76] patching cspi cstor-disk-pool-jfz2
I1223 19:01:20.412276 1 cspi.go:79] cspi already in 3.6.0 version
I1223 19:01:21.609073 1 cspc.go:76] patching cspc cstor-disk-pool
I1223 19:01:21.657720 1 cspc.go:98] cspc cstor-disk-pool patched
I1223 19:01:21.661561 1 cstor_cspc.go:192] Verifying the reconciliation of version for cstor-disk-pool
I1223 19:01:31.669945 1 cstor_cspc.go:77] Successfully upgraded cstor-disk-pool to 3.6.0
Stream closed EOF for openebs/cstor-cspc-upgrade-r9427 (upgrade)
To get a list of your volumes you can run the following:
> k get cvc -n openebs
NAME CAPACITY STATUS AGE
pvc-087960e4-c718-4227-84f8-d3a6ca423295 10Gi Bound 425d
pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc 1Gi Bound 423d
pvc-82a81b9a-d585-48b9-9b58-dc3f5b3c2461 10Gi Bound 425d
pvc-9670d96c-f1e2-4e6a-a223-25960cf87b44 10Gi Bound 424d
pvc-dc70575e-499f-4737-a148-e511453df233 5Gi Bound 423d
pvc-eacb5ff7-6c28-4ad5-88eb-e877eaa51b56 50Gi Bound 425d
pvc-fc6ec633-d0e1-420c-b64b-cfefe3a425bd 20Gi Bound 365d
The cstor-volume
job worked without issues:
> k logs -n openebs cstor-volume-upgrade-w6798 -f
I1223 19:25:47.074741 1 service.go:77] Patching service pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc
I1223 19:25:47.085676 1 service.go:99] Service pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc patched
I1223 19:25:47.085738 1 cv.go:76] patching cv pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc
I1223 19:25:47.118663 1 cv.go:98] cv pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc patched
I1223 19:25:47.132349 1 cstor_volume.go:420] Verifying the reconciliation of version for pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc
I1223 19:25:57.138638 1 cvc.go:76] patching cvc pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc
I1223 19:25:57.160334 1 cvc.go:98] cvc pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc patched
I1223 19:25:57.166214 1 cstor_volume.go:442] Verifying the reconciliation of version for pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc
I1223 19:26:07.181396 1 cstor_volume.go:77] Successfully upgraded pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc to 3.6.0
After it’s done, all your pods in the openebs
namespace, should have pretty recent age:
> k get po -n openebs
NAME READY STATUS RESTARTS AGE
cspc-operator-6fc5b8546b-fkt9v 1/1 Running 0 57m
cstor-cspc-upgrade-r9427 0/1 Completed 0 28m
cstor-disk-pool-2qqg-6c466c744-j4vnh 3/3 Running 0 55m
cstor-disk-pool-5jtv-754c67547-8z7fm 3/3 Running 0 52m
cstor-disk-pool-jfz2-7f9678f6c8-r6ppv 3/3 Running 0 49m
cstor-volume-upgrade-w6798 0/1 Completed 0 13m
cvc-operator-85bd9d569b-l42dr 1/1 Running 0 57m
openebs-cstor-admission-server-9fc86bffb-b5kzp 1/1 Running 0 29m
openebs-cstor-csi-controller-0 6/6 Running 0 57m
openebs-cstor-csi-node-f569s 2/2 Running 0 57m
openebs-cstor-csi-node-jxtlx 2/2 Running 0 57m
openebs-cstor-csi-node-qvr2s 2/2 Running 0 56m
openebs-ndm-862tm 1/1 Running 0 57m
openebs-ndm-9fw4f 1/1 Running 0 57m
openebs-ndm-b7r7d 1/1 Running 0 57m
openebs-ndm-cluster-exporter-6bd669f97f-qnfsx 1/1 Running 0 57m
openebs-ndm-node-exporter-hk7g9 1/1 Running 0 57m
openebs-ndm-node-exporter-wbg6j 1/1 Running 0 57m
openebs-ndm-node-exporter-xvppn 1/1 Running 0 57m
openebs-ndm-operator-686446d979-7vmhr 1/1 Running 0 57m
pvc-087960e4-c718-4227-84f8-d3a6ca423295-target-c7bdc769-bq984 3/3 Running 0 12m
pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc-target-6d987f5cd5tc2mg 3/3 Running 0 4m4s
pvc-82a81b9a-d585-48b9-9b58-dc3f5b3c2461-target-6bb98988c-d52mb 3/3 Running 0 9m13s
pvc-9670d96c-f1e2-4e6a-a223-25960cf87b44-target-5bcc4c59dc5mzc7 3/3 Running 0 7m56s
pvc-dc70575e-499f-4737-a148-e511453df233-target-5d6f64db86j8d7w 3/3 Running 0 5m21s
pvc-eacb5ff7-6c28-4ad5-88eb-e877eaa51b56-target-58fbdd855-nsmgt 3/3 Running 0 10m
pvc-fc6ec633-d0e1-420c-b64b-cfefe3a425bd-target-7b79b67bcb55b9l 3/3 Running 0 6m38s
Fix VolumeAttachment Issue
After the upgrade was done, I ran into another issue. One of the pods was stuck in ContainerCreating
state and when I described the pod I saw this error:
Warning FailedMount 42s (x8 over 106s) kubelet, nd MountVolume.MountDevice failed for volume "pvc-343ccf62-" : rpc error: code = Internal desc = cstorvolumeattachments.cstor.openebs.io "pvc-343ccf62-" not found
I doubl checked and the cva
didn’t exist:
k get cva -n openebs | grep pvc-343ccf62-
On the same node where the pod was stuck I checked out the logs of the csi-node
pod:
> k get po -n openebs -l role=openebs-cstor-csi -o wide | grep nd
openebs-cstor-csi-node-x5b78 2/2 Running 0 8h 192.168.1.53 nd <none> <none>
And for the logs:
> k logs -n openebs openebs-cstor-csi-node-x5b78 cstor-csi-plugin
...
time="2023-12-26T19:09:47Z" level=error msg="NodeUnPublishVolume: dir /var/lib/kubelet/pods/9883b6d2-ad22-41fc-a27d-910d876b4eb8/volumes/kubernetes.io~csi/pvc-fc6ec633-d0e1-420c-b64b-cfefe3a425bd/mount does not exist"
The directory was indeed missing, but I was surpised that the directory for the pvc existed at all. And in the directory it only had vol_data.json
file and nothing else. That felt like a left over directory that shouldn’t be there, so I deleted it:
rm -rf /var/lib/kubelet/pods/9883b6d2-ad22-41fc-a27d-910d876b4eb8/volumes/kubernetes.io~csi/pvc-fc6ec633-d0e1-420c-b64b-cfefe3a425bd
And I restarted the csi-node
pod:
k delete po -n openebs openebs-cstor-csi-node-x5b78
And then the pod came up without issues.