Velero and OpenEBS
I wanted to move one of my OpenEBS volumes from one cluster to another, and I realized I could do it by performing a backup and restore with Velero. I decided to use Google Cloud Storage for the backup storage, so let’s configure that first.
Configure GCP
Following the instructions from Plugins for Google Cloud Platform (GCP), let’s set up the right service account:
PROJECT_ID=$(gcloud config get-value project)
BUCKET="YOUR_BUCKET"

# create account
GSA_NAME=velero
gcloud iam service-accounts create $GSA_NAME \
    --display-name "Velero service account"

# get email of sa
SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
    --filter="displayName:Velero service account" \
    --format 'value(email)')

# Create custom role
ROLE_PERMISSIONS=(
    compute.disks.get
    compute.disks.create
    compute.disks.createSnapshot
    compute.snapshots.get
    compute.snapshots.create
    compute.snapshots.useReadOnly
    compute.snapshots.delete
    compute.zones.get
    storage.objects.create
    storage.objects.delete
    storage.objects.get
    storage.objects.list
)
gcloud iam roles create velero.server \
    --project $PROJECT_ID \
    --title "Velero Server" \
    --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/velero.server

gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://${BUCKET}

# export key
gcloud iam service-accounts keys create credentials-velero \
    --iam-account $SERVICE_ACCOUNT_EMAIL
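Before moving on, you can sanity-check the IAM pieces; these verification commands weren’t part of my original run, but they’re standard gcloud/gsutil:
# confirm the custom role exists
gcloud iam roles describe velero.server --project $PROJECT_ID
# confirm the service account is bound to it
gcloud projects get-iam-policy $PROJECT_ID \
    --flatten="bindings[].members" \
    --filter="bindings.members:$SERVICE_ACCOUNT_EMAIL" \
    --format="value(bindings.role)"
# confirm the bucket-level grant
gsutil iam get gs://${BUCKET}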
Now let’s do the install:
$ wget https://github.com/vmware-tanzu/velero/releases/download/v1.9.2/velero-v1.9.2-linux-amd64.tar.gz
$ tar xvzf velero-v1.9.2-linux-amd64.tar.gz
$ mv velero-v1.9.2-linux-amd64/velero /usr/local/bin/.
$ velero install \
    --provider gcp \
    --plugins velero/velero-plugin-for-gcp:v1.5.0 \
    --bucket $BUCKET \
    --secret-file ./credentials-velero
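Before pointing anything at it, it doesn’t hurt to confirm the deployment came up cleanly (a quick sanity check, not from my original run):
# make sure the velero deployment is available and not logging errors
kubectl get deployment/velero -n velero
kubectl logs deployment/velero -n velero | grep -i error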
Initially I tried using the default CSI plugin, but the backup didn’t work, so I kept going without it (from CSI Support):
--features=EnableCSI
At this point you should see a successful connection to GCP:
> velero backup-location get
NAME      PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED   ACCESS MODE   DEFAULT
default   gcp        $BUCKET         Available   2022-10-26 MDT   ReadWrite     true
Install OpenEBS Plugin
Following the Velero-plugin for OpenEBS CStor volume docs, first let’s add the plugin:
velero plugin add openebs/velero-plugin:1.11.0
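You can confirm it registered with the velero CLI; openebs.io/cstor-blockstore should show up among the volume snapshotters:
velero plugin get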
Then let’s configure the snapshot location; an example is available here:
> cat volume-snapshot.yaml
apiVersion: velero.io/v1
kind: VolumeSnapshotLocation
metadata:
  name: gcp-bucket
  namespace: velero
spec:
  provider: openebs.io/cstor-blockstore
  config:
    bucket: $BUCKET
    prefix: oebs
    provider: gcp
    restApiTimeout: 1m
    namespace: openebs
    autoSetTargetIP: "true"
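Since bucket: $BUCKET above is a placeholder, remember to put your real bucket name in before applying. One option (a suggestion, not how I originally did it) is to substitute it at apply time with envsubst from gettext:
# substitute $BUCKET from the environment and apply
envsubst < volume-snapshot.yaml | kubectl apply -f -
# confirm the snapshot location exists
kubectl get volumesnapshotlocation -n velero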
Now let’s take a backup including PVCs:
velero backup create postges-backup -l app=postgres --snapshot-volumes --volume-snapshot-locations=gcp-bucket
If all is well you should see the backup:
> velero backup describe postges-backup
Name:         postges-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.25.3
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=25

Phase:  Completed

Errors:    0
Warnings:  39

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=postgres

Storage Location:  default

Velero-Native Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-10-26 12:43:23 -0600 MDT
Completed:  2022-10-26 12:45:38 -0600 MDT

Expiration:  2022-11-25 11:43:23 -0700 MST

Total items to be backed up:  11
Items backed up:              11

Velero-Native Snapshots:  1 of 1 snapshots completed successfully (specify --details for more information)

CSI Volume Snapshots:  <none included>
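If you’re curious about the 39 warnings, the backup logs have the details (a standard velero command, though digging into them wasn’t part of my original notes):
velero backup logs postges-backup | grep -i warn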
If you add --details, you will see information about the snapshot:
> velero backup describe postges-backup --details
Name:         postges-backup
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.25.3
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=25

Phase:  Completed

Errors:    0
Warnings:  39

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  app=postgres

Storage Location:  default

Velero-Native Snapshot PVs:  true

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2022-10-26 12:43:23 -0600 MDT
Completed:  2022-10-26 12:45:38 -0600 MDT

Expiration:  2022-11-25 11:43:23 -0700 MST

Total items to be backed up:  11
Items backed up:              11

Resource List:
  apps/v1/ReplicaSet:
    - default/postgres-54bc4544cc
    - default/postgres-5c46bff6f7
    - default/postgres-5d96dd444b
    - default/postgres-7fc7bbf7f6
    - default/postgres-868f4bdc9b
  discovery.k8s.io/v1/EndpointSlice:
    - default/postgres-mbw4d
  v1/Endpoints:
    - default/postgres
  v1/PersistentVolume:
    - pvc-1841a854-439f-423e-a54c-bf402128689b
  v1/PersistentVolumeClaim:
    - default/postgres-data-pvc
  v1/Pod:
    - default/postgres-5d96dd444b-d2z2h
  v1/Service:
    - default/postgres

Velero-Native Snapshots:
  pvc-1841a854-439f-423e-a54c-bf402128689b:
    Snapshot ID:        pvc-1841a854-439f-423e-a54c-bf402128689b-velero-bkp-postges-backup
    Type:               cstor-snapshot
    Availability Zone:
    IOPS:               <N/A>

CSI Volume Snapshots:  <none included>
Restoring a backup on another cluster
I deployed Velero the same way on another cluster:
## install velero
$ export BUCKET=staging.grand-drive-196322.appspot.com
$ velero install \
    --provider gcp \
    --plugins velero/velero-plugin-for-gcp:v1.5.0 \
    --bucket $BUCKET \
    --secret-file ./credentials-velero \
    --features=EnableCSI

## install the openebs plugin
$ velero plugin add openebs/velero-plugin:1.11.0

## configure the VolumeSnapshotLocation
$ kubectl apply -f volume-snapshot.yaml
Then I saw the backups from that cluster:
> velero backup get
NAME             STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
postges-backup   Completed   0        39         2022-10-26 12:43:23 -0600 MDT   29d       default            app=postgres
Now let’s try the restore:
> velero restore create --from-backup postges-backup --restore-volumes=true
Initially the restore will show up as partially failed:
> velero restore get
NAME                            BACKUP           STATUS            STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
postges-backup-20221026143436   postges-backup   PartiallyFailed   2022-10-26 14:34:36 -0600 MDT   2022-10-26 14:40:31 -0600 MDT   1        1          2022-10-26 14:34:36 -0600 MDT   <none>
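To see exactly what failed, you can describe the restore or pull its logs:
# inspect the partially failed restore
velero restore describe postges-backup-20221026143436 --details
velero restore logs postges-backup-20221026143436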
And this is expected, since there is a known issue with the OpenEBS restore: you have to manually set the target IP on each replica (this is covered in Setting targetip in replica). First, get the PV name:
> PV_NAME=$(k get pv -o json | jq -r '.items[]| select( .spec.claimRef.name | contains("postgres")).metadata.name')
> echo $PV_NAME
pvc-dc70575e-499f-4737-a148-e511453df233
Then get the target IP:
> TAR_IP=$(kubectl get svc -n openebs $PV_NAME -ojsonpath='{.spec.clusterIP}')
> echo $TAR_IP
10.233.35.19
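While working through the fix, you can keep an eye on the replica state; the CStorVolumeReplicas will stay unhealthy until the target IP is set:
# cvr is the short name for cstorvolumereplicas
kubectl get cvr -n openebs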
Next, get the disk pools serving the PVC:
> DISK_POOLS=( $(for r in $(k get cvr -n openebs -o json | jq -r ".items[]| select( .metadata.name | contains(\"$PV_NAME\"))".metadata.name); do echo $r| sed s/$PV_NAME-//; done) )
> echo $DISK_POOLS
cstor-disk-pool-2qqg cstor-disk-pool-jfz2
Now let’s get the corresponding pods for those pool instances:
> POOL_PODS=( $(for dp in $DISK_POOLS; do k get pods -n openebs -o json | jq -r ".items[]| select( .metadata.name | contains(\"$dp\"))".metadata.name; done) )
> echo $POOL_PODS
cstor-disk-pool-2qqg-7857745dc-mjjn2 cstor-disk-pool-jfz2-68c9786bd7-4qmxt
Next, you can check each pool pod to see which datasets don’t have a target IP set:
> for pool_pod in $POOL_PODS; do echo $pool_pod; kubectl -n openebs exec -it $pool_pod -c cstor-pool -- bash -c 'zfs get io.openebs:targetip'; done
cstor-disk-pool-2qqg-7857745dc-mjjn2
NAME PROPERTY VALUE SOURCE
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2 io.openebs:targetip - -
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-087960e4-c718-4227-84f8-d3a6ca423295 io.openebs:targetip 10.233.30.198 local
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-45bb9001-5a14-40b7-8caa-fcae972c31bc io.openebs:targetip 10.233.60.140 local
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-9670d96c-f1e2-4e6a-a223-25960cf87b44 io.openebs:targetip 10.233.33.71 local
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233 io.openebs:targetip
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233@postges-backup io.openebs:targetip - -
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-eacb5ff7-6c28-4ad5-88eb-e877eaa51b56 io.openebs:targetip 10.233.27.53 local
cstor-disk-pool-jfz2-68c9786bd7-4qmxt
NAME PROPERTY VALUE SOURCE
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2 io.openebs:targetip - -
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-087960e4-c718-4227-84f8-d3a6ca423295 io.openebs:targetip 10.233.30.198 local
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-1c2664d4-a7e6-4182-8535-1cfad2ec6c0c io.openebs:targetip 10.233.52.183 local
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-82a81b9a-d585-48b9-9b58-dc3f5b3c2461 io.openebs:targetip 10.233.48.159 local
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-9670d96c-f1e2-4e6a-a223-25960cf87b44 io.openebs:targetip 10.233.33.71 local
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233 io.openebs:targetip
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233@postges-backup io.openebs:targetip - -
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-eacb5ff7-6c28-4ad5-88eb-e877eaa51b56 io.openebs:targetip 10.233.27.53 local
You could also run the following to check all the disk pools:
> for pool_pod in $(k get pods -l openebs.io/cstor-pool-cluster=cstor-disk-pool -n openebs -o name); do echo $pool_pod; kubectl -n openebs exec -it $pool_pod -c cstor-pool -- bash -c 'zfs get io.openebs:targetip'; done
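If you only want the datasets that are still missing the target IP, here is a filtered variant (a sketch using zfs get -H for script-friendly output; adjust the label selector if your pool cluster is named differently):
> for pool_pod in $(kubectl get pods -l openebs.io/cstor-pool-cluster=cstor-disk-pool -n openebs -o name); do echo $pool_pod; kubectl -n openebs exec -it $pool_pod -c cstor-pool -- bash -c "zfs get -H io.openebs:targetip | awk '\$3 == \"-\"' | grep '/' | grep -v '@'"; done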
Since I have two replicas enabled in my OpenEBS configuration, the missing target IP shows up on two pool instances. So now let’s set it on both:
> for pp in $POOL_PODS; do echo $pp; for pvc in $(k exec $pp -c cstor-pool -n openebs -- bash -c "zfs get io.openebs:targetip" | grep io.openebs:targetip | grep $PV_NAME | grep -v '@' | cut -d" " -f1); do echo $pvc; kubectl exec $pp -c cstor-pool -n openebs -- bash -c "zfs set io.openebs:targetip=$TAR_IP $pvc"; done ; done
cstor-disk-pool-2qqg-7857745dc-mjjn2
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233
cstor-disk-pool-jfz2-68c9786bd7-4qmxt
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233
To confirm it worked, we can run the following:
> for pp in $POOL_PODS; do echo $pp; for pvc in $(k exec $pp -c cstor-pool -n openebs -- bash -c "zfs get io.openebs:targetip" | grep io.openebs:targetip | grep $PV_NAME | grep -v '@' | cut -d" " -f1); do k exec $pp -c cstor-pool -n openebs -- bash -c "zfs get io.openebs:targetip $pvc" ; done ; done
cstor-disk-pool-2qqg-7857745dc-mjjn2
NAME PROPERTY VALUE SOURCE
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233 io.openebs:targetip 10.233.35.19 local
cstor-disk-pool-jfz2-68c9786bd7-4qmxt
NAME PROPERTY VALUE SOURCE
cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233 io.openebs:targetip 10.233.35.19 local
After that, you can watch the volume recover and your pod start using the PVC:
openebs 7m32s Warning SyncFailed cstorvolumereplica/pvc-dc70575e-499f-4737-a148-e511453df233-cstor-disk-pool-2qqg failed to sync CVR error: unable to update snapshot list details in CVR: failed to get the list of snapshots: Output: failed listsnap command for cstor-adfe63f7-5c10-425d-801a-1b299bfa3dc2/pvc-dc70575e-499f-4737-a148-e511453df233 with err 2...
openebs 6m50s Normal Degraded cstorvolume/pvc-dc70575e-499f-4737-a148-e511453df233 Volume is in Degraded state
openebs 6m20s Normal Healthy cstorvolume/pvc-dc70575e-499f-4737-a148-e511453df233 Volume is in Healthy state
default 112s Normal Scheduled pod/postgres-5d96dd444b-cpldr Successfully assigned default/postgres-5d96dd444b-cpldr to nc
default 112s Normal SuccessfulCreate replicaset/postgres-5d96dd444b Created pod: postgres-5d96dd444b-cpldr
And the PVC will be marked as used by our pod:
> k describe pvc postgres-data-pvc
Name:          postgres-data-pvc
Namespace:     default
StorageClass:  cstor-csi-disk
Status:        Bound
Volume:        pvc-dc70575e-499f-4737-a148-e511453df233
Labels:        app.kubernetes.io/instance=postgres-home
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: cstor.csi.openebs.io
               volume.kubernetes.io/storage-provisioner: cstor.csi.openebs.io
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Used By:       postgres-5d96dd444b-cpldr
Events:        <none>
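As a last check, I’d verify the application data actually made it over, for example by listing the databases in the restored Postgres (the -U postgres user here is an assumption; use whatever user your deployment defines):
# list databases in the restored instance; -U postgres is an assumption
kubectl exec -it postgres-5d96dd444b-cpldr -- psql -U postgres -c '\l'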
Uninstall Velero
If you ever need to uninstall, you can follow the instructions from Uninstalling Velero:
kubectl delete namespace/velero clusterrolebinding/velero
kubectl delete crds -l component=velero
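If you’re also decommissioning the GCP side, the pieces we created earlier can be removed as well (only do this once you no longer need the backups):
# remove the binding, custom role, and service account
gcloud projects remove-iam-policy-binding $PROJECT_ID \
    --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
    --role projects/$PROJECT_ID/roles/velero.server
gcloud iam roles delete velero.server --project $PROJECT_ID
gcloud iam service-accounts delete $SERVICE_ACCOUNT_EMAIL
# optionally clear the backup data (this deletes your backups!)
# gsutil -m rm -r gs://${BUCKET}/backups gs://${BUCKET}/oebs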