SQR-057: Using Velero to back up Kubernetes resources

  • Angelo Fausti

Latest Revision: 2021-04-09

1   Introduction

Velero is an open-source tool for backing up Kubernetes resources, including persistent volumes. It is an excellent complement to Argo CD for disaster recovery and data migration scenarios.

With Argo CD, you can re-deploy your application and preserve the application state as long as the persistent volumes are preserved. If a persistent volume (or any data in a persistent volume) is lost, then a disaster recovery tool like Velero becomes essential.

Velero implements custom resources like Backup, Schedule, and Restore and includes controllers that process these custom resources to perform the related operations. You can back up or restore all Kubernetes objects in your cluster, or you can filter them by type, namespace, and/or label selectors.
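
For illustration, a minimal sketch of a Backup custom resource that filters by namespace and label selector might look like the following (assuming Velero is installed in the velero namespace; the namespace and label values are placeholders):

kubectl apply -f - <<EOF
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: <backup-name>
  namespace: velero
spec:
  # Back up only the objects in this namespace...
  includedNamespaces:
    - <app-namespace>
  # ...that match this label selector.
  labelSelector:
    matchLabels:
      <key>: <value>
  # Keep the backup for 30 days.
  ttl: 720h0m0s
EOF

In practice, the velero command-line client used in the following sections creates these resources for you.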

Velero has a plugin architecture to work with different cloud providers. It also supports backing up and restoring Kubernetes volumes with restic, a free open-source backup tool.

This technote shows how to install and use Velero with the Velero Google Cloud Platform (GCP) plugin. The GCP plugin uses a Google Cloud Storage (GCS) bucket as the backup/restore location and Google Compute Engine (GCE) disk snapshots to back up the persistent volume disks.

2   Installation

In this section, you will install Velero with the GCP plugin into an existing Google Kubernetes Engine (GKE) cluster.

2.1   Create a GCS bucket

Use the gsutil command-line tool to create a Cloud Storage bucket:

BUCKET=<bucket>
gsutil mb gs://$BUCKET/

2.2   Create an IAM service account

Use the gcloud command-line tool to create an IAM service account:

gcloud iam service-accounts create velero \
   --display-name "Velero service account"

2.3   Create a role with enough permissions for Velero

Get the project and service account email values and create a role with enough permissions for Velero:

PROJECT_ID=$(gcloud config get-value project)
SERVICE_ACCOUNT_EMAIL=$(gcloud iam service-accounts list \
   --filter="displayName:Velero service account" \
   --format 'value(email)')
ROLE_PERMISSIONS=(
   compute.disks.get
   compute.disks.create
   compute.disks.createSnapshot
   compute.snapshots.get
   compute.snapshots.create
   compute.snapshots.useReadOnly
   compute.snapshots.delete
   compute.zones.get
)
gcloud iam roles create velero.server \
   --project $PROJECT_ID \
   --title "Velero Server" \
   --permissions "$(IFS=","; echo "${ROLE_PERMISSIONS[*]}")"

Add an IAM policy binding to grant this role to Velero’s GCP service account and to the bucket:

gcloud projects add-iam-policy-binding $PROJECT_ID \
   --member serviceAccount:$SERVICE_ACCOUNT_EMAIL \
   --role projects/$PROJECT_ID/roles/velero.server
gsutil iam ch serviceAccount:$SERVICE_ACCOUNT_EMAIL:objectAdmin gs://${BUCKET}

2.4   Set permissions for Velero using Workload Identity

Enable Workload Identity on your GKE cluster:

CLUSTER=<cluster-name>
ZONE=$(gcloud config get-value compute/zone)
gcloud container clusters update $CLUSTER --zone=$ZONE \
   --workload-pool=$PROJECT_ID.svc.id.goog

Update the existing node pools:

NODE_POOLS=<node-pools>
gcloud container node-pools update $NODE_POOLS \
   --zone=$ZONE \
   --cluster=$CLUSTER \
   --workload-metadata=GKE_METADATA

Add an IAM policy binding to grant Velero’s Kubernetes service account access to the GCP service account:

gcloud iam service-accounts add-iam-policy-binding \
   --role roles/iam.workloadIdentityUser \
   --member "serviceAccount:$PROJECT_ID.svc.id.goog[velero/velero]" \
   velero@$PROJECT_ID.iam.gserviceaccount.com

2.5   Install Velero server using the GCP plugin

Finally, install the Velero server in your GKE cluster using the Velero command-line client:

velero install \
   --provider gcp \
   --plugins velero/velero-plugin-for-gcp:v1.2.0 \
   --bucket $BUCKET \
   --no-secret \
   --sa-annotations iam.gke.io/gcp-service-account=velero@$PROJECT_ID.iam.gserviceaccount.com \
   --backup-location-config serviceAccount=velero@$PROJECT_ID.iam.gserviceaccount.com \
   --wait
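
To confirm the installation, one way is to check that the Velero deployment is running in the velero namespace and that the client and server versions agree:

kubectl get deployment velero -n velero
velero version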

3   Backup and snapshot storage locations

The Velero GCP plugin uses a GCS bucket to store backup and restore metadata as well as the Kubernetes manifests for the resources included in the backup.

The persistent volume backup is performed by GCE disk snapshots.

Example:

$ velero backup-location get
NAME      PROVIDER   BUCKET/PREFIX        PHASE       LAST VALIDATED                  ACCESS MODE
default   gcp        backup-sandbox-efd   Available   2021-04-06 16:13:34 -0700 MST   ReadWrite

$ velero snapshot-location get
NAME      PROVIDER
default   gcp

Snapshots back up data from persistent disks incrementally.
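
You can inspect both locations directly. The commands below are a sketch, assuming the default backups/ prefix of Velero’s object storage layout:

# Backup metadata and Kubernetes manifests stored in the GCS bucket
gsutil ls gs://$BUCKET/backups/

# GCE disk snapshots created for the persistent volume backups
gcloud compute snapshots list --project $PROJECT_ID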

4   Velero backup and schedule

For an on-demand backup, use the velero backup command; for a scheduled backup, use the velero schedule command instead.

Example 1: Schedule a backup of the entire application namespace every day with an expiration time set to 30 days.

velero schedule create <schedule-name> \
   --schedule="@every 24h" \
   --include-namespaces <app-namespace> \
   --ttl 720h

Example 2: Schedule a backup of all persistent volumes in the cluster.

velero schedule create <schedule-name> \
   --schedule="@every 24h" \
   --include-resources persistentvolumes \
   --ttl 720h

Example 3: Backup resources matching a label selector.

velero backup create <backup-name> \
   --selector <key>=<value>
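
You can also trigger an immediate backup from an existing schedule and follow its progress with the velero backup commands, for example:

# Create an on-demand backup using the schedule's template
velero backup create --from-schedule <schedule-name>

# Check backup status and details
velero backup get
velero backup describe <backup-name>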

5   Velero restore

You can use namespace mapping to restore the application to a different namespace.

velero restore create \
   --from-schedule <schedule-name> \
   --namespace-mappings <original-namespace>:<restored-namespace>

You can also filter resources during a restore:

velero restore create \
   --from-schedule <schedule-name> \
   --include-resources persistentvolumes
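
As with backups, you can follow the progress of a restore and review any warnings or errors it reports:

velero restore get
velero restore describe <restore-name>
velero restore logs <restore-name>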

6   Disaster recovery

Scenario 1: You have lost a persistent volume.

Use Velero to restore the persistent volume from the backup.
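
For example, a restore of only the volumes and their claims from a specific backup might look like this sketch:

velero restore create \
   --from-backup <backup-name> \
   --include-resources persistentvolumes,persistentvolumeclaims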

Scenario 2: “User A” accidentally deletes data from an application, and “User B” writes data to the same application at roughly the same time.

In this situation, you cannot simply restore the persistent volume from the backup. However, you can use Velero to restore the entire application namespace to a different namespace, connect to the application and manually restore the lost data.

7   Data migration

Scenario 1: Migrate the application to another cluster.

Use Argo CD to deploy the application to the new cluster and use Velero to restore its previous state.
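
A minimal sketch of this flow, assuming the new cluster is configured to use the same GCS bucket (see Section 2), is to mark the shared backup location read-only while migrating and then restore from the most recent scheduled backup:

# On the new cluster: prevent writes to the shared backup location during migration
kubectl patch backupstoragelocation default -n velero \
   --type merge \
   --patch '{"spec":{"accessMode":"ReadOnly"}}'

# Restore the application state from the most recent backup created by the schedule
velero restore create \
   --from-schedule <schedule-name>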

8   A practical example

Let us take Chronograf as an example of a stateful application to illustrate how to restore deleted data from a backup. In this context, the deleted data could be a Chronograf dashboard, a Chronograf organization, or any configuration saved in the Chronograf database.

What follows assumes that Velero Server is installed in your cluster, and that the backup bucket is properly configured as described above.

Create a Velero Schedule as follows:

$ velero schedule create chronograf \
   --schedule="@every 24h" \
   --include-namespaces chronograf \
   --ttl 720h
Schedule "chronograf" created successfully.

$ velero schedule get
NAME         STATUS    CREATED                         SCHEDULE      BACKUP TTL   LAST BACKUP   SELECTOR
chronograf   Enabled   2021-04-09 12:44:50 -0700 MST   @every 24h    720h0m0s     4s ago        <none>

Open the Chronograf application and simulate a disaster by deleting a dashboard.

Restore the Chronograf application from the backup, but to a different namespace, for example, chronograf-restored:

$ velero restore create \
   --from-schedule chronograf \
   --namespace-mappings chronograf:chronograf-restored
Restore request "chronograf-20210409131002" submitted successfully.
Run `velero restore describe chronograf-20210409131002` or `velero restore logs chronograf-20210409131002` for more details.

Use the following to disable authentication in the restored Chronograf application; otherwise, you will be redirected to the original Chronograf application URL after logging in:

kubectl set env deployments --all TOKEN_SECRET- GH_CLIENT_ID- GH_CLIENT_SECRET- GH_ORGS- -n chronograf-restored

Delete the Chronograf pods so that the deployment recreates them with the updated environment:

kubectl delete --all pods -n chronograf-restored

Connect to the restored Chronograf application:

kubectl port-forward -n chronograf-restored service/chronograf-chronograf 8000:80

Finally, export the deleted dashboard from the restored Chronograf application at http://localhost:8000.
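
After exporting the dashboard, the temporary namespace created by the restore can be removed:

kubectl delete namespace chronograf-restored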