Velero Series (4): Production Migration with Velero

This article was last updated on February 7, 2024.

Velero

Overview

Objective

Using the velero tool, the overall goal is:

  • Migrate specific namespaces from cluster B to cluster A;

The specific objectives are:

  1. Install Velero (including restic) on both cluster B and cluster A.
  2. Back up the specific namespace caseycui2020 on cluster B:
    1. Back up resources such as Deployments, ConfigMaps, etc.
      1. Before backing up, exclude specific Secrets.
    2. Back up volume data (via restic).
      1. Use the “opt-in” approach to back up only specific pod volumes.
  3. Migrate the specific namespace caseycui2020 to cluster A:
    1. Migrate resources: use include filters to migrate only specific resources;
    2. Migrate volume data (via restic).

Installation

  1. Create a Velero-specific credentials file in your local directory (credentials-velero):

    Object storage uses XSKY (the company’s NetApp object storage is not compatible):

    [default]
    aws_access_key_id = xxxxxxxxxxxxxxxxxxxxxxxx
    aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  2. (OpenShift) You need to create the velero namespace first: oc new-project velero

  3. By default, a user-created OpenShift namespace does not schedule pods on all nodes in the cluster.

    To schedule pods in this namespace on all nodes, an annotation is required:

    oc annotate namespace velero openshift.io/node-selector=""

    This should be done before installing Velero.

  4. Start the server and storage services. In the Velero directory, run:

    velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-restic \
    --use-volume-snapshots=true \
    --backup-location-config region="default",s3ForcePathStyle="true",s3Url="http://glacier.e-whisper.com",insecureSkipTLSVerify="true",signatureVersion="4" \
    --snapshot-location-config region="default"

    The content created includes:

    CustomResourceDefinition/backups.velero.io: attempting to create resource
    CustomResourceDefinition/backups.velero.io: created
    CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
    CustomResourceDefinition/backupstoragelocations.velero.io: created
    CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
    CustomResourceDefinition/deletebackuprequests.velero.io: created
    CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
    CustomResourceDefinition/downloadrequests.velero.io: created
    CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
    CustomResourceDefinition/podvolumebackups.velero.io: created
    CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
    CustomResourceDefinition/podvolumerestores.velero.io: created
    CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
    CustomResourceDefinition/resticrepositories.velero.io: created
    CustomResourceDefinition/restores.velero.io: attempting to create resource
    CustomResourceDefinition/restores.velero.io: created
    CustomResourceDefinition/schedules.velero.io: attempting to create resource
    CustomResourceDefinition/schedules.velero.io: created
    CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
    CustomResourceDefinition/serverstatusrequests.velero.io: created
    CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
    CustomResourceDefinition/volumesnapshotlocations.velero.io: created
    Waiting for resources to be ready in cluster...
    Namespace/velero: attempting to create resource
    Namespace/velero: created
    ClusterRoleBinding/velero: attempting to create resource
    ClusterRoleBinding/velero: created
    ServiceAccount/velero: attempting to create resource
    ServiceAccount/velero: created
    Secret/cloud-credentials: attempting to create resource
    Secret/cloud-credentials: created
    BackupStorageLocation/default: attempting to create resource
    BackupStorageLocation/default: created
    VolumeSnapshotLocation/default: attempting to create resource
    VolumeSnapshotLocation/default: created
    Deployment/velero: attempting to create resource
    Deployment/velero: created
    DaemonSet/restic: attempting to create resource
    DaemonSet/restic: created
    Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
    
  5. (OpenShift) Add the velero ServiceAccount to the privileged SCC:

    $ oc adm policy add-scc-to-user privileged -z velero -n velero
  6. (OpenShift) For OpenShift version >= 4.1, modify the restic DaemonSet YAML to request privileged mode:

    @@ -67,3 +67,5 @@ spec:
    value: /credentials/cloud
    - name: VELERO_SCRATCH_DIR
    value: /scratch
    + securityContext:
    + privileged: true

    Or:

    oc patch ds/restic \
    --namespace velero \
    --type json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/securityContext","value": { "privileged": true}}]'

Backup - Cluster B

Back up specific resources at the cluster level

velero backup create <backup-name> --include-cluster-resources=true  --include-resources deployments,configmaps

View the backup

velero backup describe YOUR_BACKUP_NAME

Back up the specific namespace caseycui2020

Exclude specific resources

Resources labeled velero.io/exclude-from-backup=true are not included in the backup, even if they contain matching selector labels.

In this way, secrets and other resources that do not need to be backed up can be excluded with the velero.io/exclude-from-backup=true label.

Some examples of secrets excluded in this way are as follows:

builder-dockercfg-jbnzr
default-token-lshh8
pipeline-token-xt645
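
For example, a minimal sketch of excluding one of the secrets above by applying this label (the secret name is taken from the list above; adjust to your own):

oc -n caseycui2020 label secret builder-dockercfg-jbnzr velero.io/exclude-from-backup=true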

Back up Pod Volume using restic

🐾 Note:

In this namespace, the following 2 pod volumes also need to be backed up, but they are not yet officially in use:

  • mycoreapphttptask-callback
  • mycoreapphttptaskservice-callback

Use the “opt-in” approach to make selective backups.

  1. Run the following command for each pod that contains the volume you want to back up:

    oc -n caseycui2020 annotate pod/<mybackendapp-pod-name> backup.velero.io/backup-volumes=jmx-exporter-agent,pinpoint-agent,my-mybackendapp-claim
    oc -n caseycui2020 annotate pod/<elitegetrecservice-pod-name> backup.velero.io/backup-volumes=uploadfile

    where the volume name is the name of the volume in the pod spec.

    For example, for the following pods:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample
      namespace: foo
    spec:
      containers:
      - image: k8s.gcr.io/test-webserver
        name: test-webserver
        volumeMounts:
        - name: pvc-volume
          mountPath: /volume-1
        - name: emptydir-volume
          mountPath: /volume-2
      volumes:
      - name: pvc-volume
        persistentVolumeClaim:
          claimName: test-volume-claim
      - name: emptydir-volume
        emptyDir: {}

    You should run:

    kubectl -n foo annotate pod/sample backup.velero.io/backup-volumes=pvc-volume,emptydir-volume

    If you use a controller to manage your pods, you can also provide this annotation in the pod template spec.
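
    For example, a minimal sketch of a Deployment whose pod template carries the same annotation (the names, image, and volume are illustrative, not taken from the original):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: mybackendapp
      namespace: caseycui2020
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: mybackendapp
      template:
        metadata:
          labels:
            app: mybackendapp
          annotations:
            # restic "opt-in": back up only this pod volume
            backup.velero.io/backup-volumes: my-mybackendapp-claim
        spec:
          containers:
          - name: mybackendapp
            image: image-registry.example.com/caseycui2020/mybackendapp:latest  # illustrative image
            volumeMounts:
            - name: my-mybackendapp-claim
              mountPath: /data
          volumes:
          - name: my-mybackendapp-claim
            persistentVolumeClaim:
              claimName: my-mybackendapp-claim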

Backup and verification

Back up the namespace and its objects, and the pod volume with the associated annotation:

# Production namespace
velero backup create caseycui2020 --include-namespaces caseycui2020

View the backup

velero backup describe YOUR_BACKUP_NAME
velero backup logs caseycui2020
oc -n velero get podvolumebackups -l velero.io/backup-name=caseycui2020 -o yaml

The output of the describe command is as follows:

Name:         caseycui2020
Namespace: velero
Labels: velero.io/storage-location=default
Annotations: velero.io/source-cluster-k8s-gitversion=v1.18.3+2cf11e2
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=18+

Phase: Completed

Errors: 0
Warnings: 0

Namespaces:
Included: caseycui2020
Excluded: <none>

Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto

Label selector: <none>

Storage Location: default

Velero-Native Snapshot PVs: auto

TTL: 720h0m0s

Hooks: <none>

Backup Format Version: 1.1.0

Started: 2020-10-21 09:28:16 +0800 CST
Completed: 2020-10-21 09:29:17 +0800 CST

Expiration: 2020-11-20 09:28:16 +0800 CST

Total items to be backed up: 591
Items backed up: 591

Velero-Native Snapshots: <none included>

Restic Backups (specify --details for more information):
Completed: 3
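
As the last line suggests, you can pass --details to the describe command to list the individual restic pod volume backups:

velero backup describe caseycui2020 --details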

Back up regularly

To create a regularly scheduled backup using a cron expression:

velero schedule create caseycui2020-b-daily --schedule="0 3 * * *" --include-namespaces caseycui2020

Alternatively, you can use some non-standard shorthand cron expressions:

velero schedule create test-daily --schedule="@every 24h" --include-namespaces caseycui2020

For more usage examples, see the cron package documentation.
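
After creating a schedule, you can check it and the backups it produces (a small verification sketch):

velero schedule get
velero backup get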

Cluster Migration - To Cluster A

Using backups and restores

As long as you point each Velero instance at the same cloud object storage location, Velero can help you migrate resources from one cluster to another. This scenario assumes that your clusters are hosted by the same cloud provider. Note that Velero does not natively support the migration of persistent volume snapshots across cloud providers. If you want to migrate volume data between cloud platforms, enable restic, which backs up the volume contents at the file system level.

  1. (Cluster B) Assuming you are not already using the Velero schedule operation to checkpoint your data, you first need to back up the entire cluster (replace <BACKUP-NAME> as needed):

    velero backup create <BACKUP-NAME>

    The default backup retention period, expressed as a TTL, is 30 days (720 hours); you can change it as needed with the --ttl <DURATION> flag, as in the sketch below. For more information about backup expiration, see How Velero works.
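
    For example, a backup kept for only 7 days (the backup name is illustrative):

    velero backup create caseycui2020-weekly --include-namespaces caseycui2020 --ttl 168h0m0s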

  2. (Cluster A) Configure BackupStorageLocations and VolumeSnapshotLocations to point to the location used by cluster B, using velero backup-location create and velero snapshot-location create. To make the BackupStorageLocation read-only, pass the --access-mode=ReadOnly flag to velero backup-location create, as sketched after the install command below (because I only have one bucket, I do not configure read-only). In cluster A this is done at install time, which configures the BackupStorageLocations and VolumeSnapshotLocations:

    velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-restic \
    --use-volume-snapshots=true \
    --backup-location-config region="default",s3ForcePathStyle="true",s3Url="http://glacier.e-whisper.com",insecureSkipTLSVerify="true",signatureVersion="4" \
    --snapshot-location-config region="default"
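
    If you do want the location to be read-only instead, a minimal sketch (the location name is illustrative):

    velero backup-location create bsl-readonly \
    --provider aws \
    --bucket velero \
    --config region="default",s3ForcePathStyle="true",s3Url="http://glacier.e-whisper.com",insecureSkipTLSVerify="true",signatureVersion="4" \
    --access-mode=ReadOnly
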
  3. (Cluster A) Ensure that the Velero Backup object has been created. Velero resources are synced with backup files in cloud storage.

    velero backup describe <BACKUP-NAME>

    Note: the default sync interval is 1 minute, so be sure to wait before checking. You can configure this interval with the Velero server's --backup-sync-period flag, as in the sketch below.
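
    For example, to lengthen the interval to 5 minutes by appending the flag to the velero Deployment's container args (a sketch; verify the args path in your install):

    oc -n velero patch deployment/velero --type json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--backup-sync-period=5m"}]'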

  4. (Cluster A) Once you have confirmed that the correct backup (<BACKUP-NAME>) exists, you can restore everything with the command below. (Because the backup contains only the caseycui2020 namespace, the restore does not need --include-namespaces caseycui2020 for filtering.)

    velero restore create --from-backup caseycui2020 --include-resources buildconfigs.build.openshift.io,configmaps,deploymentconfigs.apps.openshift.io,imagestreams.image.openshift.io,imagestreamtags.image.openshift.io,imagetags.image.openshift.io,limitranges,namespaces,networkpolicies.networking.k8s.io,persistentvolumeclaims,prometheusrules.monitoring.coreos.com,resourcequotas,rolebindings.authorization.openshift.io,rolebindings.rbac.authorization.k8s.io,routes.route.openshift.io,secrets,servicemonitors.monitoring.coreos.com,services,templateinstances.template.openshift.io

    Because restoring persistentvolumeclaims later turned out to be problematic, remove persistentvolumeclaims from the list when actually running this, and deal with the PVCs separately later:

    velero restore create --from-backup caseycui2020 --include-resources buildconfigs.build.openshift.io,configmaps,deploymentconfigs.apps.openshift.io,imagestreams.image.openshift.io,imagestreamtags.image.openshift.io,imagetags.image.openshift.io,limitranges,namespaces,networkpolicies.networking.k8s.io,prometheusrules.monitoring.coreos.com,resourcequotas,rolebindings.authorization.openshift.io,rolebindings.rbac.authorization.k8s.io,routes.route.openshift.io,secrets,servicemonitors.monitoring.coreos.com,services,templateinstances.template.openshift.io

Verify the 2 clusters

Check that the second cluster is working as expected:

  1. (Cluster A) Run:

    velero restore get

    The results are as follows:

    NAME                       BACKUP      STATUS            STARTED   COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
    caseycui2020-20201021102342   caseycui2020   Failed            <nil>     <nil>       0        0          2020-10-21 10:24:14 +0800 CST   <none>
    caseycui2020-20201021103040   caseycui2020   PartiallyFailed   <nil>     <nil>       46       34         2020-10-21 10:31:12 +0800 CST   <none>
    caseycui2020-20201021105848   caseycui2020   InProgress        <nil>     <nil>       0        0          2020-10-21 10:59:20 +0800 CST   <none>
    
  2. Then run:

    velero restore describe <RESTORE-NAME-FROM-GET-COMMAND>
    oc -n velero get podvolumerestores -l velero.io/restore-name=YOUR_RESTORE_NAME -o yaml

    The results are as follows:

    Name:         caseycui2020-20201021102342
    Namespace:    velero
    Labels:       <none>
    Annotations:  <none>
    
    Phase:  InProgress
    
    Started:    <n/a>
    Completed:  <n/a>
    
    Backup:  caseycui2020
    
    Namespaces:
      Included:  all namespaces found in the backup
      Excluded:  <none>
    
    Resources:
      Included:        buildconfigs.build.openshift.io, configmaps, deploymentconfigs.apps.openshift.io, imagestreams.image.openshift.io, imagestreamtags.image.openshift.io, imagetags.image.openshift.io, limitranges, namespaces, networkpolicies.networking.k8s.io, persistentvolumeclaims, prometheusrules.monitoring.coreos.com, resourcequotas, rolebindings.authorization.openshift.io, rolebindings.rbac.authorization.k8s.io, routes.route.openshift.io, secrets, servicemonitors.monitoring.coreos.com, services, templateinstances.template.openshift.io
      Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
      Cluster-scoped:  auto
    
    Namespace mappings:  <none>
    
    Label selector:  <none>
    
    Restore PVs:  auto
    

If you run into problems, make sure Velero is running in the same namespace in both clusters.

I ran into a problem here: with OpenShift ImageStreams and ImageTags, the corresponding images could not be pulled, so the containers did not start.

Because the containers did not start, the pod volumes were not restored successfully either.

Name:         caseycui2020-20201021110424
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  PartiallyFailed (run 'velero restore logs caseycui2020-20201021110424' for more information)

Started:    <n/a>
Completed:  <n/a>

Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    caseycui2020:  could not restore, imagetags.image.openshift.io "mybackendapp:1.0.0" already exists. Warning: the in-cluster version is different than the backed-up version.
                could not restore, imagetags.image.openshift.io "mybackendappno:1.0.0" already exists. Warning: the in-cluster version is different than the backed-up version.
                ...

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    caseycui2020:  error restoring imagestreams.image.openshift.io/caseycui2020/mybackendapp: ImageStream.image.openshift.io "mybackendapp" is invalid: []: Internal error: imagestreams "mybackendapp" is invalid: spec.tags[latest].from.name: Invalid value: "mybackendapp@sha256:6c5ab553a97c74ad602d2427a326124621c163676df91f7040b035fa64b533c7": error generating tag event: imagestreamimage.image.openshift.io ......

Backup:  caseycui2020

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        buildconfigs.build.openshift.io, configmaps, deploymentconfigs.apps.openshift.io, imagestreams.image.openshift.io, imagestreamtags.image.openshift.io, imagetags.image.openshift.io, limitranges, namespaces, networkpolicies.networking.k8s.io, persistentvolumeclaims, prometheusrules.monitoring.coreos.com, resourcequotas, rolebindings.authorization.openshift.io, rolebindings.rbac.authorization.k8s.io, routes.route.openshift.io, secrets, servicemonitors.monitoring.coreos.com, services, templateinstances.template.openshift.io
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Summary of migration issues

The current summary of the issues is as follows:

  1. The images in imagestreams.image.openshift.io, imagestreamtags.image.openshift.io and imagetags.image.openshift.io were not imported successfully; in particular, the latest tag was not imported successfully. imagestreamtags.image.openshift.io also takes time to take effect.

  2. persistentvolumeclaims report an error after migration; the error is as follows:

    phase: Lost

    The reason: the StorageClass configuration differs between clusters A and B, so a PVC from cluster B cannot bind directly in cluster A. Moreover, a PVC cannot be modified after creation; it has to be deleted and recreated. A quick check is sketched below.
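
    To confirm the mismatch, compare the StorageClasses of the two clusters (run against each cluster's kubeconfig):

    # on cluster B
    oc get storageclass
    # on cluster A
    oc get storageclass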

  3. Route domain names: some domain names are specific to cluster A or B. For example, jenkins-caseycui2020.b.caas.e-whisper.com has to become jenkins-caseycui2020.a.caas.e-whisper.com after migrating to cluster A (see the sketch after this list).

  4. podVolume data is not migrated.
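
For the Route issue, one way to list the hosts that still point at the cluster B domain, so they can be recreated with the cluster A domain (a sketch; the domain is taken from the example above):

oc -n caseycui2020 get routes -o custom-columns=NAME:.metadata.name,HOST:.spec.host | grep b.caas.e-whisper.com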

The latest tag was not imported successfully

To import it manually, run the following command (1.0.1 is the latest version in the ImageStream):

oc tag xxl-job-admin:1.0.1 xxl-job-admin:latest

PVC phase Lost problem

To create the PVC manually, its YAML needs to be adjusted. The PVC before and after adjustment is shown below.

Cluster B original YAML:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
  selfLink: /api/v1/namespaces/caseycui2020/persistentvolumeclaims/jenkins
  resourceVersion: '77304786'
  name: jenkins
  uid: ffcabc42-845d-4cdf-8c7c-56e97cb5ea82
  creationTimestamp: '2020-10-21T03:05:46Z'
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2020-10-21T03:05:46Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:phase': {}
    - manager: velero-server
      operation: Update
      apiVersion: v1
      time: '2020-10-21T03:05:46Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:pv.kubernetes.io/bind-completed': {}
            'f:pv.kubernetes.io/bound-by-controller': {}
            'f:volume.beta.kubernetes.io/storage-provisioner': {}
          'f:labels':
            .: {}
            'f:app': {}
            'f:template': {}
            'f:template.openshift.io/template-instance-owner': {}
            'f:velero.io/backup-name': {}
            'f:velero.io/restore-name': {}
        'f:spec':
          'f:accessModes': {}
          'f:resources':
            'f:requests':
              .: {}
              'f:storage': {}
          'f:storageClassName': {}
          'f:volumeMode': {}
          'f:volumeName': {}
  namespace: caseycui2020
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app: jenkins-persistent
    template: jenkins-persistent-monitored
    template.openshift.io/template-instance-owner: 5a0b28c3-c760-451b-b92f-a781406d9e91
    velero.io/backup-name: caseycui2020
    velero.io/restore-name: caseycui2020-20201021110424
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  volumeName: pvc-414efafd-8b22-48da-8c20-6025a8e671ca
  storageClassName: nas-data
  volumeMode: Filesystem
status:
  phase: Lost

After adjustment:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: jenkins
  namespace: caseycui2020
  labels:
    app: jenkins-persistent
    template: jenkins-persistent-monitored
    template.openshift.io/template-instance-owner: 5a0b28c3-c760-451b-b92f-a781406d9e91
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: nas-data
  volumeMode: Filesystem
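
Assuming the adjusted manifest is saved as jenkins-pvc.yaml (a hypothetical file name), delete the Lost PVC and recreate it:

oc -n caseycui2020 delete pvc jenkins
oc -n caseycui2020 apply -f jenkins-pvc.yaml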

podVolume Data is not migrated

You can migrate the data manually with the following commands:

# Log in to cluster B
# First, copy the /opt/prometheus data from cluster B into the current folder
oc rsync xxl-job-admin-5-9sgf7:/opt/prometheus .
# The rsync command above creates a prometheus directory
cd prometheus
# Log in to cluster A
# Then copy the data back in (make sure the target pod is running first; you can remove `JAVA_OPTS` first)
oc rsync ./ xxl-job-admin-2-6k8df:/opt/prometheus/

Summary

This article was written quite a while ago. OpenShift has since released its own migration tool, based on Velero and wrapped for OpenShift, and you can migrate directly with the tooling it provides.

In addition, OpenShift clusters have many restrictions and many OpenShift-specific resources, so actual usage differs considerably from standard Kubernetes, and careful attention is needed.

Although this attempt failed, the approach is still worth referencing.

Series of articles

📚️ Reference documentation