Velero Series (4): Production Migration with Velero

This article was last updated on: July 24, 2024


Overview

Objective

Use the Velero tool to achieve the following overall goal:

  • Migrate specific namespaces from cluster B to cluster A.

The specific objectives are:

  1. Install Velero (including restic) on both cluster B and cluster A.
  2. Back up a specific namespace on cluster B: caseycui2020:
    1. Back up resources such as deployments, configmaps, etc.
      1. Before backing up, exclude the YAML of specific secrets.
    2. Back up volume data (via restic).
      1. Using the "opt-in" approach, back up only specific pod volumes.
  3. Migrate the specific namespace to cluster A: caseycui2020:
    1. Migrate resources: using the include approach, migrate only specific resources.
    2. Migrate volume data (via restic).

Installation

  1. Create a Velero-specific credentials file in your local directory (credentials-velero):

    The object storage used here is XSKY (the company's NetApp object storage is not compatible):

    [default]
    aws_access_key_id = xxxxxxxxxxxxxxxxxxxxxxxx
    aws_secret_access_key = xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  2. (OpenShift) First create the velero namespace: oc new-project velero

  3. By default, a userland OpenShift namespace does not schedule pods on all nodes in the cluster.

    To schedule on all nodes, the namespace needs an annotation:

    oc annotate namespace velero openshift.io/node-selector=""

    This should be done before installing Velero.

  4. Start the server and storage services. In the Velero directory, run:

    velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-restic \
    --use-volume-snapshots=true \
    --backup-location-config region="default",s3ForcePathStyle="true",s3Url="http://glacier.e-whisper.com",insecureSkipTLSVerify="true",signatureVersion="4" \
    --snapshot-location-config region="default"

    The following resources are created:

    CustomResourceDefinition/backups.velero.io: attempting to create resource
    CustomResourceDefinition/backups.velero.io: created
    CustomResourceDefinition/backupstoragelocations.velero.io: attempting to create resource
    CustomResourceDefinition/backupstoragelocations.velero.io: created
    CustomResourceDefinition/deletebackuprequests.velero.io: attempting to create resource
    CustomResourceDefinition/deletebackuprequests.velero.io: created
    CustomResourceDefinition/downloadrequests.velero.io: attempting to create resource
    CustomResourceDefinition/downloadrequests.velero.io: created
    CustomResourceDefinition/podvolumebackups.velero.io: attempting to create resource
    CustomResourceDefinition/podvolumebackups.velero.io: created
    CustomResourceDefinition/podvolumerestores.velero.io: attempting to create resource
    CustomResourceDefinition/podvolumerestores.velero.io: created
    CustomResourceDefinition/resticrepositories.velero.io: attempting to create resource
    CustomResourceDefinition/resticrepositories.velero.io: created
    CustomResourceDefinition/restores.velero.io: attempting to create resource
    CustomResourceDefinition/restores.velero.io: created
    CustomResourceDefinition/schedules.velero.io: attempting to create resource
    CustomResourceDefinition/schedules.velero.io: created
    CustomResourceDefinition/serverstatusrequests.velero.io: attempting to create resource
    CustomResourceDefinition/serverstatusrequests.velero.io: created
    CustomResourceDefinition/volumesnapshotlocations.velero.io: attempting to create resource
    CustomResourceDefinition/volumesnapshotlocations.velero.io: created
    Waiting for resources to be ready in cluster...
    Namespace/velero: attempting to create resource
    Namespace/velero: created
    ClusterRoleBinding/velero: attempting to create resource
    ClusterRoleBinding/velero: created
    ServiceAccount/velero: attempting to create resource
    ServiceAccount/velero: created
    Secret/cloud-credentials: attempting to create resource
    Secret/cloud-credentials: created
    BackupStorageLocation/default: attempting to create resource
    BackupStorageLocation/default: created
    VolumeSnapshotLocation/default: attempting to create resource
    VolumeSnapshotLocation/default: created
    Deployment/velero: attempting to create resource
    Deployment/velero: created
    DaemonSet/restic: attempting to create resource
    DaemonSet/restic: created
    Velero is installed! ⛵ Use 'kubectl logs deployment/velero -n velero' to view the status.
    
  5. (OpenShift) Add the velero ServiceAccount to the privileged SCC:

    $ oc adm policy add-scc-to-user privileged -z velero -n velero
  6. (OpenShift) For OpenShift version >= 4.1, modify the DaemonSet YAML to request privileged mode:

    @@ -67,3 +67,5 @@ spec:
                  value: /credentials/cloud
                - name: VELERO_SCRATCH_DIR
                  value: /scratch
    +          securityContext:
    +            privileged: true

    Or:

    oc patch ds/restic \
    --namespace velero \
    --type json \
    -p '[{"op":"add","path":"/spec/template/spec/containers/0/securityContext","value": { "privileged": true}}]'
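After either change, it may be worth confirming that the DaemonSet really requests privileged mode. The following is only a minimal sketch: `restic_is_privileged` is a hypothetical helper name, and it assumes the restic container is the first container in ds/restic, as in the patch path above.

```shell
# Succeeds only if the first container of ds/restic requests privileged mode.
# restic_is_privileged is a hypothetical helper, not part of oc or Velero.
restic_is_privileged() {
  value="$(oc -n velero get ds/restic \
    -o jsonpath='{.spec.template.spec.containers[0].securityContext.privileged}')"
  [ "$value" = "true" ]
}
```

It could be used as `restic_is_privileged && echo "restic requests privileged mode"`.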

Backup - B cluster

Back up specific resources at the cluster level

velero backup create <backup-name> --include-cluster-resources=true  --include-resources deployments,configmaps

View the backup

velero backup describe YOUR_BACKUP_NAME

Back up the specific namespace caseycui2020

Exclude specific resources

Resources labeled velero.io/exclude-from-backup=true are not included in the backup, even if they carry a matching selector label.

In this way, secrets and other resources that do not need to be backed up can be excluded with the velero.io/exclude-from-backup=true label.

Some examples of secrets excluded in this way:

builder-dockercfg-jbnzr
default-token-lshh8
pipeline-token-xt645
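
The labeling could be done with a small loop. This is only a sketch: `exclude_cmds` is a hypothetical helper name, it prints the `oc label` commands for review instead of running them, and the secret names are the examples listed above.

```shell
# Print one `oc label` command per secret; nothing is executed here.
# exclude_cmds is a hypothetical helper, not part of oc or Velero.
# Usage: exclude_cmds <namespace> <secret>...
exclude_cmds() {
  ns="$1"; shift
  for secret in "$@"; do
    echo "oc -n $ns label secret $secret velero.io/exclude-from-backup=true --overwrite"
  done
}

exclude_cmds caseycui2020 builder-dockercfg-jbnzr default-token-lshh8 pipeline-token-xt645
```

Once the printed commands look right, piping the output to `sh` would apply the labels.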

Back up Pod Volume using restic

🐾 Note:

In this namespace, the following two pod volumes also need to be backed up, but they are not yet officially in use:

  • mycoreapphttptask-callback
  • mycoreapphttptaskservice-callback

Use the "opt-in" approach to back up volumes selectively.

  1. Run the following command for each pod that contains the volume you want to back up:

    oc -n caseycui2020 annotate pod/<mybackendapp-pod-name> backup.velero.io/backup-volumes=jmx-exporter-agent,pinpoint-agent,my-mybackendapp-claim
    oc -n caseycui2020 annotate pod/<elitegetrecservice-pod-name> backup.velero.io/backup-volumes=uploadfile

    where each volume name is the name of a volume in the pod spec.

    For example, for the following pods:

    apiVersion: v1
    kind: Pod
    metadata:
      name: sample
      namespace: foo
    spec:
      containers:
      - image: k8s.gcr.io/test-webserver
        name: test-webserver
        volumeMounts:
        - name: pvc-volume
          mountPath: /volume-1
        - name: emptydir-volume
          mountPath: /volume-2
      volumes:
      - name: pvc-volume
        persistentVolumeClaim:
          claimName: test-volume-claim
      - name: emptydir-volume
        emptyDir: {}

    You should run:

    kubectl -n foo annotate pod/sample backup.velero.io/backup-volumes=pvc-volume,emptydir-volume

    If you use a controller to manage your pods, you can also provide this annotation in the pod template spec.
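
For pods managed by a controller, placing the annotation in the pod template means replacement pods carry it as well. A minimal sketch, assuming the workload is a Deployment: `template_patch` is a hypothetical helper that only builds the patch JSON, and `<deployment-name>` is a placeholder.

```shell
# Build a patch that adds the Velero opt-in annotation to a pod template.
# template_patch is a hypothetical helper; $1 is the comma-separated volume list.
template_patch() {
  printf '{"spec":{"template":{"metadata":{"annotations":{"backup.velero.io/backup-volumes":"%s"}}}}}\n' "$1"
}

# Print the command for review instead of running it.
echo "oc -n caseycui2020 patch deployment/<deployment-name> -p '$(template_patch uploadfile)'"
```

Note that changing the pod template normally triggers a new rollout of the workload.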

Backup and verification

Back up the namespace and its objects, along with the pod volumes that carry the annotation:

# Production namespace
velero backup create caseycui2020 --include-namespaces caseycui2020

View the backup

velero backup describe YOUR_BACKUP_NAME
velero backup logs caseycui2020
oc -n velero get podvolumebackups -l velero.io/backup-name=caseycui2020 -o yaml

The output of describe is as follows:

Name:         caseycui2020
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  velero.io/source-cluster-k8s-gitversion=v1.18.3+2cf11e2
              velero.io/source-cluster-k8s-major-version=1
              velero.io/source-cluster-k8s-minor-version=18+

Phase:  Completed

Errors:    0
Warnings:  0

Namespaces:
  Included:  caseycui2020
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Velero-Native Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1.1.0

Started:    2020-10-21 09:28:16 +0800 CST
Completed:  2020-10-21 09:29:17 +0800 CST

Expiration:  2020-11-20 09:28:16 +0800 CST

Total items to be backed up:  591
Items backed up:              591

Velero-Native Snapshots: <none included>

Restic Backups (specify --details for more information):
  Completed:  3
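
A check of the phase can also be scripted. The sketch below assumes the plain-text layout of `velero backup describe` shown above, where the phase appears on a line starting with `Phase:`; `backup_phase` is a hypothetical helper name.

```shell
# Extract the phase from `velero backup describe` output read on stdin.
# backup_phase is a hypothetical helper, not part of Velero.
backup_phase() {
  awk '$1 == "Phase:" { print $2; exit }'
}

# Example with a captured fragment of the output above.
sample='Name:         caseycui2020
Phase: Completed
Errors: 0'
printf '%s\n' "$sample" | backup_phase
```

Against a live cluster it could be used as `velero backup describe caseycui2020 | backup_phase`.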

Back up regularly

To create a regularly scheduled backup using a cron expression:

velero schedule create caseycui2020-b-daily --schedule="0 3 * * *" --include-namespaces caseycui2020

Alternatively, you can use some non-standard shorthand cron expressions:

velero schedule create test-daily --schedule="@every 24h" --include-namespaces caseycui2020

For more usage examples, see the cron package documentation.

Cluster Migration - To Cluster A

Use Backups and Restores

As long as you point each Velero instance to the same cloud object storage location, Velero can help you migrate resources from one cluster to another. This scenario assumes that your clusters are hosted by the same cloud provider. Note that Velero does not natively support migrating persistent volume snapshots across cloud providers. If you want to migrate volume data between cloud platforms, enable restic, which backs up volume contents at the file system level.

  1. (Cluster B) Assuming you are not already checkpointing your data with Velero's schedule operation, first back up the entire cluster (replace <BACKUP-NAME> as needed):

    velero backup create <BACKUP-NAME>

    The default backup retention period, expressed as a TTL (time to live), is 30 days (720 hours); you can change it as needed with the --ttl <DURATION> flag. For more information about backup expiration, see How Velero works.

  2. (Cluster A) Configure BackupStorageLocations and VolumeSnapshotLocations that point to the locations used by Cluster B, using velero backup-location create and velero snapshot-location create. To make sure the BackupStorageLocations are read-only, pass the --access-mode=ReadOnly flag to velero backup-location create (because I only have one bucket, I did not configure read-only). The following installs Velero in Cluster A and configures the BackupStorageLocations and VolumeSnapshotLocations during installation:

    velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.0.0 \
    --bucket velero \
    --secret-file ./credentials-velero \
    --use-restic \
    --use-volume-snapshots=true \
    --backup-location-config region="default",s3ForcePathStyle="true",s3Url="http://glacier.e-whisper.com",insecureSkipTLSVerify="true",signatureVersion="4" \
    --snapshot-location-config region="default"
  3. (Cluster A) Ensure that the Velero Backup object has been created. Velero resources are synced with backup files in cloud storage.

    velero backup describe <BACKUP-NAME>

    Note: The default sync interval is 1 minute, so make sure to wait before checking. You can configure this interval with the Velero server's --backup-sync-period flag.

  4. (Cluster A) Once you have confirmed that the correct backup (<BACKUP-NAME>) now exists, you can restore everything with the following command (because the backup contains only the caseycui2020 namespace, the restore does not need --include-namespaces caseycui2020 for filtering):

    velero restore create --from-backup caseycui2020 --include-resources buildconfigs.build.openshift.io,configmaps,deploymentconfigs.apps.openshift.io,imagestreams.image.openshift.io,imagestreamtags.image.openshift.io,imagetags.image.openshift.io,limitranges,namespaces,networkpolicies.networking.k8s.io,persistentvolumeclaims,prometheusrules.monitoring.coreos.com,resourcequotas,rolebindings.authorization.openshift.io,rolebindings.rbac.authorization.k8s.io,routes.route.openshift.io,secrets,servicemonitors.monitoring.coreos.com,services,templateinstances.template.openshift.io

    Because restoring persistentvolumeclaims was later found to be problematic, PVCs are left out of the list in later runs, and that problem is solved separately afterwards:

    velero restore create --from-backup caseycui2020 --include-resources buildconfigs.build.openshift.io,configmaps,deploymentconfigs.apps.openshift.io,imagestreams.image.openshift.io,imagestreamtags.image.openshift.io,imagetags.image.openshift.io,limitranges,namespaces,networkpolicies.networking.k8s.io,prometheusrules.monitoring.coreos.com,resourcequotas,rolebindings.authorization.openshift.io,rolebindings.rbac.authorization.k8s.io,routes.route.openshift.io,secrets,servicemonitors.monitoring.coreos.com,services,templateinstances.template.openshift.io
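
For step 3, rather than waiting a fixed time for the sync, the check can be retried until the backup becomes visible on Cluster A. This is a sketch under assumptions: `wait_for_backup` is a hypothetical helper, and it relies only on `velero backup get <name>` returning a non-zero exit status while the backup is still unknown to the cluster.

```shell
# Retry `velero backup get` until it succeeds or the attempts run out.
# wait_for_backup is a hypothetical helper, not part of Velero.
# Usage: wait_for_backup <backup-name> [attempts] [sleep-seconds]
wait_for_backup() {
  name="$1"; attempts="${2:-12}"; pause="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if velero backup get "$name" >/dev/null 2>&1; then
      echo "backup $name is visible"
      return 0
    fi
    i=$((i + 1))
    sleep "$pause"
  done
  echo "backup $name not found after $attempts attempts" >&2
  return 1
}
```

For example, `wait_for_backup caseycui2020 && velero backup describe caseycui2020` would run the describe only once the backup has been synced.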

Verify both clusters

Check that the second cluster is working as expected:

  1. (Cluster A) Run:

    velero restore get

    The results are as follows:

    NAME                       BACKUP      STATUS            STARTED   COMPLETED   ERRORS   WARNINGS   CREATED                         SELECTOR
    caseycui2020-20201021102342   caseycui2020   Failed            <nil>     <nil>       0        0          2020-10-21 10:24:14 +0800 CST   <none>
    caseycui2020-20201021103040   caseycui2020   PartiallyFailed   <nil>     <nil>       46       34         2020-10-21 10:31:12 +0800 CST   <none>
    caseycui2020-20201021105848   caseycui2020   InProgress        <nil>     <nil>       0        0          2020-10-21 10:59:20 +0800 CST   <none>
    
  2. Then run:

    velero restore describe <RESTORE-NAME-FROM-GET-COMMAND>
    oc -n velero get podvolumerestores -l velero.io/restore-name=YOUR_RESTORE_NAME -o yaml

    The results are as follows:

    Name:         caseycui2020-20201021102342
    Namespace:    velero
    Labels:       <none>
    Annotations:  <none>
    
    Phase:  InProgress
    
    Started:    <n/a>
    Completed:  <n/a>
    
    Backup:  caseycui2020
    
    Namespaces:
      Included:  all namespaces found in the backup
      Excluded:  <none>
    
    Resources:
      Included:        buildconfigs.build.openshift.io, configmaps, deploymentconfigs.apps.openshift.io, imagestreams.image.openshift.io, imagestreamtags.image.openshift.io, imagetags.image.openshift.io, limitranges, namespaces, networkpolicies.networking.k8s.io, persistentvolumeclaims, prometheusrules.monitoring.coreos.com, resourcequotas, rolebindings.authorization.openshift.io, rolebindings.rbac.authorization.k8s.io, routes.route.openshift.io, secrets, servicemonitors.monitoring.coreos.com, services, templateinstances.template.openshift.io
      Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
      Cluster-scoped:  auto
    
    Namespace mappings:  <none>
    
    Label selector:  <none>
    
    Restore PVs:  auto
    

If you run into problems, make sure Velero is running in the same namespace in both clusters.

I ran into a problem here: the OpenShift imagestreams and imagetags were not restored correctly, so the corresponding images could not be pulled and the containers did not start.

Because the containers did not start, the pod volumes were not restored successfully either.

Name:         caseycui2020-20201021110424
Namespace:    velero
Labels:       <none>
Annotations:  <none>

Phase:  PartiallyFailed (run 'velero restore logs caseycui2020-20201021110424' for more information)

Started:    <n/a>
Completed:  <n/a>

Warnings:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    caseycui2020:  could not restore, imagetags.image.openshift.io "mybackendapp:1.0.0" already exists. Warning: the in-cluster version is different than the backed-up version.
                could not restore, imagetags.image.openshift.io "mybackendappno:1.0.0" already exists. Warning: the in-cluster version is different than the backed-up version.
                ...

Errors:
  Velero:     <none>
  Cluster:    <none>
  Namespaces:
    caseycui2020:  error restoring imagestreams.image.openshift.io/caseycui2020/mybackendapp: ImageStream.image.openshift.io "mybackendapp" is invalid: []: Internal error: imagestreams "mybackendapp" is invalid: spec.tags[latest].from.name: Invalid value: "mybackendapp@sha256:6c5ab553a97c74ad602d2427a326124621c163676df91f7040b035fa64b533c7": error generating tag event: imagestreamimage.image.openshift.io ......

Backup:  caseycui2020

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        buildconfigs.build.openshift.io, configmaps, deploymentconfigs.apps.openshift.io, imagestreams.image.openshift.io, imagestreamtags.image.openshift.io, imagetags.image.openshift.io, limitranges, namespaces, networkpolicies.networking.k8s.io, persistentvolumeclaims, prometheusrules.monitoring.coreos.com, resourcequotas, rolebindings.authorization.openshift.io, rolebindings.rbac.authorization.k8s.io, routes.route.openshift.io, secrets, servicemonitors.monitoring.coreos.com, services, templateinstances.template.openshift.io
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Restore PVs:  auto

Summary of migration issues

The current summary of the issues is as follows:

  1. The images in imagestreams.image.openshift.io, imagestreamtags.image.openshift.io, and imagetags.image.openshift.io were not imported successfully; to be precise, the latest tag was not imported successfully. imagestreamtags.image.openshift.io also takes time to take effect.

  2. persistentvolumeclaims report an error after migration, as follows:

    phase: Lost

    The reason: the StorageClass configuration differs between clusters A and B, so a PVC from cluster B cannot bind directly in cluster A. Moreover, it cannot be modified directly after creation; it has to be deleted and recreated.

  3. Route domain names: some domain names are specific to cluster A or cluster B. For example, jenkins-caseycui2020.b.caas.e-whisper.com must become jenkins-caseycui2020.a.caas.e-whisper.com after migrating to cluster A.

  4. podVolume data is not migrated.
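
For item 3, the cluster-specific part of a Route host can be rewritten before the Route is created on cluster A. A minimal sketch: `rewrite_host` is a hypothetical helper, and the mapping from `.b.caas.e-whisper.com` to `.a.caas.e-whisper.com` is taken from the example in the list above.

```shell
# Replace the cluster B domain suffix with the cluster A suffix on stdin.
# rewrite_host is a hypothetical helper, not part of oc or Velero.
rewrite_host() {
  sed 's/\.b\.caas\.e-whisper\.com/.a.caas.e-whisper.com/g'
}

echo "host: jenkins-caseycui2020.b.caas.e-whisper.com" | rewrite_host
```

It could be applied to an exported manifest, for example `oc -n caseycui2020 get route <route-name> -o yaml | rewrite_host`, with the result reviewed before it is applied on cluster A.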

The latest tag was not imported successfully

To import it manually, use the following command (1.0.1 is the latest version in the ImageStream):

oc tag xxl-job-admin:1.0.1 xxl-job-admin:latest

PVC phase Lost problem

If the PVC is created manually, its YAML needs to be adjusted. The PVC before and after adjustment is as follows:

Cluster B original YAML:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  annotations:
    pv.kubernetes.io/bind-completed: 'yes'
    pv.kubernetes.io/bound-by-controller: 'yes'
    volume.beta.kubernetes.io/storage-provisioner: csi.trident.netapp.io
  selfLink: /api/v1/namespaces/caseycui2020/persistentvolumeclaims/jenkins
  resourceVersion: '77304786'
  name: jenkins
  uid: ffcabc42-845d-4cdf-8c7c-56e97cb5ea82
  creationTimestamp: '2020-10-21T03:05:46Z'
  managedFields:
    - manager: kube-controller-manager
      operation: Update
      apiVersion: v1
      time: '2020-10-21T03:05:46Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:status':
          'f:phase': {}
    - manager: velero-server
      operation: Update
      apiVersion: v1
      time: '2020-10-21T03:05:46Z'
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:pv.kubernetes.io/bind-completed': {}
            'f:pv.kubernetes.io/bound-by-controller': {}
            'f:volume.beta.kubernetes.io/storage-provisioner': {}
          'f:labels':
            .: {}
            'f:app': {}
            'f:template': {}
            'f:template.openshift.io/template-instance-owner': {}
            'f:velero.io/backup-name': {}
            'f:velero.io/restore-name': {}
        'f:spec':
          'f:accessModes': {}
          'f:resources':
            'f:requests':
              .: {}
              'f:storage': {}
          'f:storageClassName': {}
          'f:volumeMode': {}
          'f:volumeName': {}
  namespace: caseycui2020
  finalizers:
    - kubernetes.io/pvc-protection
  labels:
    app: jenkins-persistent
    template: jenkins-persistent-monitored
    template.openshift.io/template-instance-owner: 5a0b28c3-c760-451b-b92f-a781406d9e91
    velero.io/backup-name: caseycui2020
    velero.io/restore-name: caseycui2020-20201021110424
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  volumeName: pvc-414efafd-8b22-48da-8c20-6025a8e671ca
  storageClassName: nas-data
  volumeMode: Filesystem
status:
  phase: Lost

After adjustment:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: jenkins
  namespace: caseycui2020
  labels:
    app: jenkins-persistent
    template: jenkins-persistent-monitored
    template.openshift.io/template-instance-owner: 5a0b28c3-c760-451b-b92f-a781406d9e91
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: nas-data
  volumeMode: Filesystem

podVolume data is not migrated

You can migrate manually, with the following command:

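# Log in to cluster B
# First copy the /opt/prometheus data from cluster B into the current directory
oc rsync xxl-job-admin-5-9sgf7:/opt/prometheus .
# The rsync command above creates a prometheus directory
cd prometheus
# Log in to cluster A
# Then copy the data in (make sure the pod is running before copying; you can remove `JAVA_OPTS` first)
oc rsync ./ xxl-job-admin-2-6k8df:/opt/prometheus/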

Summary

This article was written some time ago. Since then, OpenShift has released its own migration tool, which wraps Velero, and migrations can be performed directly with the tools it provides.

In addition, OpenShift clusters come with many restrictions and many OpenShift-specific resources, so actual use differs considerably from standard Kubernetes and requires careful attention.

Although the attempt failed, the ideas are still available for reference.
